Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

invalid sql generated with char column type #85

Open
thehesiod opened this issue Jul 29, 2021 · 21 comments
Open

invalid sql generated with char column type #85

thehesiod opened this issue Jul 29, 2021 · 21 comments

Comments

@thehesiod
Copy link

thehesiod commented Jul 29, 2021

If you create an AST with a column like:

ColumnDef(
    'column_name', 
    inhcount=0,
    is_from_type=False,
    is_local=True,
    is_not_null=False, 
    typeName=TypeName(
        names=(String('char'),), 
        pct_type=False, 
        setof=False, 
        typemod=-1, 
        typmods=(A_Const(10),)
    )
)

it will serialize to "char"(20) which is invalid sql. It should just be char(20)

@lelit
Copy link
Owner

lelit commented Jul 29, 2021

I think char is special:

$ pgpp -tS "create table foo (bar char(20))"
[{'@': 'RawStmt',
  'stmt': {'@': 'CreateStmt',
           'if_not_exists': False,
           'oncommit': {'#': 'OnCommitAction',
                        'name': 'ONCOMMIT_NOOP',
                        'value': 0},
           'relation': {'@': 'RangeVar',
                        'inh': True,
                        'location': 13,
                        'relname': 'foo',
                        'relpersistence': 'p'},
           'tableElts': ({'@': 'ColumnDef',
                          'colname': 'bar',
                          'generated': '\x00',
                          'identity': '\x00',
                          'inhcount': 0,
                          'is_from_type': False,
                          'is_local': True,
                          'is_not_null': False,
                          'location': 18,
                          'storage': '\x00',
                          'typeName': {'@': 'TypeName',
                                       'location': 22,
                                       'names': ({'@': 'String',
                                                  'val': 'pg_catalog'},
                                                 {'@': 'String',
                                                  'val': 'bpchar'}),
                                       'pct_type': False,
                                       'setof': False,
                                       'typemod': -1,
                                       'typmods': ({'@': 'A_Const',
                                                    'location': 27,
                                                    'val': {'@': 'Integer',
                                                            'val': 20}},)}},)},
  'stmt_len': 0,
  'stmt_location': 0}]

and indeed:

$ pgpp -S "create table foo (bar char(20))"
CREATE TABLE foo (
  bar char(20)
)

@thehesiod
Copy link
Author

ya it's special, I'm guessing it's an internal alias of some sort. Here's what happens via psql:

localhost user@db=# create temp table foo(bar char(20));
CREATE TABLE
localhost user@db=# \d foo
                  Table "pg_temp_2.foo"
 Column |     Type      | Collation | Nullable | Default 
--------+---------------+-----------+----------+---------
 bar    | character(20) |           |          | 

@thehesiod
Copy link
Author

ya check here: https://www.postgresql.org/docs/12/datatype.html. there's a list of aliases, and it appears there are at least two levels:
char -> character -> bpchar

@thehesiod
Copy link
Author

so i guess the question is when constructing statements from AST, do you have to use the low-level types (in which case it should probably assert for invalid types), or can you use the high level types to preserve information. Otherwise you'd have information loss because you could never preserve the fact that the column is char/character/etc.

@lelit
Copy link
Owner

lelit commented Jul 29, 2021

We cannot raise an exception, because we do not have access to a catalog, so we cannot determine whether the type is valid or not, it could really be a custom type, for example introduced by an extension.

@thehesiod
Copy link
Author

perhaps then it needs to natively recognize pg aliases?

@lelit
Copy link
Owner

lelit commented Jul 30, 2021

Maybe, yes: that bpchar speciality could be generalized. What about starting to collect such aliases?

@thehesiod
Copy link
Author

thehesiod commented Aug 2, 2021

is this what you're looking for?

localhost user@db=# \dTS+
                                                                                                                                             List of data types
   Schema   |             Name              |         Internal name         | Size  | Elements |         Owner         | Access privileges |                                                                          Description                                                                           
------------+-------------------------------+-------------------------------+-------+----------+-----------------------+-------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------
 pg_catalog | aclitem                       | aclitem                       | 12    |          | postgres              |                   | access control list
 pg_catalog | "any"                         | any                           | 4     |          | postgres              |                   | pseudo-type representing any type
 pg_catalog | anyarray                      | anyarray                      | var   |          | postgres              |                   | pseudo-type representing a polymorphic array type
 pg_catalog | anyelement                    | anyelement                    | 4     |          | postgres              |                   | pseudo-type representing a polymorphic base type
 pg_catalog | anyenum                       | anyenum                       | 4     |          | postgres              |                   | pseudo-type representing a polymorphic base type that is an enum
 pg_catalog | anynonarray                   | anynonarray                   | 4     |          | postgres              |                   | pseudo-type representing a polymorphic base type that is not an array
 pg_catalog | anyrange                      | anyrange                      | var   |          | postgres              |                   | pseudo-type representing a polymorphic base type that is a range
 pg_catalog | bigint                        | int8                          | 8     |          | postgres              |                   | ~18 digit integer, 8-byte storage
 pg_catalog | bit                           | bit                           | var   |          | postgres              |                   | fixed-length bit string
 pg_catalog | bit varying                   | varbit                        | var   |          | postgres              |                   | variable-length bit string
 pg_catalog | boolean                       | bool                          | 1     |          | postgres              |                   | boolean, 'true'/'false'
 pg_catalog | box                           | box                           | 32    |          | postgres              |                   | geometric box '(lower left,upper right)'
 pg_catalog | bytea                         | bytea                         | var   |          | postgres              |                   | variable-length string, binary values escaped
 pg_catalog | "char"                        | char                          | 1     |          | postgres              |                   | single character
 pg_catalog | character                     | bpchar                        | var   |          | postgres              |                   | char(length), blank-padded string, fixed storage length
 pg_catalog | character varying             | varchar                       | var   |          | postgres              |                   | varchar(length), non-blank-padded string, variable storage length
 pg_catalog | cid                           | cid                           | 4     |          | postgres              |                   | command identifier type, sequence in transaction id
 pg_catalog | cidr                          | cidr                          | var   |          | postgres              |                   | network IP address/netmask, network address
 pg_catalog | circle                        | circle                        | 24    |          | postgres              |                   | geometric circle '(center,radius)'
 pg_catalog | cstring                       | cstring                       | var   |          | postgres              |                   | C-style string
 pg_catalog | date                          | date                          | 4     |          | postgres              |                   | date
 pg_catalog | daterange                     | daterange                     | var   |          | postgres              |                   | range of dates
 pg_catalog | double precision              | float8                        | 8     |          | postgres              |                   | double-precision floating point number, 8-byte storage
 pg_catalog | event_trigger                 | event_trigger                 | 4     |          | postgres              |                   | pseudo-type for the result of an event trigger function
 pg_catalog | fdw_handler                   | fdw_handler                   | 4     |          | postgres              |                   | pseudo-type for the result of an FDW handler function
 pg_catalog | gtsvector                     | gtsvector                     | var   |          | postgres              |                   | GiST index internal text representation for text search
 pg_catalog | index_am_handler              | index_am_handler              | 4     |          | postgres              |                   | pseudo-type for the result of an index AM handler function
 pg_catalog | inet                          | inet                          | var   |          | postgres              |                   | IP address/netmask, host address, netmask optional
 pg_catalog | int2vector                    | int2vector                    | var   |          | postgres              |                   | array of int2, used in system tables
 pg_catalog | int4range                     | int4range                     | var   |          | postgres              |                   | range of integers
 pg_catalog | int8range                     | int8range                     | var   |          | postgres              |                   | range of bigints
 pg_catalog | integer                       | int4                          | 4     |          | postgres              |                   | -2 billion to 2 billion integer, 4-byte storage
 pg_catalog | internal                      | internal                      | 8     |          | postgres              |                   | pseudo-type representing an internal data structure
 pg_catalog | interval                      | interval                      | 16    |          | postgres              |                   | @ <number> <units>, time interval
 pg_catalog | json                          | json                          | var   |          | postgres              |                   | JSON stored as text
 pg_catalog | jsonb                         | jsonb                         | var   |          | postgres              |                   | Binary JSON
 pg_catalog | jsonpath                      | jsonpath                      | var   |          | postgres              |                   | JSON path
 pg_catalog | language_handler              | language_handler              | 4     |          | postgres              |                   | pseudo-type for the result of a language handler function
 pg_catalog | line                          | line                          | 24    |          | postgres              |                   | geometric line
 pg_catalog | lseg                          | lseg                          | 32    |          | postgres              |                   | geometric line segment '(pt1,pt2)'
 pg_catalog | macaddr                       | macaddr                       | 6     |          | postgres              |                   | XX:XX:XX:XX:XX:XX, MAC address
 pg_catalog | macaddr8                      | macaddr8                      | 8     |          | postgres              |                   | XX:XX:XX:XX:XX:XX:XX:XX, MAC address
 pg_catalog | money                         | money                         | 8     |          | postgres              |                   | monetary amounts, $d,ddd.cc
 pg_catalog | name                          | name                          | 64    |          | postgres              |                   | 63-byte type for storing system identifiers
 pg_catalog | numeric                       | numeric                       | var   |          | postgres              |                   | numeric(precision, decimal), arbitrary precision number
 pg_catalog | numrange                      | numrange                      | var   |          | postgres              |                   | range of numerics
 pg_catalog | oid                           | oid                           | 4     |          | postgres              |                   | object identifier(oid), maximum 4 billion
 pg_catalog | oidvector                     | oidvector                     | var   |          | postgres              |                   | array of oids, used in system tables
 pg_catalog | opaque                        | opaque                        | 4     |          | postgres              |                   | obsolete, deprecated pseudo-type
 pg_catalog | path                          | path                          | var   |          | postgres              |                   | geometric path '(pt1,...)'
 pg_catalog | pg_ddl_command                | pg_ddl_command                | 8     |          | postgres              |                   | internal type for passing CollectedCommand
 pg_catalog | pg_dependencies               | pg_dependencies               | var   |          | postgres              |                   | multivariate dependencies
 pg_catalog | pg_lsn                        | pg_lsn                        | 8     |          | postgres              |                   | PostgreSQL LSN datatype
 pg_catalog | pg_mcv_list                   | pg_mcv_list                   | var   |          | postgres              |                   | multivariate MCV list
 pg_catalog | pg_ndistinct                  | pg_ndistinct                  | var   |          | postgres              |                   | multivariate ndistinct coefficients
 pg_catalog | pg_node_tree                  | pg_node_tree                  | var   |          | postgres              |                   | string representing an internal node tree
 pg_catalog | point                         | point                         | 16    |          | postgres              |                   | geometric point '(x, y)'
 pg_catalog | polygon                       | polygon                       | var   |          | postgres              |                   | geometric polygon '(pt1,...)'
 pg_catalog | real                          | float4                        | 4     |          | postgres              |                   | single-precision floating point number, 4-byte storage
 pg_catalog | record                        | record                        | var   |          | postgres              |                   | pseudo-type representing any composite type
 pg_catalog | refcursor                     | refcursor                     | var   |          | postgres              |                   | reference to cursor (portal name)
 pg_catalog | regclass                      | regclass                      | 4     |          | postgres              |                   | registered class
 pg_catalog | regconfig                     | regconfig                     | 4     |          | postgres              |                   | registered text search configuration
 pg_catalog | regdictionary                 | regdictionary                 | 4     |          | postgres              |                   | registered text search dictionary
 pg_catalog | regnamespace                  | regnamespace                  | 4     |          | postgres              |                   | registered namespace
 pg_catalog | regoper                       | regoper                       | 4     |          | postgres              |                   | registered operator
 pg_catalog | regoperator                   | regoperator                   | 4     |          | postgres              |                   | registered operator (with args)
 pg_catalog | regproc                       | regproc                       | 4     |          | postgres              |                   | registered procedure
 pg_catalog | regprocedure                  | regprocedure                  | 4     |          | postgres              |                   | registered procedure (with args)
 pg_catalog | regrole                       | regrole                       | 4     |          | postgres              |                   | registered role
 pg_catalog | regtype                       | regtype                       | 4     |          | postgres              |                   | registered type
 pg_catalog | smallint                      | int2                          | 2     |          | postgres              |                   | -32 thousand to 32 thousand, 2-byte storage
 pg_catalog | table_am_handler              | table_am_handler              | 4     |          | postgres              |                   | 
 pg_catalog | text                          | text                          | var   |          | postgres              |                   | variable-length string, no limit specified
 pg_catalog | tid                           | tid                           | 6     |          | postgres              |                   | (block, offset), physical location of tuple
 pg_catalog | timestamp without time zone   | timestamp                     | 8     |          | postgres              |                   | date and time
 pg_catalog | timestamp with time zone      | timestamptz                   | 8     |          | postgres              |                   | date and time with time zone
 pg_catalog | time without time zone        | time                          | 8     |          | postgres              |                   | time of day
 pg_catalog | time with time zone           | timetz                        | 12    |          | postgres              |                   | time of day with time zone
 pg_catalog | trigger                       | trigger                       | 4     |          | postgres              |                   | pseudo-type for the result of a trigger function
 pg_catalog | tsm_handler                   | tsm_handler                   | 4     |          | postgres              |                   | pseudo-type for the result of a tablesample method function
 pg_catalog | tsquery                       | tsquery                       | var   |          | postgres              |                   | query representation for text search
 pg_catalog | tsrange                       | tsrange                       | var   |          | postgres              |                   | range of timestamps without time zone
 pg_catalog | tstzrange                     | tstzrange                     | var   |          | postgres              |                   | range of timestamps with time zone
 pg_catalog | tsvector                      | tsvector                      | var   |          | postgres              |                   | text representation for text search
 pg_catalog | txid_snapshot                 | txid_snapshot                 | var   |          | postgres              |                   | txid snapshot
 pg_catalog | unknown                       | unknown                       | var   |          | postgres              |                   | pseudo-type representing an undetermined type
 pg_catalog | uuid                          | uuid                          | 16    |          | postgres              |                   | UUID datatype
 pg_catalog | void                          | void                          | 4     |          | postgres              |                   | pseudo-type for the result of a function with no real result
 pg_catalog | xid                           | xid                           | 4     |          | postgres              |                   | transaction id
 pg_catalog | xml                           | xml                           | var   |          | postgres              |                   | XML content

this is from PG12. What's interesting is the quotes around char in the table above. Notice the following behavior:

localhost user@db=# create temp table foo(a "char");
CREATE TABLE
localhost user@db=# drop table foo;
DROP TABLE
localhost user@db=# create temp table foo(a "char(20)");
ERROR:  42704: type "char(20)" does not exist
LINE 1: create temp table foo(a "char(20)");
                                ^
LOCATION:  typenameType, parse_type.c:275
localhost user@db=# create temp table foo(a "char" (20));
ERROR:  42601: type modifier is not allowed for type "char"
LINE 1: create temp table foo(a "char" (20));
                                ^
localhost user@db=# create temp table foo(a char(20));
CREATE TABLE

so it seems quoted char is just for 1 char, but if you specify a length then it needs to be unquoted. So somewhat of a special case.

@lelit
Copy link
Owner

lelit commented Aug 3, 2021

I see, and some heuristic is already in place:

$ pgpp -S "create table foo (bar char(20))"
CREATE TABLE foo (
  bar char(20)
)
$ pgpp -S "create table foo (bar "char"(20))"
CREATE TABLE foo (
  bar char(20)
)

Going back to the original case however, the first problem is that you didn't specify the fully qualified name of the type, so in any case the machinery won't work.

@thehesiod
Copy link
Author

my point is IMO one should be able to construct an AST from aliases since you may not have a way to do that a priori. In my case I'm translating TSQL types to PSQL programmatically via AST. Otherwise I'd have to hard-code these mappings (which I have to for now), much like this library is already doing.

I'm not quite sure I understand what you're suggesting though, are you suggesting using pgcatalog.character? That won't work either because that outputs: CREATE TEMPORARY TABLE IF NOT EXISTS foo (column_name pg_catalog.character(10)) ON COMMIT DROP which is also invalid

@lelit
Copy link
Owner

lelit commented Aug 3, 2021

I meant that your

typeName=TypeName(
        names=(String('char'),), 
        pct_type=False, 
        setof=False, 
        typemod=-1, 
        typmods=(A_Const(10),)
    )

should actually be

typeName=TypeName(
        names=(String('pg_catalog'), String('char'),), 
        pct_type=False, 
        setof=False, 
        typemod=-1, 
        typmods=(A_Const(10),)
    )

Does that work?

@thehesiod
Copy link
Author

nope

>>> import pglast
>>> from pglast.stream import RawStream
>>> from pglast import ast as pg_ast
>>> from pglast import enums as pg_enums
>>> 
>>> column_defs=(pg_ast.ColumnDef(
...     'column_name', 
...     inhcount=0,
...     is_from_type=False,
...     is_local=True,
...     is_not_null=False, 
...     typeName=pg_ast.TypeName(
...         names=(pg_ast.String('pg_catalog'), pg_ast.String('char'),), 
...         pct_type=False, 
...         setof=False, 
...         typemod=-1, 
...         typmods=(pg_ast.A_Const(10),)
...     )
... ),)
>>> 
>>> tbl_stmt = pg_ast.CreateStmt(
...     if_not_exists=True, oncommit=pg_enums.OnCommitAction.ONCOMMIT_DROP,
...     relation=pg_ast.RangeVar(inh=True, relname='foo', relpersistence='t'), tableElts=column_defs
... )
>>> RawStream()(tbl_stmt)
'CREATE TEMPORARY TABLE IF NOT EXISTS foo (column_name pg_catalog.char(10)) ON COMMIT DROP'
>>> 
$ psql -h localhost -U user -p port db
psql (13.3, server 12.7 (Debian 12.7-1.pgdg100+1))
localhost iam_commodities_admin@commodities=# CREATE TEMPORARY TABLE IF NOT EXISTS foo (column_name pg_catalog.char(10)) ON COMMIT DROP;
ERROR:  42601: type modifier is not allowed for type "pg_catalog.char"
LINE 1: ...TE TEMPORARY TABLE IF NOT EXISTS foo (column_name pg_catalog...
                                                             ^
LOCATION:  typenameTypeMod, parse_type.c:366

@lelit
Copy link
Owner

lelit commented Aug 3, 2021

I see. So, pglast could have a better knowledge about aliases, as you suggested. OTOH, names=(pg_ast.String('pg_catalog'), pg_ast.String('char'),) works now.

@thehesiod
Copy link
Author

sorry not following, per above it doesn't work now

@lelit
Copy link
Owner

lelit commented Aug 4, 2021

Strange, this works for me:

import subprocess

import pglast
from pglast.stream import RawStream
from pglast import ast as pg_ast
from pglast import enums as pg_enums

column_defs=(pg_ast.ColumnDef(
    'column_name',
    inhcount=0,
    is_from_type=False,
    is_local=True,
    is_not_null=False,
    typeName=pg_ast.TypeName(
        names=(pg_ast.String('pg_catalog'), pg_ast.String('bpchar'),),
        pct_type=False,
        setof=False,
        typemod=-1,
        typmods=(pg_ast.A_Const(10),)
    )
),)

tbl_stmt = pg_ast.CreateStmt(
    if_not_exists=True, oncommit=pg_enums.OnCommitAction.ONCOMMIT_DROP,
    relation=pg_ast.RangeVar(inh=True, relname='foo', relpersistence='t'), tableElts=column_defs
)

sql = RawStream()(tbl_stmt)

subprocess.run(['psql', '-e', '-c', sql, 'foo'])

and emits

CREATE TEMPORARY TABLE IF NOT EXISTS foo (column_name char(10)) ON COMMIT DROP
CREATE TABLE

@thehesiod
Copy link
Author

remember, I'm trying to do via alias, names=(pg_ast.String('pg_catalog'), pg_ast.String('char'),),

@lelit
Copy link
Owner

lelit commented Aug 4, 2021

Yes, I remember :-)
I already said above that pglast could have a better knowledge about aliases. But there's little difference for you, right now, to manage such aliases before creating the pglast.ast tree (ie, as you go along migrating the TSQL structure), or having such aliases taught in some way to pglast TypeName.
As time permits, I will try to figure out if there is an easy way to handle aliases, but as of now I cannot see an easy path to do exactly what you are seeking in a generic way. Of course, ideas are welcome!

@lelit
Copy link
Owner

lelit commented Aug 4, 2021

If I were you, I'd probably build a mapping between TSQL types and pglast.ast.TypeName.

@thehesiod
Copy link
Author

ya I have one :) let me know if you want to keep this issue open then or add to some kind of backlog of yours.

my idea to do this programmatically was to build a string with the type create table foo (column_name char(10)) and pass this to parse_sql, and see what pglast spits out for the type mapping. This obviously is bad as it allows for string escaping attacks and what not.

so it seems pglast does have this information, perhaps this could be solved by exposing a helper method to do this mapping (which seems like pglast already has).

@lelit
Copy link
Owner

lelit commented Aug 4, 2021

Can you clarify what would you do with the result of that parse? I do not understand how that can be useful for your cause...

@thehesiod
Copy link
Author

so in my case i'd get back that char maps to (pg_ast.String('pg_catalog'), pg_ast.String('bpchar'),), and use that as the mapping

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants