Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix; do not add double quotes to identifiers already double quoted fixes Issue #2223 #2224

Merged
merged 13 commits into from
Oct 10, 2021

Conversation

davecramer
Copy link
Member

No description provided.

@davecramer
Copy link
Member Author

needs tests

try {
sbuf.append('"');
if ( !alreadyQuoted ) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I thought it should be more like

if (alreadyQuoted) {
  return;
}

Otherwise you might quote already quoted names because the for loop is still executed. This would result in badly quoted values that will fail executing, e.g. "column" as input will become ""column"" because of the for-loop.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well we need deal with internally quoted names

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well ok, but now valid input will become broken escaped output, e.g. "column" will become ""column"" which always fails. Would make more sense if the for-loop would simply copy the first and last quote if alreadyQuoted is true without quoting it again. That way "column" stays "column" and "column"name" becomes "column""name"

@sehrope
Copy link
Member

sehrope commented Aug 10, 2021

This will break other oddball cases like identifiers that start and end with quotes. As dumb as it is to define a schema via something like CREATE TABLE t("""crazy""" int) (i.e. a column named "crazy"), it's still valid in Postgres because the quotes can be escaped. And PGConnection.escapeIdentifier(...) is part of the public signature so it'd be a public API break as well.

I don't think this method needs to change, it's only job should be to blindly escape the values provided. The fix for the problem referenced in #2223 would be to change any internal call sites for this method to check if the source values parsed from the user SQL are already escaped and if so skip calling this method entirely.

@jnehlmeier
Copy link

@sehrope There is also

/**
* Return the given string suitably quoted to be used as an identifier in an SQL statement string.
* Quotes are added only if necessary (i.e., if the string contains non-identifier characters or
* would be case-folded). Embedded quotes are properly doubled.
*
* @param identifier input identifier
* @return the escaped identifier
* @throws SQLException if something goes wrong
*/
String escapeIdentifier(String identifier) throws SQLException;

which calls that same Utils method, however JavaDoc states quotes are only added if needed. So either the javadoc is wrong or it can not use the Utils method.

I think at some point you need to assume a String is correctly quoted if it starts and ends with quotes and thus do nothing (thus my comment above). If it is not correctly quoted, e.g. "column"name", then some syntax error should be thrown.

@davecramer
Copy link
Member Author

This will break other oddball cases like identifiers that start and end with quotes. As dumb as it is to define a schema via something like CREATE TABLE t("""crazy""" int) (i.e. a column named "crazy"), it's still valid in Postgres because the quotes can be escaped. And PGConnection.escapeIdentifier(...) is part of the public signature so it'd be a public API break as well.

I don't think this method needs to change, it's only job should be to blindly escape the values provided. The fix for the problem referenced in #2223 would be to change any internal call sites for this method to check if the source values parsed from the user SQL are already escaped and if so skip calling this method entirely.

ugh !

@sehrope
Copy link
Member

sehrope commented Aug 10, 2021

which calls that same Utils method, however JavaDoc states quotes are only added if needed. So either the javadoc is wrong or it can not use the Utils method.

That comment refers to adding surrounding quotes if the parameter to be escaped requires any escaping. So foo would not be quoted because Postgres casefolds to lowercase and there's no special characters or white space. But parameters likeFoo orfoo bar would be wrapped in double quotes because the escaped representation mandates double quotes. In all three of those situations the returned value from that method could be appended to dynamically generated SQL as an identifier.

Well mostly ... there's a separate issue that depending on the context there are some reserved words that must always be quoted (https://www.postgresql.org/docs/current/sql-keywords-appendix.html) but the driver has never handled those correctly. For example try creating a table with a column named all:

postgres=# CREATE TABLE t(all int);
ERROR:  syntax error at or near "all"
LINE 1: CREATE TABLE t(all int);

... but that's a separate issue.

I think at some point you need to assume a String is correctly quoted if it starts and ends with quotes and thus do nothing (thus my comment above). If it is not correctly quoted, e.g. "column"name", then some syntax error should be thrown.

It's the caller's responsibility to know whether the value being handled is already escaped. Having the driver guess based on whether the value appears quoted unfortunately leads to broken edge cases because the server allows just about anything other than zero bytes.

@jnehlmeier
Copy link

which calls that same Utils method, however JavaDoc states quotes are only added if needed. So either the javadoc is wrong or it can not use the Utils method.

That comment refers to adding surrounding quotes if the parameter to be escaped requires any escaping. So foo would not be quoted because Postgres casefolds to lowercase and there's no special characters or white space.

Well that's what I meant. Since there are no checks even foo will be escaped to "foo" and already escaped inputs are escaped again. The method simply calls Utils.escapeIdentifier which does no checks at all. So JavaDoc doesn't really match behavior.

Well mostly ... there's a separate issue that depending on the context there are some reserved words that must always be quoted (https://www.postgresql.org/docs/current/sql-keywords-appendix.html) but the driver has never handled those correctly.

Yes that's the case we stumbled upon using a table having a column named user. Thus we activated quoting on ORM level which then caused issue #2223 because quoted inputs (provided through JDBC API) are quoted again.

I think at some point you need to assume a String is correctly quoted if it starts and ends with quotes and thus do nothing (thus my comment above). If it is not correctly quoted, e.g. "column"name", then some syntax error should be thrown.

It's the caller's responsibility to know whether the value being handled is already escaped. Having the driver guess based on whether the value appears quoted unfortunately leads to broken edge cases because the server allows just about anything other than zero bytes.

So pgjdbc needs a helper method that checks wether or not inputs are correctly quoted and then use that helper to fix various locations within pgjdbc.

So it would make sense to add Utils.checkAndEscapeIdentifierIfNeeded which

  • checks if input is correctly escaped.
  • If it is not correctly escaped it assumes the provided value is unescaped and calls Utils.escapeIdentifier which, just like today, unconditionally escapes.

Then this new method can be used in all places that currently use Utils.escapeIdentifier.

@sehrope
Copy link
Member

sehrope commented Aug 10, 2021

The method simply calls Utils.escapeIdentifier which does no checks at all. So JavaDoc doesn't really match behavior.

Okay I see what you're saying here. Yes that does not match the documented behavior. Given that not adding quotes would be a breaking change I think the fix for that would be to update the comment to reflect what it is currently doing.

So pgjdbc needs a helper method that checks wether or not inputs are correctly quoted and then use that helper to fix various locations within pgjdbc.

So it would make sense to add Utils.checkAndEscapeIdentifierIfNeeded which ...

Again, I don't think this is possible. Only the caller would know if the value is already escaped, it can't be inferred from the value because things like "foo" could be considered to be already escaped or could be the raw value expecting a """foo""" return value.

If the value is coming from something like parsing user SQL to infer a RETURNING ... clause, the token for the column should be used as-is. If it doesn't need escaping it wouldn't need escaping in the user's SQL. If it needs to be escaped / quoted, then it would have to already be escaped and quoted or the user's SQL would fail to parse.

Similarly, if the value is coming from a column name we queried from the data dictionary, we know it's the raw value and must be escaped prior to usage.

I don't see a situation where we'd be guessing based on the value. It's always something you have to do or you must not do.

... checks if input is correctly escaped.

That would be a good check to add to code that is parsing user SQL for column tokens (e.g. the RETURNING clause stuff). I'm not sure how that's implemented today but something of that form would be a sensible addition.

@davecramer
Copy link
Member Author

Added tests for MixedCase return. If anyone knows a better way to deal with Parameratized tests I want to ignore please let me know

@jnehlmeier
Copy link

If the value is coming from something like parsing user SQL to infer a RETURNING ... clause, the token for the column should be used as-is. If it doesn't need escaping it wouldn't need escaping in the user's SQL. If it needs to be escaped / quoted, then it would have to already be escaped and quoted or the user's SQL would fail to parse.

@sehrope Well but the RETURNING clause could contain user provided column names (through JDBC Connection API) that are not part of the user query. For an INSERT an ORM will typically ask returning the auto generated primary key column value. So SQL parsing can't really do anything about it. If that column has a crazy name or a reserved keyword as name and the code should not use a rule to decide wether or not it should quote, then the only solution is to trust the input provided through JDBC API. Then it is up to the JDBC user to provide correct column names. In case of ORMs everything would be quoted and PG JDBC would not touch that input at all.

@dfex55
Copy link

dfex55 commented Aug 11, 2021

Again, I don't think this is possible. Only the caller would know if the value is already escaped, it can't be inferred from the value because things like "foo" could be considered to be already escaped or could be the raw value expecting a """foo""" return value.

I think that's the only solution and also agree to jnehlmeier to trust userinput from the connection API.

A specific case in the API for example is:

PgConnection.prepareStatement(String sql, String[] columnNames)

Currently every value in columnNames is escaped.
For an ORM (or any other layer) that can switch quoted identifiers on and off, it is inconsistent to have quoted identifiers in the sql string, but not for the returning columnNames.
And as mentioned in #2223, that other drivers seem to behave as proposed, it might be a reasonable change. Even as it could be called a breaking change for some existing (grown) applications.

@davecramer
Copy link
Member Author

At this point, I think leaving the code the way it is and having users fix their code is where I'm headed. Personally, I would have avoided the use of the column user. Most ORM's allow you to specify a different name if there is a conflict. As for the argument that other drivers seem to do the right thing. Do other databases allow for quoted identifiers ?

@jnehlmeier
Copy link

@davecramer So you are saying every JDBC API user that uses PostgreSQL JDBC implementation must know that Connection.prepareStatement(String sql, String[] columnNames) requires the sql parameter to be properly escaped (if required by using mixed case / reserved words / other weird column names) but the columnNames array must always contain unescaped column names in order to produce a valid query?
That's crazy because both parameters provide schema information. A developer should either care about both parameters or none of them, so basically PG JDBC either escapes nothing or everything internally. Personally I tend to escape everything approach so developers can simply use the column names they see in the database.

IMHO this current inconsistency needs to be fixed and a new major version of pgjdbc released indicating that breaking change.

@davecramer
Copy link
Member Author

@jnehlmeier Well I'm really trying to get a clear understanding of the issue. Initially it was proposed that if an identifier had double quotes, then do nothing. Apparently that doesn't work for all cases. So if we can clarify this problem then we can find a solution.

@jnehlmeier
Copy link

@davecramer In the issue report @amirnajmi discovered a quoting issue while using Datanucleus JDO. We discovered the same issue with Ebean. Both are ORMs. Both use JDBC API Connection.prepareStatement(String sql, String[] columnNames) to execute INSERT queries and both have been configured to quote all column identifiers when calling into the JDBC API.

Because PG JDBC unconditionally quotes columnNames the resulting query is incorrect because both ORMs do not expect that both parameters of prepareStatement(String sql, String[] columnNames) need to be treated differently in order to produce the correct query in postgres.

So ORMs do for example:

prepareStatement("INSERT INTO \"table\" (\"column\") VALUES ('...')", new String[]{ "\"id\"" })

However when doing so Postgres looks for column named "id" instead of id because internally RETURNING """id""" is generated.

So the ultimate root cause is that Connection.prepareStatement(String sql, String[] columnNames) has two parameters which are treated differently with regard to quoting. Thats something ORMs do not expect and personally as a developer I would not expect either. I am not even sure if that is meant to be allowed in JDBC API spec.

@davecramer
Copy link
Member Author

OK, thanks you, the issue is now clear.

So the challenge is that we don't actually look at the sql query or do anything with it. We surround the returning in the event that the user has put mixed case columns in their sql query.

As @sehrope has pointed out postgres allows very unusual names if the user really wants them.

I'm not sure how we please everyone here?

@jnehlmeier
Copy link

We surround the returning in the event that the user has put mixed case columns in their sql query.

Yes but you do that always, regardless of the actual column name. There is no check in code wether or not quoting is required. Thats why the first suggestion to fix the issue was checking the first and last char of the column name to see if it is already quoted and if it is then trust it.

As @sehrope has pointed out postgres allows very unusual names if the user really wants them.

And that's totally fine. PG JDBC now has to decide wether it asks the developer to provide correctly quoted column names when calling into the JDBC API or if it is fine with the raw column name and quotes it internally automatically.

Currently the preparedStatement method asks the developer to provide a correctly quoted SQL string but at the same time asks for the raw column names for building the RETURNING clause.

I'm not sure how we please everyone here?

Make both parameters of the preparedStatement method behave consistently. That is, either quote both internally or quote none of them and let the caller do it. However both solutions will be a breaking change for applications that do use JDBC manually and provide a correctly quoted sql and unquoted column names. To avoid the breaking change a system property could be used to switch to the new, more expected behavior.

Given that the parser likely relies on the fact that the provided SQL is correctly quoted in order to simplify parsing a bit, it is probably easier to require the columnNames parameter to contain correctly quoted names as well. Basically let the JDBC API caller do the quoting. In that case the fix might be as easy as deleting (or surrounding with a system property check)

Utils.escapeIdentifier(nativeSql, columnName);

@davecramer
Copy link
Member Author

We surround the returning in the event that the user has put mixed case columns in their sql query.

Yes but you do that always, regardless of the actual column name. There is no check in code wether or not quoting is required. Thats why the first suggestion to fix the issue was checking the first and last char of the column name to see if it is already quoted and if it is then trust it.

As @sehrope has pointed out postgres allows very unusual names if the user really wants them.

And that's totally fine. PG JDBC now has to decide wether it asks the developer to provide correctly quoted column names when calling into the JDBC API or if it is fine with the raw column name and quotes it internally automatically.

Currently the preparedStatement method asks the developer to provide a correctly quoted SQL string but at the same time asks for the raw column names for building the RETURNING clause.

I'm not sure how we please everyone here?

Make both parameters of the preparedStatement method behave consistently. That is, either quote both internally or quote none of them and let the caller do it. However both solutions will be a breaking change for applications that do use JDBC manually and provide a correctly quoted sql and unquoted column names. To avoid the breaking change a system property could be used to switch to the new, more expected behavior.

Given that the parser likely relies on the fact that the provided SQL is correctly quoted in order to simplify parsing a bit, it is probably easier to require the columnNames parameter to contain correctly quoted names as well. Basically let the JDBC API caller do the quoting. In that case the fix might be as easy as deleting (or surrounding with a system property check)

This would be about the only solution possible since we don't parse the query.

I'm going to have a look to see what prompted us to quote the returning columns.

Dave

@davecramer
Copy link
Member Author

So looking at git blame. This code has been here for around 5 years... That's not an argument, but an observation.

At this point. I think the best thing to do is add a connection parameter which will not quote the returning columns. The default will remain the same.

@jnehlmeier
Copy link

Should be fine to make the behavior configurable via connection parameter / datasource property. That way ORMs can configure the datasource to their needs internally or ask the developer to do so.

… double quotes

around identifiers that are provided in the returning array.

There are some ORMs that now quote all identifiers and if we in turn
quote them this will cause an error.
@sehrope
Copy link
Member

sehrope commented Aug 12, 2021

I took a look at how other drivers handle this. Here's a summary:

  • Oracle - Only accepts unquoted identifiers. Quoted identifiers throw an exception (not found in result). I don't see an option to enable handling quoted identifiers.
  • MariaDB Connector-J (for MySQL) - Completely ignores the parameter and you can put any value / values you want. Blindly returns the first value.
  • Microsoft SQL Server - Completely ignores the parameter and you can put any value / values you want. Blindly returns the first value.
  • DB2 - Accepts both quoted or unquoted identifiers. If quoted then it must exactly match the upper case folded column name. I don't see an option to control the handling as it seems to guess based on seeing a quote in the parameter value.

So it's a toss up for support for quoted identifiers on other platforms.

If we do end up adding a flag to handle this, does it also handle Statement.execute(String sql, String[] columnNames) as well? https://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#execute(java.lang.String,%20java.lang.String[])

I think that's the only other place where a user supplied list of columns has the driver generating custom SQL from column names.

@jnehlmeier
Copy link

If we do end up adding a flag to handle this, does it also handle Statement.execute(String sql, String[] columnNames) as well? https://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#execute(java.lang.String,%20java.lang.String[])

I think that's the only other place where a user supplied list of columns has the driver generating custom SQL from column names.

Looks like it should always end up in CachedQueryCreateAction.create() if the statement isn't cached, which then calls into the modified parser. So Statement.execute / executeUpdate should both be covered.

@davecramer
Copy link
Member Author

I took a look at how other drivers handle this. Here's a summary:

  • Oracle - Only accepts unquoted identifiers. Quoted identifiers throw an exception (not found in result). I don't see an option to enable handling quoted identifiers.
  • MariaDB Connector-J (for MySQL) - Completely ignores the parameter and you can put any value / values you want. Blindly returns the first value.
  • Microsoft SQL Server - Completely ignores the parameter and you can put any value / values you want. Blindly returns the first value.
  • DB2 - Accepts both quoted or unquoted identifiers. If quoted then it must exactly match the upper case folded column name. I don't see an option to control the handling as it seems to guess based on seeing a quote in the parameter value.

So it's a toss up for support for quoted identifiers on other platforms.

If we do end up adding a flag to handle this, does it also handle Statement.execute(String sql, String[] columnNames) as well? https://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#execute(java.lang.String,%20java.lang.String[])

I think that's the only other place where a user supplied list of columns has the driver generating custom SQL from column names.

So if nobody else does this why are we ?

@amirnajmi
Copy link

amirnajmi commented Aug 12, 2021

I took a look at how other drivers handle this. Here's a summary:

  • Oracle - Only accepts unquoted identifiers. Quoted identifiers throw an exception (not found in result). I don't see an option to enable handling quoted identifiers.

@sehrope Are you sure about Oracle?

Because the main reason that I opened issue #2223 was that we did not have any problem on Oracle and when we changed the driver to Postgersql, this problem appeared.

@sehrope
Copy link
Member

sehrope commented Aug 12, 2021

@amirnajmi Yes. Here's my basic test case for Oracle:

  private static void testGeneratedKeys(Connection conn, String sql, String[] columnNames) throws SQLException {
    try (PreparedStatement stmt = conn.prepareStatement(sql, columnNames)) {
      stmt.execute();
      ResultSet rs = stmt.getGeneratedKeys();
      rs.next();
      System.out.println("Success for columnNames: " + StringUtils.join(columnNames, ','));
      for(int i =1;i<=columnNames.length;i++) {
        System.out.println(" KEY[" + i + "] = " + rs.getInt(i));  
      }
    } catch (SQLException e) {
      System.out.println("Error for columnNames: " + StringUtils.join(columnNames, ','));
      e.printStackTrace();
    }
  }

  @Test
  public void testOracle() throws Exception {
    String url = "jdbc:oracle:thin:@localhost:32770:xe";
    String user = "sys as sysdba";
    String password = "dbpass";

    try (Connection conn = DriverManager.getConnection(url, user, password)) {
      executeSqlIgnoreErrors(conn, "DROP TABLE foo");
      executeSql(conn, "CREATE TABLE foo (id NUMBER GENERATED BY DEFAULT ON NULL AS IDENTITY, x NUMBER)");

      String insertSql = "INSERT INTO foo (x) VALUES (1)";
      testGeneratedKeys(conn, insertSql, new String[] {"ID"});
      testGeneratedKeys(conn, insertSql, new String[] {"ID", "ID"});
      testGeneratedKeys(conn, insertSql, new String[] {"id"});
      testGeneratedKeys(conn, insertSql, new String[] {"id", "ID"});
      testGeneratedKeys(conn, insertSql, new String[] {"id", "ID", "iD"});
      testGeneratedKeys(conn, insertSql, new String[] {"\"ID\""});
      testGeneratedKeys(conn, insertSql, new String[] {"bad"});
    }
  }

And here's the output:

Success for columnNames: ID
 KEY[1] = 1
Success for columnNames: ID,ID
 KEY[1] = 2
 KEY[2] = 2
Success for columnNames: id
 KEY[1] = 3
Success for columnNames: id,ID
 KEY[1] = 4
 KEY[2] = 4
Success for columnNames: id,ID,iD
 KEY[1] = 5
 KEY[2] = 5
 KEY[3] = 5
Error for columnNames: "ID"
java.sql.SQLException: Invalid argument(s) in call
	at oracle.jdbc.driver.AutoKeyInfo.getNewSql(AutoKeyInfo.java:187)
	at oracle.jdbc.driver.PhysicalConnection.prepareStatement(PhysicalConnection.java:4342)
	at scratch.JdbcStuff.testGeneratedKeys(JdbcStuff.java:30)
	at scratch.JdbcStuff.testOracle(JdbcStuff.java:60)
Error for columnNames: bad
java.sql.SQLException: Invalid argument(s) in call
	at oracle.jdbc.driver.AutoKeyInfo.getNewSql(AutoKeyInfo.java:187)
	at oracle.jdbc.driver.PhysicalConnection.prepareStatement(PhysicalConnection.java:4342)
	at scratch.JdbcStuff.testGeneratedKeys(JdbcStuff.java:30)
	at scratch.JdbcStuff.testOracle(JdbcStuff.java:61)

And mixed case combination that gets case folded is fine. Invalid or already quoted column names are rejected.

@amirnajmi
Copy link

amirnajmi commented Aug 12, 2021

@sehrope
I checked Oracle driver, exactly with the same sample as you posted, the only difference is that I used parameterized query, and quoted all table and columns name:
"INSERT INTO \"foo\" (\"x\") VALUES (?)"

This is part of my code that I called conn.preparedStatement() (I used number instead of x for identity field):
ArrayList<String> pkColumnNames = new ArrayList<>();
pkColumnNames.add("\"number\"");
PreparedStatement ps = con.prepareStatement(QUERY, pkColumnNames.toArray(new String[0]));

And the part for getting result:
rs = ps.getGeneratedKeys();
if (rs != null && rs.next()) {
datastoreId = rs.getObject(1);
System.out.println(pkColumnNames + ": " + datastoreId);
}

Output:
["number"]: 44

@amirnajmi
Copy link

So if nobody else does this why are we?

@davecramer As I mentioned in my last message, Oracle is working with double-quotes and other drivers are ignoring or handling that either.
So I think we should do something, handle or ignore it.
would you please share your idea?

@amirnajmi
Copy link

@davecramer, When this PR getting merge? is there anything else to check?

@davecramer
Copy link
Member Author

Hmmm ya, I need to get back to this. There are changes in this PR that don't belong here.

@TakahikoKawasaki
Copy link

@davecramer It would be greatly appreciated if you kindly put a bit higher priority on this PR. This is becoming a blocking issue for some of our customers and they may start considering switching their database from PostgreSQL to another.

@davecramer davecramer merged commit 50d6aba into pgjdbc:master Oct 10, 2021
@davecramer davecramer deleted the fixdoublequote branch October 10, 2021 13:45
@TakahikoKawasaki
Copy link

@davecramer Really appreciated! Thank you very much!

@davecramer
Copy link
Member Author

@TakahikoKawasaki
Copy link

@davecramer Sure. We'll test it.

@davecramer
Copy link
Member Author

@TakahikoKawasaki have you tested it ?

@amirnajmi
Copy link

@davecramer We have tested the jar file(snapshot version) you sent, and the problem that we had with Datanucleus JDO was resolved; by adding the new flag(quoteReturningIdentifiers) to avoid adding additional escaped identifiers in returning columns.
So the flag is working fine, at least in our scenarios.

@davecramer
Copy link
Member Author

Thank you. I want to release as soon as possible

@amirnajmi
Copy link

Sounds great.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants