Group startup statements #1977

Merged: 1 commit merged into pgjdbc:master on Mar 16, 2021
Conversation

jesperpedersen
Contributor

This pull request wraps the statements executed during startup inside a transaction.

This is important in pool-by-transaction scenarios in order to make sure that all the statements reach the same connection that is being initialized. Since these are SET statements, their changes persist outside the transaction scope, but we do receive the Z????T (ReadyForQuery, in-transaction) messages from the server.

This should only be considered for 42.3.
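
For illustration only, here is a rough user-level approximation of grouping the startup statements in a transaction (the specific SET statements, connection URL, and credentials are placeholders, not the driver's actual internals):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class GroupedStartupSketch {
  public static void main(String[] args) throws Exception {
    // Placeholder connection details.
    try (Connection conn = DriverManager.getConnection(
            "jdbc:postgresql://localhost:5432/postgres", "postgres", "secret");
         Statement st = conn.createStatement()) {
      // Sending the whole setup as one unit keeps the statements together, so a
      // pool-by-transaction proxy routes them to a single physical connection.
      st.execute("BEGIN; "
          + "SET extra_float_digits = 3; "
          + "SET application_name = 'myapp'; "
          + "COMMIT");
      // The SET changes persist for the rest of the session even though the
      // surrounding transaction has ended.
    }
  }
}
```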

@sehrope
Member

sehrope commented Dec 10, 2020

Rather than wrap it in a transaction, how about combining all the statements into a single SQL command separated by semicolons? That'd be one less round trip too.

Though I wonder if any of this is necessary. If you set the ASSUME_MIN_SERVER_VERSION property to anything that's not ancient, then this code won't execute anyway. For servers known to be greater than 9.0 it will send the properties in the startup packet.
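
For reference, a minimal sketch of setting that from JDBC (connection details are placeholders; assumeMinServerVersion is the connection-property spelling of ASSUME_MIN_SERVER_VERSION):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

public class AssumeMinServerVersionSketch {
  public static void main(String[] args) throws Exception {
    Properties props = new Properties();
    props.setProperty("user", "postgres");    // placeholder
    props.setProperty("password", "secret");  // placeholder
    // Telling the driver the server is at least 9.0 lets it send the session
    // parameters in the startup packet instead of as SET statements afterwards.
    props.setProperty("assumeMinServerVersion", "9.0");
    try (Connection conn = DriverManager.getConnection(
            "jdbc:postgresql://localhost:5432/postgres", props)) {
      System.out.println("connected: " + !conn.isClosed());
    }
  }
}
```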

@jesperpedersen
Contributor Author

Multiple statements with semicolons are multiple transactions.

You can't assume that people will deploy with ASSUME_MIN_SERVER_VERSION, and there are two statements in that >= 9.0 block.

@sehrope
Member

sehrope commented Dec 10, 2020

Does any pooling software do its own SQL parsing and splitting of user commands? I figured anything that gets sent with the simple protocol would go as a single simple query message and end up at the actual server in one piece as well.

I've opened a separate issue to discuss bumping the default for ASSUME_MIN_SERVER_VERSION: #1978

@jesperpedersen
Contributor Author

Yes, pgpool does SQL parsing, but doesn't support transaction pooling -- pgagroal only looks at the message header when running in transaction mode.

A simple query using semicolons is still multiple transactions.

By bumping ASSUME_MIN_SERVER_VERSION to 9.0 you remove support for older servers, and for only a limited amount of code benefit.

@sehrope
Member

sehrope commented Dec 10, 2020

I think there's still a race condition with transactional pooling. Say you have:

  1. Client connects JDBC-001 to the pool, which creates PHYSICAL-001
  2. JDBC-001 sends BEGIN, gets assigned to PHYSICAL-001, then does SET / SET / ... / COMMIT, which all gets executed on PHYSICAL-001
  3. In parallel, client connects JDBC-002 to the pool, which creates PHYSICAL-002
  4. While PHYSICAL-002 is being connected, JDBC-001 finishes and ends its transaction (so PHYSICAL-001 is available for assignment).
  5. JDBC-002 sends BEGIN, gets assigned to PHYSICAL-001 (not 002!), and then does SET / SET / ... / COMMIT on PHYSICAL-001

At this point PHYSICAL-002 could be assigned by the pooler, but it won't have had any of the initialization code executed. Short of sending it in the startup packet, I don't think this completely fixes things. It would just ensure that the initialization is executed all-or-nothing on the same physical connection.

Also, looks like there's at least one other non-transactional SQL command optionally sent (if READ_ONLY is enabled) in connection init: https://github.com/pgjdbc/pgjdbc/blob/master/pgjdbc/src/main/java/org/postgresql/jdbc/PgConnection.java#L236-L238

@jesperpedersen
Contributor Author

There are a lot of caveats when using transaction pooling. The above is solved by prefilling the connections - like you normally would - as they don't reuse an existing connection.

@bokken
Member

bokken commented Dec 13, 2020

PostgreSQL 9.0 was released in 2010. It seems reasonable to drop support for 8.x after 10 years.

@jesperpedersen
Contributor Author

@davecramer @vlsi Any thoughts on whether the >= 9.0 change should be part of this pull request or a separate one?

My feeling is that it should be separate, but I can include it if needed.

@davecramer
Member

Sorry for the late response. I've been relocating for the winter.

The >= 9.0 change should be a separate PR.

@bokken
Member

bokken commented Dec 30, 2020

@jesperpedersen, #2006 removes support for server versions prior to 9.0. I think we are going to try and get it into the 42.3 release.

@davecramer
Member

@jesperpedersen where are we with this?

@jesperpedersen
Contributor Author

@davecramer

davecramer merged commit 00774d1 into pgjdbc:master on Mar 16, 2021
@davecramer
Member

> A simple query using semicolons is still multiple transactions.

Actually, a simple query using semicolons is implicitly wrapped with a begin/end:

"insert into foo (column1) values (1); rollback; select * from foo;"

will roll back the insert.

@hannibal218bc

An issue with this feature in connection with pgbouncer appeared and is being discussed in #2423.

What I'm wondering about: how does this "wrapping" of startup parameters help with transaction pooling? As the transaction completes after startup, it is still quite possible that a subsequent transaction will hit a different backend, which might have been "initialized" with different parameters or not at all.

How is this PR helping with that? I understand that it at least keeps the parameters together, but as far as I understand the case, there are no guarantees that the work will be executed with the desired parameters.

Furthermore, wouldn't it be far more efficient to have the connection pooler initialize each backend (once, at the initial connect/startup) with the desired parameters, instead of having the clients "randomly" fire init commands at some backend?

@jesperpedersen
Contributor Author

@hannibal218bc These statements are executed as part of the connection startup - i.e. for each JDBC connection.

Each driver (JDBC, C, C#, Rust, ...) can choose to initialize its startup parameters differently - so each driver needs to be vetted for proper transaction semantics under each pooling mode.

Connection pools won't execute startup parameters - unless explicitly told to do so - so you can't assume certain properties are present.

@davecramer
Member

@jesperpedersen so does pgbouncer do anything special with the startup parameters in statement mode to ensure that they are all executed on the same connection?
Seems that is where this should be fixed.

@jesperpedersen
Contributor Author

@davecramer Well, then "statement pooling" would need to look at the transactional state of the connection, thereby "upgrading" the pooling to a "transaction pool" instead. So, I would say no to that - having a false in JDBC is better.

There is a small "window" where "statement pooling" creates a benefit for the overall architecture, but it is limited to auto-commit and SELECT-only use-cases.

@hannibal218bc

@jesperpedersen:

> Each driver (JDBC, C, C#, Rust, ...) can choose to initialize its startup parameters differently - so each driver needs to be vetted for proper transaction semantics under each pooling mode.

> Connection pools won't execute startup parameters - unless explicitly told to do so - so you can't assume certain properties are present.

Exactly, but isn't that another point in favor of @davecramer's comment that the pool itself would be a better place to set startup parameters, rather than the JDBC driver?
Especially considering the fact(?) that in transaction mode, there is no guarantee that the work transaction will hit a properly initialized backend, as those are two distinct transactions and may be scheduled independently?

@hannibal218bc

Or to put it differently: isn't the use of startup parameters only feasible in session-level pooling, and not at the transaction level?

@jesperpedersen
Contributor Author

@hannibal218bc The pool can choose to set startup parameters for all prefilled connections. However, a driver can choose to assume certain settings are active.

Then the pool would need to look at the startup packet to check which prefilled connection "fits" which driver, and then you run into the trouble of saying "I want 'A' JDBC connections for USER 'B' in DATABASE 'C', and 'D' C# connections for USER 'E' in DATABASE 'F'", and so on, in the pool configuration.

Ideally, all drivers should not assume anything - and work with the settings returned from the database cluster.

@jesperpedersen
Contributor Author

And, transaction pooling will hit the same connection -- the connection is locked until the Z (ReadyForQuery) message indicates the transaction has completed.

@vlsi
Member

vlsi commented Jan 28, 2022

> Ideally, all drivers should not assume anything - and work with the settings returned from the database cluster.

@jesperpedersen, the driver would need to set the session timezone and client_encoding anyway. You can't expect the server to magically know the client's timezone, which is important for queries that convert timestamps to strings.
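
A small sketch of why the session TimeZone matters (connection details are placeholders): the same instant is rendered differently as text depending on the session setting.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class TimeZoneRenderingSketch {
  public static void main(String[] args) throws Exception {
    try (Connection conn = DriverManager.getConnection(
            "jdbc:postgresql://localhost:5432/postgres", "postgres", "secret");
         Statement st = conn.createStatement()) {
      for (String tz : new String[] {"UTC", "America/New_York"}) {
        st.execute("SET TimeZone = '" + tz + "'");
        try (ResultSet rs = st.executeQuery("select now()::text")) {
          rs.next();
          // Same instant, different textual rendering per session TimeZone.
          System.out.println(tz + " -> " + rs.getString(1));
        }
      }
    }
  }
}
```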

On top of that, there are configuration properties that are not reported from the default PostgreSQL backend.
Of course, we would love to get more notifications from the backend (and from the poolers!); however, we have what we have.

For instance, see the "Proposal to add GUC_REPORT to lc_monetary, lc_numeric and search_path" thread on pgsql-hackers.

The bad thing is that an authentication failure does not contain the message encoding, so the driver has to guess which encoding the server used when composing the "authentication failed" error message:

```java
/**
 * Receives a fixed-size string from the backend, and tries to avoid "UTF-8 decode failed"
 * errors.
 *
 * @param len the length of the string to receive, in bytes.
 * @return the decoded string
 * @throws IOException if something wrong happens
 */
public EncodingPredictor.DecodeResult receiveErrorString(int len) throws IOException {
  if (!pgInput.ensureBytes(len)) {
    throw new EOFException();
  }

  EncodingPredictor.DecodeResult res;
  try {
    String value = encoding.decode(pgInput.getBuffer(), pgInput.getIndex(), len);
    // no autodetect warning as the message was converted on its own
    res = new EncodingPredictor.DecodeResult(value, null);
  } catch (IOException e) {
    res = EncodingPredictor.decode(pgInput.getBuffer(), pgInput.getIndex(), len);
    if (res == null) {
      Encoding enc = Encoding.defaultEncoding();
      String value = enc.decode(pgInput.getBuffer(), pgInput.getIndex(), len);
      res = new EncodingPredictor.DecodeResult(value, enc.name());
    }
  }
  pgInput.skip(len);
  return res;
}
```

That is why the first thing we want is to configure encoding=utf-8, to avoid users running into "can't decode response from the server due to utf-8 decoding failed at byte ...".

> Connection pools won't execute startup parameters - unless explicitly told to do so - so you can't assume certain properties are present.

If the connection pooler discards the client's messages, then it sounds like a very unfavourable "feature".

> Then the pool would need to look at the startup packet to check which prefilled connection

I would say the connection pool should monitor the queries passing by, and it should track which options are desired for each client. Then the pooler should make it appear to the client that it got a connection with the same features.

The pooler should be able to recreate or re-configure one of its existing connections to the desired set of features by re-issuing SET ... requests on behalf of the client (or by ignoring them if the pooler is sure the setting is already configured).

The pooler should track prepared statements as well, and it should re-prepare them (or remap the statement name!) on behalf of the clients if the pooler switches to a different physical connection.
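
As a rough sketch of that idea (the class and method names are invented for illustration, not any actual pooler's code), the pooler would remember each client's observed SET statements and replay only the missing ones before handing over a different physical connection:

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical per-client bookkeeping inside a transaction-level pooler.
class SessionStateTracker {
  // Settings the client believes are active, in the order they were issued.
  private final Map<String, String> desired = new LinkedHashMap<>();

  // Called whenever a SET passes through the pooler on behalf of this client.
  void observeSet(String name, String value) {
    desired.put(name, value);
  }

  // Called before assigning a physical connection whose settings (as tracked
  // by the pooler) may differ from what this client expects.
  void replayOnto(Connection physical, Map<String, String> physicalState) throws SQLException {
    try (Statement st = physical.createStatement()) {
      for (Map.Entry<String, String> e : desired.entrySet()) {
        if (!e.getValue().equals(physicalState.get(e.getKey()))) {
          // Re-issue the SET on behalf of the client; skip it when the pooler
          // is sure the setting is already configured.
          st.execute("SET " + e.getKey() + " = " + e.getValue());
          physicalState.put(e.getKey(), e.getValue());
        }
      }
    }
  }
}
```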

@jesperpedersen
Contributor Author

@vlsi I agree it will be quite a long time before settings are unified between drivers and controlled from the database cluster.

Having a connection pool parse and perhaps replay messages adds overhead to the communication, so it is definitely a feature that needs an on/off switch in the configuration. Just having a connection pool in place as a simple proxy already adds a performance overhead.
