Group startup statements #1977
Conversation
Rather than wrap it in a transaction, how about combining all the statements into a single SQL command separated by semicolons? That'd be one less round trip too. Though I wonder if any of this is necessary: if you set the ASSUME_MIN_SERVER_VERSION property to anything that's not ancient, then this code won't execute anyway. For servers known to be greater than 9.0 it will send the properties in the startup packet.
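For illustration, a minimal sketch of the batching idea being proposed here (the class and method names are hypothetical, not pgjdbc internals): the per-statement round trips collapse into one semicolon-separated simple query string.

```java
import java.util.List;

// Hypothetical sketch: join the startup statements into one simple-query
// string so they travel to the server in a single round trip.
public class StartupBatch {
    static String combine(List<String> statements) {
        // The simple query protocol accepts multiple statements
        // separated by semicolons in one message.
        return String.join("; ", statements);
    }

    public static void main(String[] args) {
        List<String> startup = List.of(
            "SET extra_float_digits = 3",
            "SET application_name = 'myapp'");
        System.out.println(combine(startup));
    }
}
```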
Multiple statements with semicolons are multiple transactions. You can't assume that people will deploy with …
Does any pooling software do its own SQL parsing and splitting of user commands? I figured anything that gets sent with the simple protocol would go as a single simple query message and end up at the actual server in one piece as well. I've opened a separate issue to discuss bumping the default for ASSUME_MIN_SERVER_VERSION: #1978
Yes, pgpool does SQL parsing, but doesn't support transaction pooling; pgagroal only looks at the message header when running in transaction mode. A simple query using semicolons is still multiple transactions. By bumping …
I think there's still a race condition with transactional pooling. Say you have:
At this point PHYSICAL-002 could be assigned by the pooler, but it won't have had any of the initialization code executed. Short of sending it in the startup packet, I don't think this completely fixes things; it would just ensure that all or nothing is executed on the same physical connection. Also, it looks like there's at least one other non-transactional SQL command optionally sent (if READ_ONLY is enabled) during connection init: https://github.com/pgjdbc/pgjdbc/blob/master/pgjdbc/src/main/java/org/postgresql/jdbc/PgConnection.java#L236-L238
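To make the race concrete, here is a toy simulation (plain Java, not a real pooler) of a transaction-mode pooler that reassigns the physical connection after every top-level statement: init statements sent as separate commands can land on different backends, while a BEGIN…COMMIT block stays pinned to one.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a transaction-mode pooler: a client is pinned to a physical
// connection only for the duration of an explicit transaction; outside of
// one, every statement may be routed to a different backend.
public class PoolRaceDemo {
    static List<String> route(List<String> statements) {
        List<String> log = new ArrayList<>();
        int physical = 1;       // next physical connection to hand out
        boolean inTx = false;
        int pinned = 0;
        for (String sql : statements) {
            if (!inTx) {        // not pinned: pooler may pick a new backend
                pinned = physical++;
            }
            log.add("PHYSICAL-" + String.format("%03d", pinned) + ": " + sql);
            if (sql.equals("BEGIN")) inTx = true;
            if (sql.equals("COMMIT")) inTx = false;
        }
        return log;
    }

    public static void main(String[] args) {
        // Separate commands: each SET may hit a different backend.
        route(List.of("SET a = 1", "SET b = 2")).forEach(System.out::println);
        // Wrapped in a transaction: all statements share one backend.
        route(List.of("BEGIN", "SET a = 1", "SET b = 2", "COMMIT"))
            .forEach(System.out::println);
    }
}
```

This also illustrates the comment above: the wrap guarantees only that the init statements execute together on one backend, not that later transactions reach that same backend.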
There are a lot of caveats when using transaction pooling. The above is solved by using prefill of connections -- as you normally would -- since prefilled connections don't reuse an existing connection.
PostgreSQL 9.0 was released in 2010. It seems reasonable to drop support for 8.x after 10 years.
@davecramer @vlsi Any thoughts on the >=9.0 change? My feeling is it should be separate, but I can include it if needed.
Sorry for the late response. I've been relocating for the winter. The >=9.0 should be a separate PR.
@jesperpedersen, #2006 removes support for server versions prior to 9.0. I think we are going to try and get it into the 42.3 release.
@jesperpedersen where are we with this?
@davecramer I think it follows the same assumptions as the other checks in https://github.com/pgjdbc/pgjdbc/blob/master/pgjdbc/src/main/java/org/postgresql/core/v3/ConnectionFactoryImpl.java#L822
Actually, a simple query using semicolons is implicitly wrapped with a begin/end: "insert into foo (column1) values (1); rollback; select * from foo;" will roll back the insert.
An issue with this feature in combination with pgbouncer has appeared and is being discussed in #2423.

What I'm wondering about: how does this "wrapping" of startup parameters help with transaction pooling? As the transaction completes after startup, it is still quite possible that a subsequent transaction will hit a different backend, which might have been "initialized" with different parameters, or not at all. How is this PR helping with that? I understand that it at least keeps the parameters together, but as far as I understand the case, there are no guarantees that the work will be executed with the desired parameters.

Furthermore, wouldn't it be far more efficient to have the connection pooler initialize each backend (once, at the initial connect/startup) with the desired parameters, instead of having the clients "randomly" fire init commands at some backend?
@hannibal218bc These statements are executed as part of the connection startup -- e.g. for each JDBC connection. Each driver (JDBC, C, C#, Rust, ...) can choose to initialize its startup parameters differently, so each driver needs to be vetted for proper transaction semantics under each pooling mode. Connection pools won't execute startup parameters -- unless explicitly told to do so -- so you can't assume certain properties are present.
@jesperpedersen so does pgbouncer do anything special with the startup parameters in statement mode to ensure that they are all executed on the same connection?
@davecramer Well, then "statement pooling" would need to look at the transactional state of the connection, thereby "upgrading" the pooling to a "transaction pool" instead. So, I would say no to that -- having a … There is a small "window" where "statement pooling" creates a benefit for the overall architecture, but it is limited to auto-commit and SELECT-only use cases.
Exactly, but isn't that another point in favor of @davecramer's comment that the pool itself would be a better place to set startup parameters, rather than the JDBC driver?
Or to put it differently: isn't the use of startup parameters only feasible in session-level pooling, and not at the transaction level?
@hannibal218bc The pool can choose to set startup parameters for all prefilled connections. However, a driver can choose to assume certain settings are active. Then the pool would need to look at the startup packet to check which prefilled connection "fits" with which driver, and then you run into the trouble of saying "I want 'A' JDBC connections for USER 'B' in DATABASE 'C', and 'D' C# connections for USER 'E' in DATABASE 'F'", and so on, in the pool configuration. Ideally, all drivers should not assume anything and should work with the settings returned from the database cluster.
And, transaction pooling will hit the same connection -- the connection is locked for the duration of the transaction.
@jesperpedersen, the driver would need to set … On top of that, there are configuration properties that are not reported by the default PostgreSQL backend; for instance, see … The bad thing is that an authentication failure does not contain the message encoding, so the driver has to guess which encoding the server used when composing the "authentication failed" error message: pgjdbc/pgjdbc/src/main/java/org/postgresql/core/PGStream.java, lines 507 to 533 in 473091a
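The guessing referred to above can be sketched roughly like this -- a simplified stand-in for pgjdbc's actual logic in PGStream, not a copy of it: decode the server's error bytes strictly as UTF-8 first, and fall back to a single-byte charset when the bytes are not valid UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Simplified stand-in for the "guess the encoding of an authentication
// failure" problem: the server has not told us client_encoding yet, so the
// driver can only guess how the error-message bytes are encoded.
public class ErrorMessageDecoder {
    static String decode(byte[] raw) {
        try {
            // Strict UTF-8 first: REPORT makes malformed input throw
            // instead of being silently replaced.
            return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(raw)).toString();
        } catch (Exception e) {
            // Fallback guess; the real driver cannot know the true encoding.
            return new String(raw, Charset.forName("ISO-8859-1"));
        }
    }

    public static void main(String[] args) {
        System.out.println(decode("auth failed".getBytes(StandardCharsets.UTF_8)));
        System.out.println(decode(new byte[] {(byte) 0xE9}));  // not valid UTF-8
    }
}
```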
That is why the first thing we want is to configure …
If the connection pooler discards the client's messages, then it sounds like a very unfavourable "feature".
I would say the connection pool should monitor the queries passing by, and it should track which options are desired for each client. Then the pooler should pretend the client gets a connection with the same features. The pooler should be able to recreate or re-configure one of its existing connections to the desired set of features by re-issuing SET requests on behalf of the client (or ignoring them if the pooler is sure the setting is already configured). The pooler should track prepared statements as well, and it should re-prepare them (or remap the statement name!) on behalf of the clients if the pooler switches to a different physical connection.
@vlsi I agree it will be quite a long time before settings are unified between drivers and controlled from the database cluster. Having a connection pool parse and perhaps replay messages adds communication overhead, so it is definitely a feature that needs an on/off switch in the configuration. Just having a connection pool in place as a simple proxy already adds a performance overhead.
This pull request wraps the statements executed during startup inside a transaction.

This is important in pool-by-transaction scenarios in order to make sure that all the statements reach the same connection that is being initialized. Since these are SET statements, their changes persist outside the transaction scope, but we do receive the Z (ReadyForQuery) messages from the server.

This should only be considered for 42.3.
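The approach can be sketched as follows (a hypothetical helper, not the actual patch): the startup SET statements are sent as one simple query wrapped in BEGIN/COMMIT, so a transaction-mode pooler keeps them on a single backend, while the SET effects -- being session-scoped -- survive the COMMIT.

```java
import java.util.List;

// Hypothetical sketch of the PR's idea: wrap the startup statements in a
// transaction so a transaction-mode pooler pins them to one backend.
// SET changes are session-scoped, so they persist after the COMMIT.
public class WrappedStartup {
    static String wrap(List<String> statements) {
        return "BEGIN; " + String.join("; ", statements) + "; COMMIT";
    }

    public static void main(String[] args) {
        System.out.println(wrap(List.of(
            "SET extra_float_digits = 3",
            "SET application_name = 'myapp'")));
    }
}
```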