Group startup statements #1977

Merged: 1 commit merged into pgjdbc:master on Mar 16, 2021
Conversation

jesperpedersen
Contributor

This pull request wraps the statements executed during startup inside a transaction.

This is important in pool-by-transaction scenarios in order to make sure that all the statements reach the same connection that is being initialized. Since these are SET statements, their changes persist outside the transaction scope, but we do receive the Z????T (ReadyForQuery, in-transaction) messages from the server.

This should only be considered for 42.3.
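
For illustration only, here is a rough user-level approximation of grouping the startup statements in a transaction (the specific SET statements, connection URL, and credentials are placeholders, not the driver's actual internals):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class GroupedStartupSketch {
  public static void main(String[] args) throws Exception {
    // Placeholder connection details.
    try (Connection conn = DriverManager.getConnection(
            "jdbc:postgresql://localhost:5432/postgres", "postgres", "secret");
         Statement st = conn.createStatement()) {
      // Sending the whole setup as one unit keeps the statements together, so a
      // pool-by-transaction proxy routes them to a single physical connection.
      st.execute("BEGIN; "
          + "SET extra_float_digits = 3; "
          + "SET application_name = 'myapp'; "
          + "COMMIT");
      // The SET changes persist for the rest of the session even though the
      // surrounding transaction has ended.
    }
  }
}
```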

@sehrope
Member

sehrope commented Dec 10, 2020

Rather than wrap it in a transaction, how about combining all the statements into a single SQL command separated by semicolons? That'd be one less round trip too.

Though I wonder if any of this is necessary. If you set the ASSUME_MIN_SERVER_VERSION property to anything that's not ancient, then this code won't execute anyway. For servers known to be greater than 9.0 it will send the properties in the startup packet.
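
For reference, a minimal sketch of setting that from JDBC (connection details are placeholders; assumeMinServerVersion is the connection-property spelling of ASSUME_MIN_SERVER_VERSION):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

public class AssumeMinServerVersionSketch {
  public static void main(String[] args) throws Exception {
    Properties props = new Properties();
    props.setProperty("user", "postgres");    // placeholder
    props.setProperty("password", "secret");  // placeholder
    // Telling the driver the server is at least 9.0 lets it send the session
    // parameters in the startup packet instead of as SET statements afterwards.
    props.setProperty("assumeMinServerVersion", "9.0");
    try (Connection conn = DriverManager.getConnection(
            "jdbc:postgresql://localhost:5432/postgres", props)) {
      System.out.println("connected: " + !conn.isClosed());
    }
  }
}
```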

@jesperpedersen
Contributor Author

Multiple statements with semicolons are multiple transactions.

You can't assume that people will deploy with ASSUME_MIN_SERVER_VERSION, and there are two statements in that >= 9.0 block.

@sehrope
Member

sehrope commented Dec 10, 2020

Does any pooling software do its own SQL parsing and splitting of user commands? I figured anything that gets sent with the simple protocol would go as a single simple query message and end up at the actual server in one piece as well.

I've opened a separate issue to discuss bumping the default for ASSUME_MIN_SERVER_VERSION: #1978

@jesperpedersen
Contributor Author

Yes, pgpool does SQL parsing, but doesn't support transaction pooling -- pgagroal only looks at the message header when running in transaction mode.

A simple query using semicolons is still multiple transactions.

By bumping ASSUME_MIN_SERVER_VERSION to 9.0 you remove support for older servers, and for only a limited amount of code benefit.

@sehrope
Member

sehrope commented Dec 10, 2020

I think there's still a race condition with transactional pooling. Say you have:

  1. Client connects JDBC-001 to the pool, which creates PHYSICAL-001
  2. JDBC-001 sends BEGIN, gets assigned to PHYSICAL-001, then does SET / SET / ... / COMMIT, which all gets executed on PHYSICAL-001
  3. In parallel, client connects JDBC-002 to the pool, which creates PHYSICAL-002
  4. While PHYSICAL-002 is being connected, JDBC-001 finishes and ends its transaction (so PHYSICAL-001 is available for assignment).
  5. JDBC-002 sends BEGIN, gets assigned to PHYSICAL-001 (not 002!), and then does SET / SET / ... / COMMIT on PHYSICAL-001

At this point PHYSICAL-002 could be assigned by the pooler, but it won't have had any of the initialization code executed. Short of sending it in the startup packet, I don't think this completely fixes things. It would just ensure that the initialization is executed all-or-nothing on the same physical connection.

Also, looks like there's at least one other non-transactional SQL command optionally sent (if READ_ONLY is enabled) in connection init: https://github.com/pgjdbc/pgjdbc/blob/master/pgjdbc/src/main/java/org/postgresql/jdbc/PgConnection.java#L236-L238

@jesperpedersen
Contributor Author

There are a lot of caveats when using transaction pooling. The above is solved by prefilling the connections - like you normally would - as they don't reuse an existing connection.

@bokken
Member

bokken commented Dec 13, 2020

PostgreSQL 9.0 was released in 2010. It seems reasonable to drop support for 8.x after 10 years.

@jesperpedersen
Contributor Author

@davecramer @vlsi Any thoughts on whether the >= 9.0 change should be part of this pull request or a separate one?

My feeling is that it should be separate, but I can include it if needed.

@davecramer
Member

Sorry for the late response. I've been relocating for the winter.

The >= 9.0 change should be a separate PR.

@bokken
Member

bokken commented Dec 30, 2020

@jesperpedersen, #2006 removes support for server versions prior to 9.0. I think we are going to try and get it into the 42.3 release.

@davecramer
Member

@jesperpedersen where are we with this?

@jesperpedersen
Contributor Author

@davecramer

davecramer merged commit 00774d1 into pgjdbc:master on Mar 16, 2021
@davecramer
Member

> A simple query using semicolons is still multiple transactions.

Actually, a simple query using semicolons is implicitly wrapped with a begin/end:

"insert into foo (column1) values (1); rollback; select * from foo;"

will roll back the insert.

@hannibal218bc

An issue with this feature in connection with pgbouncer appeared and is being discussed in #2423.

What I'm wondering about: how does this "wrapping" of startup parameters help with transaction pooling? As the transaction completes after startup, it is still quite possible that a subsequent transaction will hit a different backend, which might have been "initialized" with different parameters or not at all.

How is this PR helping with that? I understand that it at least keeps the parameters together, but as far as I understand the case, there are no guarantees that the work will be executed with the desired parameters.

Furthermore, wouldn't it be far more efficient to have the connection pooler initialize each backend (once, at the initial connect/startup) with the desired parameters, instead of having the clients "randomly" fire init commands at some backend?

@jesperpedersen
Contributor Author

@hannibal218bc These statements are executed as part of the connection startup - i.e. for each JDBC connection.

Each driver (JDBC, C, C#, Rust, ...) can choose to initialize its startup parameters differently - so each driver needs to be vetted for proper transaction semantics under each pooling mode.

Connection pools won't execute startup parameters - unless explicitly told to do so - so you can't assume certain properties are present.

@davecramer
Member

@jesperpedersen so does pgbouncer do anything special with the startup parameters in statement mode to ensure that they are all executed on the same connection?
Seems that is where this should be fixed.

@jesperpedersen
Contributor Author

@davecramer Well, then "statement pooling" would need to look at the transactional state of the connection, thereby "upgrading" the pooling to a "transaction pool" instead. So, I would say no to that - having a false in JDBC is better.

There is a small "window" where "statement pooling" creates a benefit for the overall architecture, but it is limited to auto-commit and SELECT-only use-cases.

@hannibal218bc

@jesperpedersen:

> Each driver (JDBC, C, C#, Rust, ...) can choose to initialize its startup parameters differently - so each driver needs to be vetted for proper transaction semantics under each pooling mode.

> Connection pools won't execute startup parameters - unless explicitly told to do so - so you can't assume certain properties are present.

Exactly, but isn't that another point in favor of @davecramer's comment that the pool itself would be a better place to set startup parameters, rather than the JDBC driver?
Especially considering the fact(?) that in transaction mode, there is no guarantee that the work transaction will hit a properly initialized backend, as those are two distinct transactions and may be scheduled independently?

@hannibal218bc

Or to put it differently: isn't the use of startup parameters only feasible in session-level pooling, and not at the transaction level?

@jesperpedersen
Contributor Author

@hannibal218bc The pool can choose to set startup parameters for all prefilled connections. However, a driver can choose to assume certain settings are active.

Then the pool would need to look at the startup packet to check which prefilled connection "fits" which driver, and then you run into the trouble of saying "I want 'A' JDBC connections for USER 'B' in DATABASE 'C', and 'D' C# connections for USER 'E' in DATABASE 'F'", and so on, in the pool configuration.

Ideally, all drivers should not assume anything - and work with the settings returned from the database cluster.

@jesperpedersen
Contributor Author

And, transaction pooling will hit the same connection -- the connection is locked until the Z (ReadyForQuery) message indicates the transaction has completed.

@vlsi
Member

vlsi commented Jan 28, 2022

> Ideally, all drivers should not assume anything - and work with the settings returned from the database cluster.

@jesperpedersen, the driver would need to set the session timezone and client_encoding anyway. You can't expect the server to magically know the client's timezone, which is important for queries that convert timestamps to strings.
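
A small sketch of why the session TimeZone matters (connection details are placeholders): the same instant is rendered differently as text depending on the session setting.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class TimeZoneRenderingSketch {
  public static void main(String[] args) throws Exception {
    try (Connection conn = DriverManager.getConnection(
            "jdbc:postgresql://localhost:5432/postgres", "postgres", "secret");
         Statement st = conn.createStatement()) {
      for (String tz : new String[] {"UTC", "America/New_York"}) {
        st.execute("SET TimeZone = '" + tz + "'");
        try (ResultSet rs = st.executeQuery("select now()::text")) {
          rs.next();
          // Same instant, different textual rendering per session TimeZone.
          System.out.println(tz + " -> " + rs.getString(1));
        }
      }
    }
  }
}
```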

On top of that, there are configuration properties that are not reported from the default PostgreSQL backend.
Of course, we would love to get more notifications from the backend (and from the poolers!); however, we have what we have.

For instance, see the "Proposal to add GUC_REPORT to lc_monetary, lc_numeric and search_path" thread on pgsql-hackers.

The bad thing is that an authentication failure does not contain the message encoding, so the driver has to guess which encoding the server used when composing the "authentication failed" error message:

```java
/**
 * Receives a fixed-size string from the backend, and tries to avoid "UTF-8 decode failed"
 * errors.
 *
 * @param len the length of the string to receive, in bytes.
 * @return the decoded string
 * @throws IOException if something wrong happens
 */
public EncodingPredictor.DecodeResult receiveErrorString(int len) throws IOException {
  if (!pgInput.ensureBytes(len)) {
    throw new EOFException();
  }

  EncodingPredictor.DecodeResult res;
  try {
    String value = encoding.decode(pgInput.getBuffer(), pgInput.getIndex(), len);
    // no autodetect warning as the message was converted on its own
    res = new EncodingPredictor.DecodeResult(value, null);
  } catch (IOException e) {
    res = EncodingPredictor.decode(pgInput.getBuffer(), pgInput.getIndex(), len);
    if (res == null) {
      Encoding enc = Encoding.defaultEncoding();
      String value = enc.decode(pgInput.getBuffer(), pgInput.getIndex(), len);
      res = new EncodingPredictor.DecodeResult(value, enc.name());
    }
  }
  pgInput.skip(len);
  return res;
}
```

That is why the first thing we want is to configure encoding=utf-8, to avoid users running into "can't decode response from the server due to utf-8 decoding failed at byte ...".

> Connection pools won't execute startup parameters - unless explicitly told to do so - so you can't assume certain properties are present.

If the connection pooler discards the client's messages, then it sounds like a very unfavourable "feature".

> Then the pool would need to look at the startup packet to check which prefilled connection

I would say the connection pool should monitor the queries passing by, and it should track which options are desired for each client. Then the pooler should make it appear to the client that it got a connection with the same features.

The pooler should be able to recreate or re-configure one of its existing connections to the desired set of features by re-issuing SET ... requests on behalf of the client (or by ignoring them if the pooler is sure the setting is already configured).

The pooler should track prepared statements as well, and it should re-prepare them (or remap the statement name!) on behalf of the clients if the pooler switches to a different physical connection.
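
As a rough sketch of that idea (the class and method names are invented for illustration, not any actual pooler's code), the pooler would remember each client's observed SET statements and replay only the missing ones before handing over a different physical connection:

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical per-client bookkeeping inside a transaction-level pooler.
class SessionStateTracker {
  // Settings the client believes are active, in the order they were issued.
  private final Map<String, String> desired = new LinkedHashMap<>();

  // Called whenever a SET passes through the pooler on behalf of this client.
  void observeSet(String name, String value) {
    desired.put(name, value);
  }

  // Called before assigning a physical connection whose settings (as tracked
  // by the pooler) may differ from what this client expects.
  void replayOnto(Connection physical, Map<String, String> physicalState) throws SQLException {
    try (Statement st = physical.createStatement()) {
      for (Map.Entry<String, String> e : desired.entrySet()) {
        if (!e.getValue().equals(physicalState.get(e.getKey()))) {
          // Re-issue the SET on behalf of the client; skip it when the pooler
          // is sure the setting is already configured.
          st.execute("SET " + e.getKey() + " = " + e.getValue());
          physicalState.put(e.getKey(), e.getValue());
        }
      }
    }
  }
}
```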

@jesperpedersen
Contributor Author

@vlsi I agree it will be quite a long time before settings are unified between drivers and controlled from the database cluster.

Having a connection pool parse and perhaps replay messages adds overhead to the communication, so it is definitely a feature that needs an on/off switch in the configuration. Just having a connection pool in place as a simple proxy already adds a performance overhead.
