perf: Optimise the readonly (true / false) call #1228

benbenw · 2018-06-27T08:11:07Z

It's quite common to see applications doing :

setReadonly true
simple query
setReadonly false

Surprisingly the setReadonly call is costly (memory allocations / cpu) (from the driver perspective not the db)
You can mitigate this by activating preferQueryMode=extendedCacheEverything but it comes with its own problems :

performance see PR perf: remove unneeded cache memory allocations #1226
you end up caching all the statements not only the set readonly ones.

This comes from real world performance analysis.

For each call, the driver is calling

pgjdbc/pgjdbc/src/main/java/org/postgresql/core/Parser.java

Line 1057 in b7fd9f3

public static String replaceProcessing(String p_sql, boolean replaceProcessingEnabled,
- which is not needed for those queries
pgjdbc/pgjdbc/src/main/java/org/postgresql/core/Parser.java

Line 44 in b7fd9f3

public static List<NativeQuery> parseJdbcSql(String query, boolean standardConformingStrings,
- which should be reused

I think there is room for improvement.
I see at least 2 solutions :

force those queries to be cached (event if preferQueryMode is not set to extendedCacheEverything)
- make sure PR perf: remove unneeded cache memory allocations #1226 is merged

use a pre parse query and re-use it

something similar to the commit / rollback queries

see

pgjdbc/pgjdbc/src/main/java/org/postgresql/jdbc/PgConnection.java

Lines 287 to 291 in b7fd9f3

    
           // Initialize common queries. 
        
           // isParameterized==true so full parse is performed and the engine knows the query 
        
           // is not a compound query with ; inside, so it could use parse/bind/exec messages 
        
           commitQuery = createQuery("COMMIT", false, true).query; 
        
           rollbackQuery = createQuery("ROLLBACK", false, true).query;

The maintainers will probably have more ideas.

This analysis may apply to the other "internal queries" of the driver but we mostly have troubles with the "readonly" ones.

The text was updated successfully, but these errors were encountered:

vlsi · 2018-06-27T09:20:39Z

@benbenw , could you please create a JMH benchmark to highlight the issue and/or quantify the changes?

The benchmarks are located here: https://github.com/pgjdbc/benchmarks

benbenw · 2018-06-27T09:52:48Z

No ! 😟
This is an issue, not a pr.
I don't have any code associated, it's the result of a performance analysis of a real application (data & measurement can not be shared)

Moreover before coding anything, we should choose a solution, I've highlighted 2 possibilities but you may have other and hopefully better ideas.

vlsi · 2018-06-27T10:15:38Z

it's the result of a performance analysis

That is great

Moreover before coding anything

I agree, however "before coding anything" we should agree if the bug is in the driver or if the bug is in the application itself. Sample code helps with that.

We would need to quantify the change anyway, so a benchmark would be required here to merge the change to ensure we get improvement and we prevent a regression.

PS.
I think "pre parse query" is the way to go here.

On top of that it might make sense to pipeline "setReadOnly" calls with subsequent queries, so setReadOnly becomes basically a no-op and it just prepends the subsequent query with relevant "readonly" setting.

benbenw · 2018-06-27T11:42:52Z

I agree, however "before coding anything" we should agree if the bug is in the driver or if the bug is in the application itself.

There is no bug in the driver. It's a perf improvement.
While I agree that the driver should not be designed for badly coded applications, the pattern I described (readonly true, query, readonly) is common even in good quality apps where the dev doesn't manage the queries himself.

Here the driver could be better out-of-the-box, that's why I opened this issue.

davecramer · 2018-06-27T12:15:20Z

The reason we are asking for a benchmark isn't to dispute your findings it's to ensure that we don't introduce regressions later on.

davecramer · 2018-06-27T12:16:36Z

On top of that it might make sense to pipeline "setReadOnly" calls with subsequent queries, so setReadOnly becomes basically a no-op and it just prepends the subsequent query with relevant "readonly" setting.

I agree with this approach it seems minimally invasive

vlsi · 2018-06-27T12:59:31Z

I agree with this approach it seems minimally invasive

The sad thing is:

We have no to limited machinery for pipelining queries. This might require a bunch of if statements here and there
There's setReadOnly does not work with pgbouncer and transaction pooling mode #848 when pgbouncer users want setReadOnly to be transaction-specific. That is autocommit=off + readonly should result in a begin + set readonly fusion producing BEGIN TRANSACTION READ ONLY

It does sound cool, yet it is something to be developed.

bokken · 2018-07-12T15:22:51Z

@davecramer are you working on this? I was starting to take a look, but do not want to duplicate effort.

@vlsi,

That is autocommit=off + readonly should result in a begin + set readonly fusion producing BEGIN TRANSACTION READ ONLY

Does autocommit really matter? On brief look it appears that even if autocommit is true the result would be:

BEGIN READ ONLY
<some work>
COMMIT

That seems like it should be fine, right?

vlsi · 2018-07-12T15:43:23Z

On brief look it appears that even if autocommit is true the result would be:

The thing is in autoCommit=true, the driver issues NO BEGIN statement at the moment AFAIK

Using explicit begin readonly around each statement in a readonly mode might create noticeable processing overhead at backend side. It would result in 3 messages (begin + select + commit) instead of 1 (just select)

BEGIN READ ONLY
<some work>
COMMIT

davecramer · 2018-07-12T15:45:27Z

@bokken no, I'm not working on this, so go ahead.

as for "that seems like it should be fine" I don't agree. By wrapping a read only connection in a transaction we are changing the transaction isolation.

After the BEGIN all subsequent selects are executed within this transaction, and will be subject to the transaction isolation semantics. While this may seem innocuous it is certainly an unintended consequence of setReadOnly. Given that setReadOnly is meant to be a hint to the driver I'm not comfortable with this approach. To complicate matters further it is actually possible to write to temporary tables in this mode.

I'd be more interested in an optimization that somehow (I have no idea at the moment) found a read only connection from the possible connections and used that.

vlsi · 2018-07-12T15:52:06Z

Given that setReadOnly is meant to be a hint to the driver I'm not comfortable with this approach

Then valid "improvement" would be just rip off "SET SESSION CHARACTERISTICS AS TRANSACTION " + (readOnly ? "READ ONLY" : "READ WRITE" query.
Why do we have it at all?

bokken · 2018-07-12T15:56:08Z

The thing is in autoCommit=true, the driver issues NO BEGIN statement at the moment AFAIK

You are correct. The key piece is in PgStatement (and likely other places) where if autoCommit is true, the flag QUERY_SUPPRESS_BEGIN is passed in.
It appears that perhaps another flag of QUERY_BEGIN_READ_ONLY could be added that only flexes if BEGIN is being called at all.

vlsi · 2018-07-12T16:09:06Z

I suggest the following:

rip off SET SESSION CHARACTERISTICS AS TRANSACTION completely (or hide it behind a non-default flag). Just use a per-connection field to remember readonly state
when in readonly==true, then issue BEGIN READONLY kind of query when starting a transaction (in autocommit==false mode)
add a non-default property for "force begin readonly/commit in autocommit mode". It would add processing overhead, however it would enable to load-balance readonly requests via pgbouncer kind of thing that uses begin readonly hint to ensure the query can be served by a replica.

Any thoughts?

bokken · 2018-07-12T16:10:40Z

rip off SET SESSION CHARACTERISTICS AS TRANSACTION completely (or hide it behind a non-default

+1

Just use a per-connection field to remember readonly state

We already have that.

when in readonly==true, then issue BEGIN READONLY kind of query in a transaction

Looks like this might be pretty simple. Working on a prototype now.

add a property for "force begin readonly/commit in autocommit mode". It would add processing overhead, however it would enable to load-balance readonly requests via pgbouncer kind of thing that uses begin readonly hint to ensure the query can be served by a replica.

Just to clarify, the goal is to handle when both read only and auto commit are true?
Are we certain anyone has that use case? The specific folks asking for this all seem to care about transactions (i.e. auto commit false). Perhaps the property is to define whether read only is set for session (current behavior) or transaction (newly defined behavior).

davecramer · 2018-07-12T16:26:26Z

Given that this does not actually provide any performance benefit unless we can route queries to a read only host I would suggest the best optimization is to make it a no-op.

vlsi · 2018-07-12T16:27:23Z

Are we certain anyone has that use case? The specific folks asking for this all seem to care about transactions (i.e. auto commit false). Perhaps the property is to define whether read only is set for session (current behavior) or transaction (newly defined behavior).

The case there is as follows: the app knows it is about to execute a readonly query and it wants to hint the downstream on that, so the downstream uses REPLICA instead of master DB.

AFAIK the only way for pgbouncer can tell readonly from a readwrite query is via begin readonly.

However, that ("force begin readonly") might come as a separate improvement.

vlsi · 2018-07-12T16:30:22Z

Given that this does not actually provide any performance benefit unless we can route queries to a read only host I would suggest the best optimization is to make it a no-op.

There's a relevant StackOverflow question on quorum based synchronous replication, targetServerType, and picking the best server

davecramer · 2018-07-12T16:36:56Z

Interesting. Figuring out which server has the lowest lag would introduce quite a big of overhead in picking the connection.

Back to pgbouncer. It seems currently the only legitimate use case is to give a hint to pgbouncer. As this is only useful if the autocommit=false this is quite limiting. In which case I'd be in favour of the non-default setting to force begin readonly. Ideally I'd like to make this more generic for all of the other pools but that's not our problem at the moment

bokken · 2018-07-12T16:37:02Z

The case there is as follows: the app knows it is about to execute a readonly query and it wants to hint the downstream on that, so the downstream uses REPLICA instead of master DB.

AFAIK the only way for pgbouncer can tell readonly from a readwrite query is via begin readonly.

I agree with all of that. The discussions over on #848 all still show a transaction being used (i.e. autocommit set to false).

If autocommit is set to false, read only will be set on begin transaction. If autocommit is true, it will continue to be managed at session level. The queries to change session have been cached to avoid re-parsing each time readonly value changes. pgjdbc#1228 pgjdbc#848

* feat: read only transactions If autocommit is set to false, read only will be set on begin transaction. If autocommit is true, it will continue to be managed at session level. The queries to change session have been cached to avoid re-parsing each time readonly value changes. #1228 #848 * feat: read only transactions * checkstyle and hamcrest test import * add connection property with 3 options to control read only behavior * fix missing property methods on BaseDataSource * avoid redundant static modifier * more loosely couple read only hints to backend * return default read only mode from data source. * avoid case conversion

redbaron · 2020-04-03T10:27:32Z

@davecramer ,

Given that this does not actually provide any performance benefit
It seems currently the only legitimate use case is to give a hint to pgbouncer

read only transactions help when Serializable isolation level is used. Quoting docs:

A READ ONLY transaction may be able to release its SIRead locks before completion, if it detects that no conflicts can still occur which could lead to a serialization anomaly. In fact, READ ONLY transactions will often be able to establish that fact at startup and avoid taking any predicate locks.

davecramer · 2020-04-03T10:41:08Z

@redbaron So the very first thing we asked the original reporter for was a benchmark in order to measure the possible performance benefit. It would help us immensely if we had such a benchmark to actually test and prove that any changes we made actually made a difference. Care to put one together ?

benbenw mentioned this issue Jul 2, 2018

Does the driver support routing of read only queries to Read replicas ? #1237

Closed

davecramer added this to In progress in 42.2.4 Release Jul 11, 2018

davecramer moved this from In progress to To do in 42.2.4 Release Jul 11, 2018

vlsi added this to To do in 42.3.0 Release via automation Jul 14, 2018

vlsi removed this from To do in 42.2.4 Release Jul 14, 2018

vlsi modified the milestones: 42.2.4, 42.3.0 Jul 14, 2018

bokken mentioned this issue Jul 14, 2018

feat: read only transactions #1252

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: Optimise the readonly (true / false) call #1228

perf: Optimise the readonly (true / false) call #1228

benbenw commented Jun 27, 2018

vlsi commented Jun 27, 2018

benbenw commented Jun 27, 2018

vlsi commented Jun 27, 2018

benbenw commented Jun 27, 2018

davecramer commented Jun 27, 2018

davecramer commented Jun 27, 2018

vlsi commented Jun 27, 2018

bokken commented Jul 12, 2018

vlsi commented Jul 12, 2018

davecramer commented Jul 12, 2018

vlsi commented Jul 12, 2018

bokken commented Jul 12, 2018

vlsi commented Jul 12, 2018 •

edited

bokken commented Jul 12, 2018 •

edited

davecramer commented Jul 12, 2018

vlsi commented Jul 12, 2018

vlsi commented Jul 12, 2018

davecramer commented Jul 12, 2018

bokken commented Jul 12, 2018

redbaron commented Apr 3, 2020

davecramer commented Apr 3, 2020

perf: Optimise the readonly (true / false) call #1228

perf: Optimise the readonly (true / false) call #1228

Comments

benbenw commented Jun 27, 2018

vlsi commented Jun 27, 2018

benbenw commented Jun 27, 2018

vlsi commented Jun 27, 2018

benbenw commented Jun 27, 2018

davecramer commented Jun 27, 2018

davecramer commented Jun 27, 2018

vlsi commented Jun 27, 2018

bokken commented Jul 12, 2018

vlsi commented Jul 12, 2018

davecramer commented Jul 12, 2018

vlsi commented Jul 12, 2018

bokken commented Jul 12, 2018

vlsi commented Jul 12, 2018 • edited

bokken commented Jul 12, 2018 • edited

davecramer commented Jul 12, 2018

vlsi commented Jul 12, 2018

vlsi commented Jul 12, 2018

davecramer commented Jul 12, 2018

bokken commented Jul 12, 2018

redbaron commented Apr 3, 2020

davecramer commented Apr 3, 2020

vlsi commented Jul 12, 2018 •

edited

bokken commented Jul 12, 2018 •

edited