Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: Optimise the readonly (true / false) call #1228

Open
benbenw opened this issue Jun 27, 2018 · 21 comments
Open

perf: Optimise the readonly (true / false) call #1228

benbenw opened this issue Jun 27, 2018 · 21 comments
Milestone

Comments

@benbenw
Copy link
Contributor

benbenw commented Jun 27, 2018

It's quite common to see applications doing :

  • setReadonly true
  • simple query
  • setReadonly false

Surprisingly the setReadonly call is costly (memory allocations / cpu) (from the driver perspective not the db)
You can mitigate this by activating preferQueryMode=extendedCacheEverything but it comes with its own problems :

This comes from real world performance analysis.

For each call, the driver is calling

I think there is room for improvement.
I see at least 2 solutions :

  • force those queries to be cached (event if preferQueryMode is not set to extendedCacheEverything)
  • use a pre parse query and re-use it
    • something similar to the commit / rollback queries
    • see
      // Initialize common queries.
      // isParameterized==true so full parse is performed and the engine knows the query
      // is not a compound query with ; inside, so it could use parse/bind/exec messages
      commitQuery = createQuery("COMMIT", false, true).query;
      rollbackQuery = createQuery("ROLLBACK", false, true).query;

The maintainers will probably have more ideas.

This analysis may apply to the other "internal queries" of the driver but we mostly have troubles with the "readonly" ones.

@vlsi
Copy link
Member

vlsi commented Jun 27, 2018

@benbenw , could you please create a JMH benchmark to highlight the issue and/or quantify the changes?

The benchmarks are located here: https://github.com/pgjdbc/benchmarks

@benbenw
Copy link
Contributor Author

benbenw commented Jun 27, 2018

No ! 😟
This is an issue, not a pr.
I don't have any code associated, it's the result of a performance analysis of a real application (data & measurement can not be shared)

Moreover before coding anything, we should choose a solution, I've highlighted 2 possibilities but you may have other and hopefully better ideas.

@vlsi
Copy link
Member

vlsi commented Jun 27, 2018

it's the result of a performance analysis

That is great

Moreover before coding anything

I agree, however "before coding anything" we should agree if the bug is in the driver or if the bug is in the application itself. Sample code helps with that.

We would need to quantify the change anyway, so a benchmark would be required here to merge the change to ensure we get improvement and we prevent a regression.

PS.
I think "pre parse query" is the way to go here.

On top of that it might make sense to pipeline "setReadOnly" calls with subsequent queries, so setReadOnly becomes basically a no-op and it just prepends the subsequent query with relevant "readonly" setting.

@benbenw
Copy link
Contributor Author

benbenw commented Jun 27, 2018

I agree, however "before coding anything" we should agree if the bug is in the driver or if the bug is in the application itself.

There is no bug in the driver. It's a perf improvement.
While I agree that the driver should not be designed for badly coded applications, the pattern I described (readonly true, query, readonly) is common even in good quality apps where the dev doesn't manage the queries himself.

Here the driver could be better out-of-the-box, that's why I opened this issue.

@davecramer
Copy link
Member

The reason we are asking for a benchmark isn't to dispute your findings it's to ensure that we don't introduce regressions later on.

@davecramer
Copy link
Member

On top of that it might make sense to pipeline "setReadOnly" calls with subsequent queries, so setReadOnly becomes basically a no-op and it just prepends the subsequent query with relevant "readonly" setting.

I agree with this approach it seems minimally invasive

@vlsi
Copy link
Member

vlsi commented Jun 27, 2018

I agree with this approach it seems minimally invasive

The sad thing is:

  1. We have no to limited machinery for pipelining queries. This might require a bunch of if statements here and there
  2. There's setReadOnly does not work with pgbouncer and transaction pooling mode #848 when pgbouncer users want setReadOnly to be transaction-specific. That is autocommit=off + readonly should result in a begin + set readonly fusion producing BEGIN TRANSACTION READ ONLY

It does sound cool, yet it is something to be developed.

@bokken
Copy link
Member

bokken commented Jul 12, 2018

@davecramer are you working on this? I was starting to take a look, but do not want to duplicate effort.

@vlsi,

That is autocommit=off + readonly should result in a begin + set readonly fusion producing BEGIN TRANSACTION READ ONLY

Does autocommit really matter? On brief look it appears that even if autocommit is true the result would be:

BEGIN READ ONLY
<some work>
COMMIT

That seems like it should be fine, right?

@vlsi
Copy link
Member

vlsi commented Jul 12, 2018

On brief look it appears that even if autocommit is true the result would be:

The thing is in autoCommit=true, the driver issues NO BEGIN statement at the moment AFAIK

Using explicit begin readonly around each statement in a readonly mode might create noticeable processing overhead at backend side. It would result in 3 messages (begin + select + commit) instead of 1 (just select)

BEGIN READ ONLY
<some work>
COMMIT

@davecramer
Copy link
Member

@bokken no, I'm not working on this, so go ahead.

as for "that seems like it should be fine" I don't agree. By wrapping a read only connection in a transaction we are changing the transaction isolation.

After the BEGIN all subsequent selects are executed within this transaction, and will be subject to the transaction isolation semantics. While this may seem innocuous it is certainly an unintended consequence of setReadOnly. Given that setReadOnly is meant to be a hint to the driver I'm not comfortable with this approach. To complicate matters further it is actually possible to write to temporary tables in this mode.

I'd be more interested in an optimization that somehow (I have no idea at the moment) found a read only connection from the possible connections and used that.

@vlsi
Copy link
Member

vlsi commented Jul 12, 2018

Given that setReadOnly is meant to be a hint to the driver I'm not comfortable with this approach

Then valid "improvement" would be just rip off "SET SESSION CHARACTERISTICS AS TRANSACTION " + (readOnly ? "READ ONLY" : "READ WRITE" query.
Why do we have it at all?

@bokken
Copy link
Member

bokken commented Jul 12, 2018

The thing is in autoCommit=true, the driver issues NO BEGIN statement at the moment AFAIK

You are correct. The key piece is in PgStatement (and likely other places) where if autoCommit is true, the flag QUERY_SUPPRESS_BEGIN is passed in.
It appears that perhaps another flag of QUERY_BEGIN_READ_ONLY could be added that only flexes if BEGIN is being called at all.

@vlsi
Copy link
Member

vlsi commented Jul 12, 2018

I suggest the following:

  1. rip off SET SESSION CHARACTERISTICS AS TRANSACTION completely (or hide it behind a non-default flag). Just use a per-connection field to remember readonly state
  2. when in readonly==true, then issue BEGIN READONLY kind of query when starting a transaction (in autocommit==false mode)
  3. add a non-default property for "force begin readonly/commit in autocommit mode". It would add processing overhead, however it would enable to load-balance readonly requests via pgbouncer kind of thing that uses begin readonly hint to ensure the query can be served by a replica.

Any thoughts?

@bokken
Copy link
Member

bokken commented Jul 12, 2018

rip off SET SESSION CHARACTERISTICS AS TRANSACTION completely (or hide it behind a non-default

+1

Just use a per-connection field to remember readonly state

We already have that.

when in readonly==true, then issue BEGIN READONLY kind of query in a transaction

Looks like this might be pretty simple. Working on a prototype now.

add a property for "force begin readonly/commit in autocommit mode". It would add processing overhead, however it would enable to load-balance readonly requests via pgbouncer kind of thing that uses begin readonly hint to ensure the query can be served by a replica.

Just to clarify, the goal is to handle when both read only and auto commit are true?
Are we certain anyone has that use case? The specific folks asking for this all seem to care about transactions (i.e. auto commit false). Perhaps the property is to define whether read only is set for session (current behavior) or transaction (newly defined behavior).

@davecramer
Copy link
Member

Given that this does not actually provide any performance benefit unless we can route queries to a read only host I would suggest the best optimization is to make it a no-op.

@vlsi
Copy link
Member

vlsi commented Jul 12, 2018

Are we certain anyone has that use case? The specific folks asking for this all seem to care about transactions (i.e. auto commit false). Perhaps the property is to define whether read only is set for session (current behavior) or transaction (newly defined behavior).

The case there is as follows: the app knows it is about to execute a readonly query and it wants to hint the downstream on that, so the downstream uses REPLICA instead of master DB.

AFAIK the only way for pgbouncer can tell readonly from a readwrite query is via begin readonly.

However, that ("force begin readonly") might come as a separate improvement.

@vlsi
Copy link
Member

vlsi commented Jul 12, 2018

Given that this does not actually provide any performance benefit unless we can route queries to a read only host I would suggest the best optimization is to make it a no-op.

There's a relevant StackOverflow question on quorum based synchronous replication, targetServerType, and picking the best server

@davecramer
Copy link
Member

Interesting. Figuring out which server has the lowest lag would introduce quite a big of overhead in picking the connection.

Back to pgbouncer. It seems currently the only legitimate use case is to give a hint to pgbouncer. As this is only useful if the autocommit=false this is quite limiting. In which case I'd be in favour of the non-default setting to force begin readonly. Ideally I'd like to make this more generic for all of the other pools but that's not our problem at the moment

@bokken
Copy link
Member

bokken commented Jul 12, 2018

The case there is as follows: the app knows it is about to execute a readonly query and it wants to hint the downstream on that, so the downstream uses REPLICA instead of master DB.

AFAIK the only way for pgbouncer can tell readonly from a readwrite query is via begin readonly.

I agree with all of that. The discussions over on #848 all still show a transaction being used (i.e. autocommit set to false).

@vlsi vlsi added this to To do in 42.3.0 Release via automation Jul 14, 2018
@vlsi vlsi removed this from To do in 42.2.4 Release Jul 14, 2018
@vlsi vlsi modified the milestones: 42.2.4, 42.3.0 Jul 14, 2018
bokken added a commit to bokken/pgjdbc that referenced this issue Jul 14, 2018
If autocommit is set to false, read only will be set on begin
transaction.
If autocommit is true, it will continue to be managed at session level.
The queries to change session have been cached to avoid re-parsing each
time readonly value changes.

pgjdbc#1228
pgjdbc#848
davecramer pushed a commit that referenced this issue Nov 25, 2019
* feat: read only transactions

If autocommit is set to false, read only will be set on begin
transaction.
If autocommit is true, it will continue to be managed at session level.
The queries to change session have been cached to avoid re-parsing each
time readonly value changes.

#1228
#848

* feat: read only transactions

* checkstyle and hamcrest test import

* add connection property with 3 options to control read only behavior

* fix missing property methods on BaseDataSource 
* avoid redundant static modifier

* more loosely couple read only hints to backend

* return default read only mode from data source.

* avoid case conversion
@redbaron
Copy link

redbaron commented Apr 3, 2020

@davecramer ,

Given that this does not actually provide any performance benefit
It seems currently the only legitimate use case is to give a hint to pgbouncer

read only transactions help when Serializable isolation level is used. Quoting docs:

A READ ONLY transaction may be able to release its SIRead locks before completion, if it detects that no conflicts can still occur which could lead to a serialization anomaly. In fact, READ ONLY transactions will often be able to establish that fact at startup and avoid taking any predicate locks.

@davecramer
Copy link
Member

@redbaron So the very first thing we asked the original reporter for was a benchmark in order to measure the possible performance benefit. It would help us immensely if we had such a benchmark to actually test and prove that any changes we made actually made a difference. Care to put one together ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
42.3.0 Release
  
To do
Development

No branches or pull requests

5 participants