
Adds max_consecutive_exceptions to pool #253

Open · wants to merge 1 commit into base: master

Conversation

@egalpin commented Feb 2, 2018

This change might not be for everyone, so I respect the right of the core maintainers to downvote/close.

The backstory here is that I am using asyncpg to connect to an HA Postgres cluster. In the event of a failover, the writer (which the asyncpg connection pool is connected to) crashes and is rebooted, and the reader is promoted to writer. The DB DNS is automatically updated, but this takes time. In addition, the connections in the pool continue operating once the crashed DB recovers, except that DB is now the reader (i.e. SHOW transaction_read_only; -> on). As a result, INSERT/UPDATE/DELETE operations fail with (among various other connection errors):

asyncpg.exceptions.ReadOnlySQLTransactionError: cannot execute INSERT in a read-only transaction

Other errors, like timeouts, are also possible offenders. Unfortunately, the only way I've found to refresh the connections is to set a low max_queries config param. This is generally ok, but it degrades performance due to the increased churn of closing and opening connections.
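Roughly, that workaround looks like this (connection details are placeholders; max_queries caps how many queries a pooled connection serves before being closed and replaced):

```python
import asyncio
import asyncpg

async def main():
    # A low max_queries (the default is 50000) forces frequent reconnects,
    # so a stale connection gets replaced soon after failover, at the cost
    # of constant connection churn.
    pool = await asyncpg.create_pool(
        dsn='postgres://user:pass@my_prod_db.foo.com:5432/app',
        max_queries=50,
    )
    async with pool.acquire() as con:
        await con.execute('SELECT 1')
    await pool.close()

asyncio.run(main())
```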

With this PR, a configurable max_consecutive_exceptions pool param is introduced. This param is checked every time a pool-executed connection method (fetch, execute, etc.) results in an exception. It's important to note that this PR only manages the consecutive-exception state for connections used via the pool's context, not for a direct con = await pool.acquire() (in which case it's up to the user to handle exceptions).
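For illustration, a hedged sketch of the proposed interface (the parameter name comes from this PR; the threshold and connection details are illustrative, and none of this exists in released asyncpg):

```python
import asyncpg

async def make_pool():
    # Proposed behavior: after 5 consecutive exceptions from pool-level
    # connection methods, the pool recycles its connections.
    return await asyncpg.create_pool(
        dsn='postgres://user:pass@my_prod_db.foo.com:5432/app',
        max_consecutive_exceptions=5,
    )

# Counting applies when going through the pool context:
#     async with pool.acquire() as con:
#         await con.execute('INSERT ...')
# A bare `con = await pool.acquire()` bypasses the counter.
```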

Supersedes #249

@elprans (Member) commented Jun 10, 2018

0.16.0 has Pool.expire_connections(), which you can call on the failover event. Would that fix your use case?
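For context, a minimal sketch of that approach, assuming the failover can be detected out of band (the hook name is hypothetical):

```python
# Pool.expire_connections() (new in asyncpg 0.16.0) marks all currently
# open connections as expired; each one is closed and re-established as it
# cycles back through the pool, picking up the updated DNS entry.
async def on_failover(pool):
    await pool.expire_connections()
```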

@egalpin (Author) commented Jun 11, 2018

While I think this could make fully recycling the connection pool much simpler (it's just a function call now), it unfortunately doesn't fully solve the use case.

... you can call [expire_connections] on the failover event

That's just the trouble. There isn't a broadcast of some special exception or error that indicates that a failover has occurred. What we see is a series of various connection errors, the same as one might occasionally see otherwise and chalk up to transient network instability. This happens because the failover results in a DNS update, which is what allows clients to keep using, for example, my_prod_db.foo.com:5432 even though the underlying DB resource has changed entirely.
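So any detection ends up being heuristic. A sketch of what that means in application code, assuming an illustrative threshold and exception set:

```python
import asyncio
import asyncpg

CONSECUTIVE_ERROR_THRESHOLD = 5  # illustrative
_consecutive_errors = 0

async def guarded_execute(pool, query, *args):
    """Run pool.execute(), expiring the pool after a run of failures."""
    global _consecutive_errors
    try:
        result = await pool.execute(query, *args)
    except (asyncpg.exceptions.ReadOnlySQLTransactionError,
            OSError, asyncio.TimeoutError):
        # Any one of these could be transient noise, or a failover.
        _consecutive_errors += 1
        if _consecutive_errors >= CONSECUTIVE_ERROR_THRESHOLD:
            await pool.expire_connections()  # force fresh connections
            _consecutive_errors = 0
        raise
    _consecutive_errors = 0
    return result
```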

@egalpin (Author) commented Jun 11, 2018

I'd be more than happy to update the PR to include usage of expire_connections, though.

@elprans (Member) commented Jun 11, 2018

I'm somewhat reluctant to add such policies to the base pool, as everyone's requirements are different.
I think making Pool.get_new_connection() public would allow implementing this policy in a subclass cleanly.
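Presumably something along these lines; a sketch of the subclass approach (the class name and threshold are illustrative, and only execute() is shown; fetch/fetchrow/fetchval would be wrapped the same way):

```python
from asyncpg.pool import Pool

class ResilientPool(Pool):
    """Pool that expires its connections after N consecutive exceptions."""

    def __init__(self, *args, max_consecutive_exceptions=5, **kwargs):
        super().__init__(*args, **kwargs)
        self._max_consecutive_exceptions = max_consecutive_exceptions
        self._consecutive_exceptions = 0

    async def execute(self, query, *args, **kwargs):
        try:
            result = await super().execute(query, *args, **kwargs)
        except Exception:
            self._consecutive_exceptions += 1
            if self._consecutive_exceptions >= self._max_consecutive_exceptions:
                await self.expire_connections()
                self._consecutive_exceptions = 0
            raise
        self._consecutive_exceptions = 0
        return result

# Instantiated the way asyncpg.create_pool() constructs the base Pool
# (the Pool constructor's keyword arguments vary across versions).
```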

@egalpin (Author) commented Jun 11, 2018

I'm definitely open to a subclass model and can appreciate the reasons for leaving this additional complexity out of the base Pool class. Could you elaborate on your suggestion a bit further? I'm not seeing the function get_new_connection anywhere, but perhaps you're suggesting I write that function? Thanks!

@elprans (Member) commented Jun 11, 2018

There's the private Pool._get_new_connection(), which we can make public.
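For what it's worth, a sketch of what overriding that hook could look like once public (the writability check and class name are illustrative):

```python
from asyncpg.pool import Pool

class VerifyingPool(Pool):
    async def _get_new_connection(self):
        # Still private at this point in the thread; making it public
        # would sanction exactly this kind of override.
        con = await super()._get_new_connection()
        # Illustrative policy: refuse connections that landed on a
        # read-only replica after failover.
        if await con.fetchval('SHOW transaction_read_only') == 'on':
            await con.close()
            raise ConnectionError('connected to a read-only replica')
        return con
```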
