loginTimeout is not followed under network partition and clock jump #1468
Comments
Are there other timeouts (outside the Driver class) also using currentTimeMillis?
Just happened by this one: pgjdbc/pgjdbc/src/main/java/org/postgresql/core/v3/replication/V3PGReplicationStream.java Line 59 in 381cf45
That said, I think fixing that should be done in another PR.
@bokken @davecramer Thanks for the reply! I am also in favor of using
…sl handshake Summary: Details of the issue can be found in: pgjdbc#1468 PR: pgjdbc#1469 Test Plan: Run manual tests with and without the fix. Verify that the fix did prevent the issue from happening. Reviewers: #callisto, garvit Reviewed By: #callisto, garvit Tags: #callisto JIRA Issues: CDM-110595 Differential Revision: https://phabricator.rubrik.com/D87384
hmmm apparently we still have some stragglers besides tests that use currentTimeMillis
fixed by #1617
I'm submitting a ...
Describe the issue
The driver does not honor the login timeout when there is a clock jump on the client and a network partition between client and server. We have set `loginTimeout` to 10 seconds and `socketTimeout` to 20 seconds, but we have seen some queries hanging for more than 1 hour.
Driver Version?
42.0.0
Java Version?
1.8.0_191
OS Version?
Ubuntu 16.04
PostgreSQL Version?
11.2
Analysis
There are two bugs that together cause the issue.
Use of non-monotonic time `System.currentTimeMillis()`
https://github.com/pgjdbc/pgjdbc/blob/REL42.0.0/pgjdbc/src/main/java/org/postgresql/Driver.java#L364-L412
Login timeout is calculated with `System.currentTimeMillis()`, which is not monotonic. If there is a backward clock jump after the calculation of `expiry` and before the calculation of `delay`, it is possible that `delay` has a very large value, which is then passed to the `wait()` method (on line 400).
This `wait()` method will wait until either another thread invokes the `notify()` method or the `notifyAll()` method for this object, or a specified amount of time has elapsed.
No socket timeout set for socket read in `enableSSL`
When making a connection, the separate thread will call `ConnectionFactoryImpl.openConnectionImpl` to open a connection. Inside that function, it calls `enableSSL` to send the SSL startup packet and enable SSL for an established connection. However, this step does not have a socket timeout set, which means it can get stuck forever.
How these two bugs cause the issue
Due to the first bug, the calling thread may wait a very long time inside the `wait()` method. Ideally, `ConnectThread.run()` will either make a connection or fail if `socketTimeout` and `connectTimeout` are working as expected. However, the second bug causes `makeConnection` in `ConnectThread.run` to be stuck forever, which means the calling thread has to wait until the long `delay` has elapsed.
To Reproduce
Make `enableSSL` wait on the socket forever. This can be done by adding an iptables rule that drops all packets to the postgresql port.
Please note that you need to drop all packets after the connection is established and before `enableSSL` is called (one way is to add a sleep before `enableSSL`). Otherwise, you will see a Connection timeout error because a connection cannot be established.
Expected behaviour
The query fails after loginTimeout has elapsed.
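To make the expected behaviour concrete, here is a minimal sketch of a bounded wait whose deadline is taken from `System.nanoTime()` instead of `System.currentTimeMillis()`, so a wall-clock jump cannot inflate the remaining delay. This is not the pgjdbc code or its actual fix: the class `MonotonicLoginTimeout`, the nested `ConnectTask`, and the method `awaitResult` are hypothetical names chosen for illustration, and the worker simply sleeps to stand in for a connect attempt that is stuck on a socket read.

```java
import java.util.concurrent.TimeUnit;

public class MonotonicLoginTimeout {

    /** Hypothetical stand-in for a connect thread; not pgjdbc code. */
    static final class ConnectTask implements Runnable {
        private final Object lock = new Object();
        private boolean done;

        @Override
        public void run() {
            try {
                // Assumption: simulate a connect attempt that hangs for a long time,
                // e.g. a socket read with no timeout during a network partition.
                Thread.sleep(60_000);
                synchronized (lock) {
                    done = true;
                    lock.notifyAll();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        /**
         * Waits for the task to finish, but never longer than timeoutMillis.
         * The deadline is based on System.nanoTime(), which is unaffected by
         * wall-clock adjustments.
         */
        boolean awaitResult(long timeoutMillis) throws InterruptedException {
            final long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
            synchronized (lock) {
                while (!done) {
                    long remainingNanos = deadline - System.nanoTime();
                    if (remainingNanos <= 0) {
                        return false; // timed out
                    }
                    // wait(0) means "wait forever", so round up to at least 1 ms.
                    long remainingMillis = Math.max(1, TimeUnit.NANOSECONDS.toMillis(remainingNanos));
                    lock.wait(remainingMillis);
                }
                return true;
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ConnectTask task = new ConnectTask();
        Thread worker = new Thread(task, "connect-thread");
        worker.setDaemon(true); // do not keep the JVM alive for the stuck worker
        worker.start();

        long start = System.nanoTime();
        boolean connected = task.awaitResult(200); // 200 ms "login timeout"
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.println("connected=" + connected + ", elapsedMs=" + elapsedMs);
    }
}
```

In this sketch the caller gives up once the monotonic deadline passes, even though the worker is still stuck. A real fix would presumably also need a socket timeout on the read inside `enableSSL` so the worker itself does not hang indefinitely.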