Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: isValid() should not wait longer than network timeout #2022

Closed
wants to merge 1 commit into from

Conversation

Powerrr
Copy link
Contributor

@Powerrr Powerrr commented Jan 13, 2021

Some time ago PR #1557 was merged into master and it introduced a kind of regression:

  • before the change: if we called setNetworkTimeout() with some small value (say, 500 ms) and then called isValid() with a larger timeout (say, 2 seconds) on an unresponsive connection, isValid() would respect the smaller timeout and would wait for only 500 ms;
  • after the change: in the same scenario as above isValid() would override the network timeout and wait for 2 seconds.

This is a problem for me, because at the company where I work we often use timeouts smaller than 1 second. We use HikariCP and it allows validation timeouts as small as 250 ms, which is implemented by smth. like conn.setNetworkTimeout(250); return conn.isValid(1); (see https://github.com/brettwooldridge/HikariCP/blob/HikariCP-3.4.5/src/main/java/com/zaxxer/hikari/pool/PoolBase.java#L156-L162).
And if we update to the latest PgJDBC version we'll be limited to validation timeouts that are multiples of 1 second.

As a solution I suggest to change isValid method so that it overrides the network timeout only in those cases where the new timeout is smaller than the one already set. So, for example, if you call setNetworkTimeout(500) and then call isValid(2), it would wait for at most 500 ms. And if you call setNetworkTimeout(3000) and then call isValid(1), it would wait for at most 1 second.
I wrote some code to show the proposed solution. I haven't added any tests, but I think I could try to (if this PR is otherwise considered good).

All Submissions:

  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?

Changes to Existing Features:

  • Does this break existing behaviour? If so please explain.
  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your core changes, as applicable?
  • Have you successfully run tests with your changes locally?

@davecramer
Copy link
Member

Another grey area in the JDBC spec.

isValid is in seconds
networkTimeout is in ms,

There are also some interesting notes on setNetworkTimeout
This method is severe in it's effects, and should be given a high enough value so it is never triggered before any more normal timeouts, such as transaction timeouts.

It also states that the networkTimeout overrides all other timeouts.

setNetworkTimeout(null, timeout * 1000);
// change network timeout only if the new value is more restrictive than the old one
// (zero means no timeout)
if (newNetworkTimeout != 0 && (oldNetworkTimeout == 0 || oldNetworkTimeout > newNetworkTimeout)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of using the word restrictive can we change the comment to "the new value is less than the current"
Also it's easier for me to read new < old

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"the new value is less than the current" is not exactly accurate because of the zero-timeout case, so I tried to come up with a more "creative" wording :) OK, I'll change that

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just put a note that 0 is really no timeout.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed

@Powerrr Powerrr force-pushed the isvalid-timeout branch 2 times, most recently from 462776f to e44872f Compare January 14, 2021 10:13
@davecramer
Copy link
Member

@Powerrr sorry I should have mentioned this earlier. Can you change this PR to be against https://github.com/pgjdbc/pgjdbc/tree/release/42.2. I will then cherry pick it into master.

Also any ideas on how to test this ?

@Powerrr
Copy link
Contributor Author

Powerrr commented Jan 15, 2021

I added a test, based on a similar test I found.
But turns out that test is present only in master, but not in release/42.2. And I refactored it a bit to avoid code duplication. Now I can't rebase my code on top of release/42.2 without conflicts. Maybe you could port the change that introduced the test (7e2cd55) to release/42.2? Or should I just duplicate the code?

And another thing: I had to disable ssl in my test, because apparently closing an ssl socket (after the timeout happens) invokes one more read from the socket, which adds an extra delay (since the socket is dead) and the test fails. This happens only on somewhat recent versions of java, maybe 11.0.something and newer and is probably worth looking into in a separate issue.

@davecramer
Copy link
Member

I need to think about whether to push this into release 42.2.x as this will be a change. @vlsi @sehrope thoughts ? Personally I think we are fixing a regression, however this will have "interesting" side effects. I'm also curious if we are adhering to the spec regarding killing all existing queries when the network times out.

@sehrope
Copy link
Member

sehrope commented Jan 15, 2021

I think it makes sense and follows the spec for the network timeout to supersede any parameter in the isValid(...) request.

The JDBC docs say as much. Check out the last line:

Sets the maximum period a Connection or objects created from the Connection will wait for the database to reply to any one request. If any request remains unanswered, the waiting method will return with a SQLException, and the Connection or objects created from the Connection will be marked as closed. Any subsequent use of the objects, with the exception of the close, isClosed or Connection.isValid methods, will result in a SQLException.

Note: This method is intended to address a rare but serious condition where network partitions can cause threads issuing JDBC calls to hang uninterruptedly in socket reads, until the OS TCP-TIMEOUT (typically 10 minutes). This method is related to the abort() method which provides an administrator thread a means to free any such threads in cases where the JDBC connection is accessible to the administrator thread. The setNetworkTimeout method will cover cases where there is no administrator thread, or it has no access to the connection. This method is severe in it's effects, and should be given a high enough value so it is never triggered before any more normal timeouts, such as transaction timeouts.

I'm okay with this fix going into 42.2.x as a fix. In the real world I don't see how this would break anything. All user SQL is going to be slower than the empty command used by isValid(...) so if their low network timeout break the isValid(...) then the user would not be able to do any real SQL commands anyway.

I took a peek at the PR and there's a potential integer overflow if a nonsensical huge timeout is specified. Though the existing code had the same issue. It's the JDBC's fault for having setNetworkTimeout(...) take an int instead of a long for the milliseconds...

Could fix both the overflow and this actual timeout issue with:

diff --git a/pgjdbc/src/main/java/org/postgresql/jdbc/PgConnection.java b/pgjdbc/src/main/java/org/postgresql/jdbc/PgConnection.java
index 7250cc1f..718ab525 100644
--- a/pgjdbc/src/main/java/org/postgresql/jdbc/PgConnection.java
+++ b/pgjdbc/src/main/java/org/postgresql/jdbc/PgConnection.java
@@ -1412,7 +1412,11 @@ public class PgConnection implements BaseConnection {
     try {
       int savedNetworkTimeOut = getNetworkTimeout();
       try {
-        setNetworkTimeout(null, timeout * 1000);
+        // change network timeout only if the new value is less than the current
+        // (zero means infinite timeout)
+        if (timeout != 0 && (savedNetworkTimeOut == 0 || timeout * 1000L < savedNetworkTimeOut)) {
+          setNetworkTimeout(null, (int) Math.min(timeout * 1000L, Integer.MAX_VALUE));
+        }
         if (replicationConnection) {
           Statement statement = createStatement();
           statement.execute("IDENTIFY_SYSTEM");

This resets the socket timeout on every call but the old code did too.

Should also add ConnectionValidTimeoutTest tests to an existing suite. I'm a bit confused how they're actually running in CI without being a part of one.

@Powerrr
Copy link
Contributor Author

Powerrr commented Jan 19, 2021

I fixed the integer overflow and rebased the code on top of release/42.2. There is one small problem: two of my tests won't pass on 42.2, because #1943 wasn't backported there, so the driver tries to cancel the validation query, which takes additional time.

@Powerrr Powerrr changed the base branch from master to release/42.2 January 19, 2021 22:04
@sehrope
Copy link
Member

sehrope commented Jan 24, 2021

I think the fix itself is good. The new test is a bit shaky though. I rewrote the test to use a new "StrangeProxyServer" that selectively stops forwarding older client traffic. This new test works even with the old cancellation code as the traffic for the cancellation request is allowed to go through.

Branch rebased on release/42.2 with the new test is here: https://github.com/sehrope/pgjdbc/commits/fix-is-valid-timeout

Here's the dif vs 42.2 with the helpers: https://github.com/pgjdbc/pgjdbc/compare/release/42.2..sehrope:fix-is-valid-timeout

If that looks good and passes CI then we'll get that merged in and can cherry-pick it for master as well.

@sehrope
Copy link
Member

sehrope commented Jan 26, 2021

@Powerrr This fix has been merged in #2022. It'll be a part of the next 42.2.x release. Thanks for the submission!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants