Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PG::ConnectionBad: connection to server at "::1", when upgrading from 1.2.3 -> 1.3.1 #452

Closed
jsnwesson opened this issue Apr 28, 2022 · 5 comments · Fixed by #459
Closed

Comments

@jsnwesson
Copy link

An issue that we've come across when upgrading from 1.2.3 to 1.3.1 is that for some people in my team, when trying to query into the production database (that we tunnel into), they receive this error:

ActiveRecord::ConnectionNotEstablished: connection to server at "::1", port 9456 failed: Connection refused
    Is the server running on that host and accepting TCP/IP connections?

from /Users/jason/.rbenv/versions/2.7.5/lib/ruby/gems/2.7.0/gems/activerecord-6.1.4.4/lib/active_record/connection_adapters/postgresql_adapter.rb:83:in `rescue in new_client'
Caused by PG::ConnectionBad: connection to server at "::1", port 9456 failed: Connection refused
    Is the server running on that host and accepting TCP/IP connections?

from /Users/jason/.rbenv/versions/2.7.5/lib/ruby/gems/2.7.0/gems/pg-1.3.1/lib/pg/connection.rb:637:in `async_connect_or_reset'

We found that there's a potential threading issue that occurs here, as we've been able to resolve the BadConnection error by adding a simple puts immediately after this line.

Is this an issue that you all have possibly seen before?

@larskanis
Copy link
Collaborator

Please update to the latest pg release! The ConnectionBad issue has probably been solved already.

@jsnwesson
Copy link
Author

Hi @larskanis, we just tested making a query using 1.3.5, and we're getting the same issue as above.

@larskanis
Copy link
Collaborator

That's no threading issue. The inserted put just returns nil and disables name resolution in Ruby therefore. Then the name resolution is done in libpq, which sometimes is different for localhost.

Your tunnel obviously accepts only IPv4 connections, but localhost is resolved to IPv6.

You can adjust the name resolution to IPv4 or IPv6 by adding something like this to your /etc/hosts :

127.0.0.1 localhost
::1 localhost

Or you specify 127.0.0.1 in your connection string.

@larskanis
Copy link
Collaborator

I just noticed, that pg-1.2.3 tries to connect to both IPv4 and IPv6 addresses, if "localhost" resolves to both (which is usually the case, if "localhost" isn't defined in /etc/hosts).

In contrast pg-1.3.5 tries to connect to only the first address that is resolved from the hostname, which is usually the IPv6 address. Since your tunnel is only accessible by IPv4, the connection fails.

So there is a regression in pg-1.3.x. I'll fix this.

@jsnwesson
Copy link
Author

Thanks for continuing to look into this! I'll let the team know to expect an update on this.

larskanis added a commit to larskanis/ruby-pg that referenced this issue Jun 8, 2022
larskanis added a commit to larskanis/ruby-pg that referenced this issue Jun 8, 2022
larskanis added a commit to larskanis/ruby-pg that referenced this issue Jun 8, 2022
larskanis added a commit to larskanis/ruby-pg that referenced this issue Oct 8, 2022
.. in favor of passing all hosts to libpq at once and instead adjust connect_timeout handling roughtly to how libpq handles it.

The problem is that libpg aborts connecting to multiple hosts, if there's a authentication failure.
But if pg imitates this behaviour, the libpq API doesn't give an exact indication, whether the connection aborted due to an authentication error or due to some other error, which continues the host iteration.
So we can not distinguish between an authentication error and other types of errors, other then by the error message.
But there's the next problem, that the error message is locale dependent and that when both client and server are running on Windows, the error message is often not correctly delivered, which is a known long standing PostgreSQL issue.

This commit therefore changes the execution back to how multiple hosts were handled similar to pg-1.3.x, but with two fixes:
1. Multiple IP addresses to one hostname are handled correctly, (fixes ged#452)
2. and connect_timeout is handled roughly like libpq. (fixes ged#450)

It's only roughly, since the timeout is not strictly per host, but per single socket event, but with a total timeout multiplied with the number-of-hosts.
Exact handling of connect_timeout like libpq is only possible if we connect host-by-host.
larskanis added a commit to larskanis/ruby-pg that referenced this issue Oct 8, 2022
.. in favor of passing all hosts to libpq at once and instead adjust connect_timeout handling roughtly to how libpq handles it.

The problem is that libpg aborts connecting to multiple hosts, if there's a authentication failure.
But if pg imitates this behaviour, the libpq API doesn't give an exact indication, whether the connection aborted due to an authentication error or due to some other error, which continues the host iteration.
So we can not distinguish between an authentication error and other types of errors, other then by the error message.
But there's the next problem, that the error message is locale dependent and that when both client and server are running on Windows, the error message is often not correctly delivered, which is a known long standing PostgreSQL issue.

This commit therefore changes the execution back to how multiple hosts were handled similar to pg-1.3.x, but with two fixes:
1. Multiple IP addresses to one hostname are handled correctly, (fixes ged#452)
2. and connect_timeout is handled roughly like libpq. (fixes ged#450)

It's only roughly, since the timeout is not strictly per host, but per single socket event, but with a total timeout multiplied with the number-of-hosts.
Exact handling of connect_timeout like libpq is only possible if we connect host-by-host.
larskanis added a commit to larskanis/ruby-pg that referenced this issue Oct 9, 2022
.. in favor of passing all hosts to libpq at once and instead adjust connect_timeout handling roughtly to how libpq handles it.

The problem is that libpg aborts connecting to multiple hosts, if there's a authentication failure.
But if pg imitates this behaviour, the libpq API doesn't give an exact indication, whether the connection aborted due to an authentication error or due to some other error, which continues the host iteration.
So we can not distinguish between an authentication error and other types of errors, other then by the error message.
But there's the next problem, that the error message is locale dependent and that when both client and server are running on Windows, the error message is often not correctly delivered, which is a known long standing PostgreSQL issue.

This commit therefore changes the execution back to how multiple hosts were handled similar to pg-1.3.x, but with two fixes:
1. Multiple IP addresses to one hostname are handled correctly, (fixes ged#452)
2. and connect_timeout is handled roughly like libpq. (fixes ged#450)

It's only roughly, since the timeout is not strictly per host, but per single socket event, but with a total timeout multiplied with the number-of-hosts.
Exact handling of connect_timeout like libpq is only possible if we connect host-by-host.
larskanis added a commit to larskanis/ruby-pg that referenced this issue Oct 9, 2022
.. in favor of passing all hosts to libpq at once and instead adjust connect_timeout handling roughtly to how libpq handles it.

The problem is that libpg aborts connecting to multiple hosts, if there's a authentication failure.
But if pg imitates this behaviour, the libpq API doesn't give an exact indication, whether the connection aborted due to an authentication error or due to some other error, which continues the host iteration.
So we can not distinguish between an authentication error and other types of errors, other then by the error message.
But there's the next problem, that the error message is locale dependent and that when both client and server are running on Windows, the error message is often not correctly delivered, which is a known long standing PostgreSQL issue.

This commit therefore changes the execution back to how multiple hosts were handled similar to pg-1.3.x, but with two fixes:
1. Multiple IP addresses to one hostname are handled correctly, (fixes ged#452)
2. and connect_timeout is handled roughly like libpq. (fixes ged#450)

It's only roughly, since the timeout is not strictly per host, but per single socket event, but with a total timeout multiplied with the number-of-hosts.
Exact handling of connect_timeout like libpq is only possible if we connect host-by-host.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants