Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add TcpKeepAlive #360

Open
TreyLawrence opened this issue Apr 7, 2015 · 22 comments
Open

Add TcpKeepAlive #360

TreyLawrence opened this issue Apr 7, 2015 · 22 comments

Comments

@TreyLawrence
Copy link

I'm using this package with amazon's redshift, and am using it to do some long running queries (max around 15 minutes). The problem is that sometimes the connection is lost during these queries and the client is never notified, so gets blocked waiting for a response. It'd be nice if there were support for TcpKeepAlive to keep these connections from timing out.

@johto
Copy link
Contributor

johto commented Apr 7, 2015

If you were using an actual postgres, you could configure keepalive from server side.

That said, client-side keepalives don't sound like a bad idea.

@cbandy
Copy link
Contributor

cbandy commented Apr 7, 2015

http://felixge.de/2014/08/26/tcp-keepalive-with-golang.html

Looking at the options available to libpq and the above,

  • we can do keepalives_idle and keepalives_interval, but not separately;
  • we cannot do keepalives_count.

@johto
Copy link
Contributor

johto commented Apr 8, 2015

Looking at the options available to libpq and the above,

we can do keepalives_idle and keepalives_interval, but not separately;
we cannot do keepalives_count.

Yeah, I think you're right. I don't see any nice way of accessing the underlying file descriptor of a TCPConn, which is a great shame. I wonder if this is a case of nanny city or whether we could do something about this for Go 1.5.

@TreyLawrence
Copy link
Author

Is there a reason the net.Conn interface is used instead of the net.TCPConn type? It looks like the connection is always hardcoded to tcp anyways:
https://github.com/lib/pq/blob/master/conn.go#L259

If you guys are cool with switching to a net.TCPConn type, we could expose these:
https://golang.org/pkg/net/#TCPConn.SetKeepAlive
https://golang.org/pkg/net/#TCPConn.SetKeepAlivePeriod

@johto
Copy link
Contributor

johto commented Apr 25, 2015

It looks like the connection is always hardcoded to tcp anyways:
https://github.com/lib/pq/blob/master/conn.go#L259

Huh? What about this path?

@TreyLawrence
Copy link
Author

Yeah, that's a derp. Sorry. What about getting rid of the Dialer interface and defaultDialer, and using net.Dialer instead?

That way the dial function can instantiate a net.Dialer and set the timeout, deadline, and keep alive on it, and then call Dial(ntw, addr) on that.

@johto
Copy link
Contributor

johto commented Apr 27, 2015

Yeah, that's a derp. Sorry. What about getting rid of the Dialer interface and defaultDialer, and using net.Dialer instead?

That would break applications for no real reason.

That way the dial function can instantiate a net.Dialer and set the timeout, deadline, and keep alive on it, and then call Dial(ntw, addr) on that.

I don't see why you couldn't just special-case the TCP path and set the keepalive parameters there. If people are using custom dialers, they should set said parameters themselves.

@johto
Copy link
Contributor

johto commented Apr 27, 2015

I don't see why you couldn't just special-case the TCP path and set the keepalive parameters there. If people are using custom dialers, they should set said parameters themselves.

Or we could do it for custom dialers as well if a type assertion to a *net.TCPConn holds.

@TreyLawrence
Copy link
Author

That would break applications for no real reason.

I don't think I understand how you can use custom dialers. I see that drv implements the driver.Driver interface. But I'm not sure how you can use a dialer other than defaultDialer, since it's hardcoded here.

But I think you're right, a special-case for TCP will work best.

@johto
Copy link
Contributor

johto commented Apr 29, 2015

I don't think I understand how you can use custom dialers.

By registering your own driver and calling pq.DialOpen instead of implementing your own.

@tg
Copy link

tg commented Apr 30, 2015

+1 on this enhancement. When DB failover is occurring on Amazon RDS service, I'm seeing all my DB connection hanging indefinitely. I'll go with johto advice and implement my own driver, wrapping pq.

@johto
Copy link
Contributor

johto commented Apr 30, 2015

When DB failover is occurring on Amazon RDS service, I'm seeing all my DB connection hanging indefinitely.

I'm not sure this will actually help to be honest. IIRC once there's un-ACKed data in the pipe (i.e. you're actually doing a query and not just sitting idle), the keepalive intervals are ignored and much higher timeouts kick in. But I might as well be wrong.

@tg
Copy link

tg commented Apr 30, 2015

Thanks for pointing this out, will need to experiment for sure, good thing this is reasonably easy to reproduce.

Other solution I'm thinking of would be to simply maintain max idle period for every connection by wrapping net.Conn and applying Set(Read)Deadline on every read/send. I've been doing something similar for http.Server already, so this should be easy to test. Let's fork!

BTW I assume there's no heartbeating protocol in postgres, so idle connections effectively stay silent, is this correct?

@johto
Copy link
Contributor

johto commented Jun 5, 2015

BTW I assume there's no heartbeating protocol in postgres, so idle connections effectively stay silent, is this correct?

You probably figured this out already, but yes, idle connections are idle.

@vincepri
Copy link

Any update for this?

@himanshugpt
Copy link

Did anyone try cbandy's solution?

@LasTshaMAN
Copy link

LasTshaMAN commented Sep 28, 2019

I believe for most users lib/pq the absence of TcpKeepAlive parameter shouldn't be an issue if they are using a recent Go version, as since golang/go@5bd7e9c
if supported by the underlying operating system TcpKeepAlive will be set to 15 seconds on TCP socket (before that commit by default Go didn't set TcpKeepAlive leaving this as user's responsibility).

Thus in the future we shouldn't see problems similar to the one originally discussed in this issue.

@kingcr7
Copy link

kingcr7 commented Dec 12, 2019

Thanks for pointing this out, will need to experiment for sure, good thing this is reasonably easy to reproduce.

Other solution I'm thinking of would be to simply maintain max idle period for every connection by wrapping net.Conn and applying Set(Read)Deadline on every read/send. I've been doing something similar for http.Server already, so this should be easy to test. Let's fork!

BTW I assume there's no heartbeating protocol in postgres, so idle connections effectively stay silent, is this correct?

+1 on this enhancement. When DB failover is occurring on Amazon RDS service, I'm seeing all my DB connection hanging indefinitely. I'll go with johto advice and implement my own driver, wrapping pq.

Did you implement the driver and pg wrap? i want to learn how you implement and use it ?thank you very much.

@tg
Copy link

tg commented Dec 12, 2019

@kingcr7 since golang 1.6 my problem has been solved by setting max connection lifetime via sql package (https://golang.org/pkg/database/sql/#DB.SetConnMaxLifetime). Works like a charm, haven't had any issues with zombie connection for a few years now.

@alok87
Copy link

alok87 commented Sep 30, 2020

Why cannot the keep alive change like this Clever#1 be merged in this library?

alok87 added a commit to practo/tipoca-stream that referenced this issue Sep 30, 2020
lib/pq lacks support for keep alive lib/pq#360
Going with Clever/pq forked as practo/pq as Clever/pq lacks go module support
@marselester
Copy link

It would be great if you could try this #999 TCP keepalive solution on real workloads. Here is an example https://github.com/marselester/pg-keepalive.

@LasTshaMAN
Copy link

I think it is also worth pointing out that TCP keepalive alone won't help with the original issue, because:

A socket can only be in the keepalive state if it is idle. If there is outstanding data, the socket will be in the on state as it retransmits the data over and over because of the network failure.

It seems a common misconception that a connection with keep-alives enabled is going to return an error within TCP_KEEPIDLE + TCP_KEEPINTV * TCP_KEEPCNT seconds from a network failure. This is often not the case.

see currently open Golang issue for details - golang/go#31490

alok87 added a commit to practo/tipoca-stream that referenced this issue Dec 15, 2020
lib/pq lacks support for keep alive lib/pq#360
Going with Clever/pq forked as practo/pq as Clever/pq lacks go module support
alok87 added a commit to practo/tipoca-stream that referenced this issue Dec 15, 2020
lib/pq lacks support for keep alive lib/pq#360
Going with Clever/pq forked as practo/pq as Clever/pq lacks go module support
alok87 added a commit to practo/tipoca-stream that referenced this issue Dec 15, 2020
lib/pq lacks support for keep alive lib/pq#360
Going with Clever/pq forked as practo/pq as Clever/pq lacks go module support
alok87 added a commit to practo/tipoca-stream that referenced this issue Jun 5, 2021
lib/pq lacks support for keep alive lib/pq#360
Going with Clever/pq forked as practo/pq as Clever/pq lacks go module support
alok87 added a commit to practo/tipoca-stream that referenced this issue Jun 7, 2021
lib/pq lacks support for keep alive lib/pq#360
Going with Clever/pq forked as practo/pq as Clever/pq lacks go module support
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

10 participants