Connection pool hangs if authentication fails #141

Open
levi-nz opened this issue Nov 6, 2022 · 6 comments

Comments

@levi-nz

levi-nz commented Nov 6, 2022

When using a connection pool with tokio_postgres, the connection hangs until a TimedOut error is returned if the password is incorrect. This behaviour is undesirable; it should just result in an authentication failure.

To reproduce, simply try to connect to any database using an incorrect password and the connection will hang until it times out.
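
A minimal reproduction sketch (assuming bb8, bb8-postgres, and tokio-postgres; the connection string and credentials are placeholders):

use bb8_postgres::PostgresConnectionManager;
use tokio_postgres::NoTls;

#[tokio::main]
async fn main() {
    let manager = PostgresConnectionManager::new_from_stringlike(
        "host=localhost user=postgres password=definitely-wrong",
        NoTls,
    )
    .unwrap();

    // build_unchecked() returns immediately without opening any connections.
    let pool = bb8::Pool::builder().build_unchecked(manager);

    // Expected: an authentication error almost immediately.
    // Observed: this call hangs for the full connection_timeout (30 seconds by
    // default) and then returns RunError::TimedOut, losing the real cause.
    match pool.get().await {
        Ok(_) => println!("connected"),
        Err(e) => eprintln!("pool.get() failed: {e:?}"),
    }
}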

@djc
Owner

djc commented Nov 7, 2022

I'd be happy to review a PR for this, but I don't think I'll be able to work on this myself in the near future.

@couchand
Contributor

I've spent a bit of time looking into this while debugging some TLS connection issues. It is decidedly not a simple fix, unfortunately. There are a few things that I've been doing to make life a little bit easier, however:

  • If your deployment allows it, set Builder::min_idle to require at least one open connection before considering the Pool ready. Consider testing the connection parameters on startup with a short-lived pool and a very low timeout.
  • Be sure to register an ErrorSink via Builder::error_sink, which gives you a chance to log errors that would otherwise be silently dropped (see the configuration sketch after this list).
  • I'll also be taking a look at the bb8_failsafe crate for more proactive connection-error management.

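A rough sketch of that configuration; the LoggingErrorSink type, the connection string, and the specific timeout are placeholders of mine, not part of bb8:

use std::time::Duration;

use bb8::{ErrorSink, Pool};
use bb8_postgres::PostgresConnectionManager;
use tokio_postgres::NoTls;

// A trivial sink that logs errors which would otherwise be discarded.
#[derive(Debug, Clone, Copy)]
struct LoggingErrorSink;

impl ErrorSink<tokio_postgres::Error> for LoggingErrorSink {
    fn sink(&self, error: tokio_postgres::Error) {
        eprintln!("bb8 connection error: {error}");
    }

    fn boxed_clone(&self) -> Box<dyn ErrorSink<tokio_postgres::Error>> {
        Box::new(*self)
    }
}

async fn build_pool() -> Result<Pool<PostgresConnectionManager<NoTls>>, tokio_postgres::Error> {
    let manager = PostgresConnectionManager::new_from_stringlike(
        "host=localhost user=postgres password=secret",
        NoTls,
    )?;

    Pool::builder()
        .min_idle(Some(1))                          // open at least one connection up front
        .connection_timeout(Duration::from_secs(3)) // fail fast rather than the 30s default
        .error_sink(Box::new(LoggingErrorSink))     // surface errors instead of dropping them
        .build(manager)                             // build() tries to establish the initial connections
        .await
}
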
The touchpoint for the issue is PoolInner::add_connection, here:

bb8/bb8/src/inner.rs

Lines 199 to 225 in 5737736

let conn = shared
    .manager
    .connect()
    .and_then(|mut c| async { self.on_acquire_connection(&mut c).await.map(|_| c) })
    .await;
match conn {
    Ok(conn) => {
        let conn = Conn::new(conn);
        shared
            .internals
            .lock()
            .put(conn, Some(approval), self.inner.clone());
        return Ok(());
    }
    Err(e) => {
        if Instant::now() - start > self.inner.statics.connection_timeout {
            let mut locked = shared.internals.lock();
            locked.connect_failed(approval);
            return Err(e);
        } else {
            delay = max(Duration::from_millis(200), delay);
            delay = min(self.inner.statics.connection_timeout / 2, delay * 2);
            sleep(delay).await;
        }
    }
}
This is the code that ultimately calls into the ManageConnection impl to make the connection, and (in the case of an error) keeps trying until it gets one successfully or the pool's connection timeout expires.

The right behavior in the general case is to delay returning any error until the timeout, but it would be great if there were a way to identify particular classes of errors that shouldn't wait for it. That seems to require cooperation from the ManageConnection impl, which means a new trait method (say something like fn error_is_fatal(&self, e: &Self::Error) -> bool { false }) that the retry loop could use to exit early. We could then override the default impl of that method where appropriate. Unfortunately for my use case (bb8-postgres), the tokio-postgres crate does not expose error details in a way that could reasonably be used for this purpose. Long story short, doing this would be a lot of work.
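
To make the idea concrete, here is a sketch of what the hook might look like; the existing trait items are paraphrased from bb8's ManageConnection, and error_is_fatal is the hypothetical addition:

use async_trait::async_trait;

#[async_trait]
pub trait ManageConnection: Sized + Send + Sync + 'static {
    type Connection: Send + 'static;
    type Error: Send + 'static;

    async fn connect(&self) -> Result<Self::Connection, Self::Error>;
    async fn is_valid(&self, conn: &mut Self::Connection) -> Result<(), Self::Error>;
    fn has_broken(&self, conn: &mut Self::Connection) -> bool;

    /// Hypothetical addition: report whether an error can never succeed on
    /// retry (e.g. bad credentials), so the pool can give up immediately.
    fn error_is_fatal(&self, _error: &Self::Error) -> bool {
        false
    }
}

The Err arm of the retry loop above could then check shared.manager.error_is_fatal(&e) and return Err(e) without waiting for the timeout.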


As a side note for @djc or anyone else looking more deeply: this call stack (Pool::get => PoolInner::get => PoolInner::make_pooled => PoolInner::spawn_replenishing_approvals) raises my eyebrows. In particular, it seems like the waiter ought to be a Result channel so that a root cause can bubble back up. As it is, we lose the original error even in cases where we are actively waiting for the connection, which does not seem to be the intent of spawn_replenishing_approvals.

bb8/bb8/src/inner.rs

Lines 138 to 139 in 5737736

let approvals = locked.push_waiter(tx, &self.inner.statics);
self.spawn_replenishing_approvals(approvals);

@djc
Owner

djc commented Mar 3, 2023

As a side note for @djc or anyone else looking more deeply: this call stack (Pool::get => PoolInner::get => PoolInner::make_pooled => PoolInner::spawn_replenishing_approvals) raises my eyebrows. In particular, it seems like the waiter ought to be a Result channel so that a root cause can bubble back up. As it is, we lose the original error even in cases where we are actively waiting for the connection, which does not seem to be the intent of spawn_replenishing_approvals.

That sounds sensible. Want to propose a PR?

Unfortunately for my use case (bb8-postgres), the tokio-postgres crate does not expose error details in a way that could reasonably be used for this purpose. Long story short, doing this would be a lot of work.

Might be worth an issue against tokio-postgres? I would definitely be open to adding an error_is_fatal()-like hook to the ManageConnection trait.

@couchand
Contributor

couchand commented Mar 3, 2023

That sounds sensible. Want to propose a PR?

Happy to work something up. Since the waiter is stored in internals::PoolInternals and the error pops up in inner::PoolInner::spawn_start_connections, I think the move is to replace this line, which sends the error straight to the error_sink,

Err(e) => this.inner.statics.error_sink.sink(e),

with a call to a new method on SharedPool that looks roughly like:

impl<M: ManageConnection> SharedPool<M> {
    fn forward_error(&self, mut err: M::Error) {
        let mut locked = self.internals.lock();
        while let Some(waiter) = locked.waiters.pop_front() {
            match waiter.send(Err(err)) {
                // A live waiter accepted the error; we're done.
                Ok(()) => return,
                // That waiter has gone away; take the error back and try the next one.
                Err(Err(e)) => err = e,
                Err(Ok(_)) => unreachable!("only errors are sent here"),
            }
        }
        // No waiters left; fall back to the error sink as before.
        self.statics.error_sink.sink(err);
    }
}
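
(At the call site in spawn_start_connections, the match arm quoted above would then become roughly Err(e) => this.inner.forward_error(e),.)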

Might be worth an issue against tokio-postgres? I would definitely be open to adding an error_is_fatal()-like hook to the ManageConnection trait.

It looks like it has been discussed a few times (sfackler/rust-postgres#571, sfackler/rust-postgres#583, sfackler/rust-postgres#790), though usually in a different context. There's even another project that is trying to do exactly the same thing: https://github.com/palfrey/wait-for-db/blob/c3fb9dbad94f06f91f83d6ef9e4af0b93ef64083/src/pg.rs#L34-L49

In at least one case a PR was merged that exposed a limited interface to the error kind (sfackler/rust-postgres#739), so it's reasonable to think a similar PR would be well received.
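
Assuming access to the error's SQLSTATE via tokio_postgres's Error::code(), a bb8-postgres override of the hypothetical hook might look roughly like this (postgres_error_is_fatal is my name for it, and the SQLSTATE choices are a guess at the relevant codes):

use tokio_postgres::error::SqlState;

/// Would back a bb8-postgres impl of the hypothetical error_is_fatal() hook.
fn postgres_error_is_fatal(error: &tokio_postgres::Error) -> bool {
    // Authentication failures arrive as SQLSTATE 28P01 / 28000; retrying with
    // the same credentials can never succeed, so bail out early.
    match error.code() {
        Some(code) => {
            *code == SqlState::INVALID_PASSWORD
                || *code == SqlState::INVALID_AUTHORIZATION_SPECIFICATION
        }
        None => false,
    }
}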

@djc
Owner

djc commented Mar 13, 2023

Yeah, the forward_error() approach seems reasonable. Should probably verify all sink() callers (at least in PoolInner) to see if they should use this.

@couchand
Contributor

Hmm, it's not so easy, I guess. The above strategy works, but only if the user has configured retry_connection(false). If not, in practice we hit the "outer" timeout on the waiter channel here:

match timeout(self.inner.statics.connection_timeout, rx).await {
before we exhaust the "inner" timeout on the retry loop here:
|| Instant::now() - start > self.inner.statics.connection_timeout

After a quick review I don't immediately see a great way to get these to line up that doesn't feel like a hack (e.g. making the outer one "a little bit longer" or some such). The outer timeout can't really be removed, since we're relying on it in the face of pool contention. The inner timeout looks locally correct in the context of the retry loop, and it's hard to picture what would really need to change.

I'll go ahead and submit a PR to bubble up the error for the case of a pool configured with retry_connection(false), since that at least makes a little bit of progress here.
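
For anyone following along, once that lands the practical mitigation is to opt out of the retry loop so the first connect error reaches the waiter. A sketch, assuming a bb8 version with Builder::retry_connection:

use std::time::Duration;

use bb8::{ManageConnection, Pool};

// With retry_connection(false) plus the waiter-notification change proposed here,
// a failed connect is handed to the waiting caller instead of being retried until
// connection_timeout expires.
async fn build_fail_fast_pool<M: ManageConnection>(manager: M) -> Result<Pool<M>, M::Error> {
    Pool::builder()
        .retry_connection(false)
        .connection_timeout(Duration::from_secs(5)) // still bounds genuinely slow connects
        .build(manager)
        .await
}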

couchand added a commit to couchand/bb8 that referenced this issue Mar 20, 2023
This ensures that an active waiter will be notified of a
connection failure before we resort to sending it to the sink.

Contributes to djc#141.
djc pushed a commit that referenced this issue May 26, 2023
This ensures that an active waiter will be notified of a
connection failure before we resort to sending it to the sink.

Contributes to #141.