Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

expirationChecker won't recreate connection object with Postgres client #3578

Open
dsbrgg opened this issue Dec 11, 2019 · 35 comments
Open

expirationChecker won't recreate connection object with Postgres client #3578

dsbrgg opened this issue Dec 11, 2019 · 35 comments

Comments

@dsbrgg
Copy link

dsbrgg commented Dec 11, 2019

Environment

Knex version: 0.20.3
Database + version: psql (PostgreSQL) 12.1
OS: macOS 10.14.6

Bug

The expirationChecker property on the connection object is not being called to recreate the connection object with new credentials for the database, even though the timeout was reached.

The first time that a request is being made, it works fine but after the timeout, it won't allow to make subsequent connections to the database returning a permission_denined error (as it should). Here's an example of the knex instance being made.

const database = knex({
  ...configuration,
  connection: async () => {
    const { timeout, ...credentials } = await vault.psqlCredentials('my-role');

    const connection = {
      ...configuration.connection,
      ...credentials,
      expirationChecker: () => timeout <= Date.now(),
      ssl: yn(get(configuration, 'connection.ssl', false))
    };

    return connection;
  }
});

It came to my attention that the psql config interface does not have an expirationChecker property in it but, I'm not sure if this has a direct impact with this situation since the function that would update the connection seems to be called on the first time.

@dsbrgg
Copy link
Author

dsbrgg commented Jan 15, 2020

So, after a couple of digging, I think I was able to find the issue and at least provide some kind of solution, even if a naive one.

The connectionConfigExpirationChecker property is not being used except for the first time. That is happening because it's only being called on the first create call on tarn.js.

From what I've investigated, tarn.js does not have a property or something like that to destroy resources based on expiration.

Basically, I've just added a setTimeout on the updatePoolConnectionSettingsFromProvider and acquiring the resource on the pool to properly close the connection. This also implies that the expirationChecker function should return a value representing the delay in milliseconds to be passed to setTimeout.

I've done some local tests here with the codebase I'm working with and I finally had the behaviour I was expecting without any side-effects from what I can see.

@kibertoad , @oranoran , @Ali-Dalal - maybe you could give some insight in this since you guys were involved in the last PR related to this functionality.

@elhigu
Copy link
Member

elhigu commented Jan 20, 2020

I actually don't like or understand why there even is expirationChecker attribute in connection. To me would be better if connection could be declared as a function which then may return config directly or it may return promise, which resolves config.

Then it would be up to client code to implement logic when it wants to get fresh config and when it wants to for example return the same already resolved promise for the old config.

Could someone explain me better how that async / function getting of connection parameters works nowadays or what am I missing here?

@dsbrgg
Copy link
Author

dsbrgg commented Jan 21, 2020

@elhigu I believe that this expirationChecker was introduced along with the PR that I've mentioned, that also introduced the async connection.

I can't tell for every case but at least in my case, this property expirationChecker would be very handy because I need to get new credentials for the database after a set amount of time and I believe this way would be easier since knex could just end the resource on its own.

I believe that the async connection is actually being used as it should but the issue is that, there's in fact a property expirationChecker which seems to not be used at all, besides the first time that tarn.js actually creates the resource.

@Ali-Dalal
Copy link

@dsbrgg to be honest, I haven't tried the expirationChecker because, in my cause, It is not needed. But setting the connection config as an async function works like a charm.

@oranoran can you put your resolution on this?

@oranoran
Copy link
Contributor

@dsbrgg is your expectation for the configuration to expire at the timeout, or for actual living connections that were created using it to expire?
The intention was to implement the former not the latter.
The implementation has nothing to do with tarn, it's part of the client code that creates new connections and isn't related to maintaining them once created.

@elhigu
Copy link
Member

elhigu commented Jan 22, 2020

I can't tell for every case but at least in my case, this property expirationChecker would be very handy because I need to get new credentials for the database after a set amount of time and I believe this way would be easier since knex could just end the resource on its own.

That would be as easy or even easier if knex would just always reload config if it is changed. Then there would be no need for that kind of timer. So to me that parameter still looks like a hack to me instead of feature. I remember that async knex configuration feature first implementation was not very versatile for many use cases either (IIRC it only made possible to get async config once and then there was no way to update it afterwards)...

@dsbrgg
Copy link
Author

dsbrgg commented Jan 22, 2020

@Ali-Dalal Yes, like I said, the async config works fine. My problem has been with the expirationChecker exclusively.

@oranoran I suppose both. Since the connection would have to expire after a timeout, the resource would have to be recreated or you would end up with an invalid knex instance - which is my case, after the timeout, I get invalid credentials errors.

Regardless, the expirationChecker is just being called on the resource creation. If this is what the property was actually intended to do, then I got it wrong. I thought it would be checking if the connection was still valid somehow.

@elhigu I'm sorry but I don't understand what you mean by "reloading the config if it's changed". The timer was just a naive approach to the matter as I was under the impression that it was supposed to verify if the connection was valid from time to time or before requests were made and then call the async connection function again.

@elhigu
Copy link
Member

elhigu commented Jan 22, 2020

I'm sorry but I don't understand what you mean by "reloading the config if it's changed".

I mean something like this:

let config = {};
setInterval(() => {
  config = generateNewConfig();
}, 5000); 

{
  client: 'pg',
  connection: () => {
     return config;
  }
}

That would cause knex to reload every 5 seconds, since returned configuration object is changing every 5 seconds (of course connection callback could also return a promise which would be asynchronously resolved).

I was under the impression that it was supposed to verify if the connection was valid from time to time or before requests were made and then call the async connection function again.

I'm pretty sure that it does not check if the connection details are valid, but only re-reads new config with that interval.

@dsbrgg
Copy link
Author

dsbrgg commented Jan 22, 2020

@elhigu Thanks for the example. I guess this could do it.

I'm pretty sure that it does not check if the connection details are valid, but only re-reads new config with that interval.

I think this is not actually happening. At least from what I've tested, even after the expiration was passed, connection was not being called again to re-read the config/update the config details.

That's what I mean when I say that the connection is valid or not, since the expiration, for me, means that at that point that the connection would no longer be valid.

It is called once when the connection is first being created and never again to see if the expirationChecker will return true. If that's the intended behaviour than my bad, I misunderstood the point of the property.

Anyway, I'll try something around that example you've showed.

@elhigu
Copy link
Member

elhigu commented Jan 24, 2020

Anyway, I'll try something around that example you've showed.

I don't think that my example will work at all, since it is an imaginary API how I would have wanted that configuration reloading feature to work.

@dsbrgg
Copy link
Author

dsbrgg commented Jan 24, 2020

Oh sorry, I thought you meant as if that could solve it.

Could this function be called upon the validation of a connection?

This way the connection could be destroyed and also the connection would be rebuilt at that point. This way it would prevent any type of timer.

I suppose the property connectionConfigExpirationChecker is set to true by default on initialisation, so maybe a variable that would contain the function for expirationChecker and will be checked and called upon validate, maybe something along these lines. I've tried some tests here and it worked as well.

@elhigu
Copy link
Member

elhigu commented Jan 24, 2020

Maybe... more people should read through those connection parts of the code base to make sure that it is a valid solution... I don't know when I would have time to concentrate on that.

@dsbrgg
Copy link
Author

dsbrgg commented Jan 27, 2020

I don't need this immediately as I came up with this issue for an upcoming feature on my project but it would be great if this could be implemented. Thanks for the responses as well @elhigu!

@sprusr
Copy link

sprusr commented Jun 30, 2021

Hey, digging this issue up again. I have pretty much the same use-case with rotating creds. Currently the only reliable way I've found to expire connections using stale creds is to .destroy() and .initialize() the connection pool.

I think the proposed solution three comments above is a very reasonable one, and shouldn't introduce any unexpected new logic. If expirationChecker returns true, the implication there is that the configuration which was loaded is no longer valid, and I'm not sure if there is a situation in which that would not mean connections using that config are no longer valid.

Let's turn this into a PR?

@elhigu
Copy link
Member

elhigu commented Jun 30, 2021

@sprusr I don't like that expiration checker way of implementing it. It is really limited and seems hacky. Just allowing that the configuration is a callback function returning configuration object or a promise resolving to configuration object should be enough for knex to know when the configuration is changed and requires reload (just simply by comparing identity of old configuration and the one returned from callback).

What is needed to make this to PR would be to describe clearly what pool will do when configuration is changed and there are still old connections with old configuration alive. For example ongoing transactions are like that. What would be wanted behavior in that case.

@sprusr
Copy link

sprusr commented Jun 30, 2021

@elhigu not sure I agree that the expirationChecker is limited or hacky, but certainly comparing what is returned/resolved from connection callback would also work nicely. Although I expect that currently in many uses of Knex, people are doing quite heavy lifting in that async connection function, expecting it to be called very rarely or even only once. In that respect this could be considered a breaking change of sorts.

Great point about transactions, that throws something interesting into the mix. I suppose expected behaviour would be to continue using the same connection for the transaction. Then when that connection returns to idle, it would be destroyed. There is some concept of being "idle" in tarn I believe, maybe that can help here?

Or maybe it's much more complicated to dictate how "expired" connections should behave.

@elhigu
Copy link
Member

elhigu commented Jun 30, 2021

Knex, people are doing quite heavy lifting in that async connection function, expecting it to be called very rarely or even only once. In that respect this could be considered a breaking change of sorts.

If there is too much heavy lifting done, then they should add cache to it.

In that respect this could be considered a breaking change of sorts.

I thought knex didn't allow passing function to connection object yet. If it already is like that then yeah, it will be a breaking change, but still easy to fix with any memoize helper.

@sprusr
Copy link

sprusr commented Jul 1, 2021

I thought knex didn't allow passing function to connection object yet

There's a section under here, 8th code example down: https://knexjs.org/#Installation-client

Just to give as much context as possible: that example is also again the use-case which I'm working with here, that the creds expire after some time. The trouble with the example is that open connections continue to be used, and the way that my credentials are expired are just that privileges for that user are revoked and connection is not explicitly dropped by the server (I believe this is quite normal behaviour). The responsibility is on the client to stop using the expired connections (preferably before the creds have actually expired) and start using connections with newer creds.

If there is too much heavy lifting done, then they should add cache to it

Yeah sure if that connection function is behaving in the way proposed here. The trouble is that the current behaviour is that this function is called only when a new connection pool is being created, or when expirationChecker has returned true when creating a new connection. It's even recommended in that example above to do things like asynchronous fetching of creds. So people are doing their expensive operations in there.

The reason I think the better choice here for solving this use-case is the expirationChecker is because it's already there, recommended in documentation, and implementations using it in its current form would not break if it was changed to be called more often (before connections are acquired from the pool, as well as before they are created). Avoiding a breaking change is definitely a nice bonus here.

What I'm interested in is why a call to expirationChecker wasn't added to the pool's validate function in #3497 in the first place. If this was a deliberate decision, or if there's something I'm missing as to why it's a bad idea. Or perhaps I misunderstand the scope of what expirationChecker is intended to be used for.

By the way, thanks for engaging with this after I dug it up after a long time. I wasn't expecting to get responses so soon!

@elhigu
Copy link
Member

elhigu commented Jul 2, 2021

The reason I think the better choice here for solving this use-case is the expirationChecker is because it's already there, recommended in documentation,

Right. Yeah now I get it and makes sense to use that ready made feature.

Though I'm a bit sad that the dynamic configuration thing was implemented that way...

What I'm interested in is why a call to expirationChecker wasn't added to the pool's validate function in #3497 in the first place. If this was a deliberate decision, or if there's something I'm missing as to why it's a bad idea. Or perhaps I misunderstand the scope of what expirationChecker is intended to be used for.

Probably it was fine for their use-case to be using old connections made with old credentials when the feature was written. And I 'm still not sure if it should be there always of is that also something opinionated functionality, that for some people it should be there and for others it is not necessary.

One should be able to give a custom validation function for tarn in pool's config.

@toanvc
Copy link

toanvc commented Aug 20, 2021

@elhigu Hello I'm getting this exact issue in my project and it's a blocker for me. The proposal solution seems to be exact what you are mentioning in your comment give a custom validation function for tarn in pool's config.

Proposal idea adds expirationChecker in validate function and this is passed to tarn config. It's working well for me. Can you explain more?

validate: (connection) => {
        if (isConnectionExpired()) {
          this.logger.warn(`Connection Expired`);
          return false;
        }

        if (connection.__knex__disposed) {
          this.logger.warn(`Connection Error: ${connection.__knex__disposed}`);
          return false;
        }

        return this.validateConnection(connection);
      },

@julrich1
Copy link

Right now we are blocked from using RDS with IAM authentication due to this bug. expirationChecker is in the Knex documentation but currently does nothing at all, how can we get a fix merged in?

@kibertoad
Copy link
Collaborator

@julrich1 Best solution would be implementing support for custom validation function in https://github.com/vincit/tarn.js/
Second best would be making knex.js implementation more flexible.

PRs for either approach would be most welcome.

@julrich1
Copy link

@julrich1 Best solution would be implementing support for custom validation function in https://github.com/vincit/tarn.js/
Second best would be making knex.js implementation more flexible.

PRs for either approach would be most welcome.

It looks like this proposal https://github.com/dsbrgg/knex/blob/connection-timeout/lib/client.js#L266-L309 does indeed pass the expirationChecker to the Tarn validate function, I'd be happy to work on an MR but it seems like @elhigu had some concerns with the approach.

@kibertoad
Copy link
Collaborator

@julrich1 Is there a way to implement it in a way that would be completely opt-in and does not affect people who don't need it?

From my understanding the main concern was this: And I 'm still not sure if it should be there always of is that also something opinionated functionality, that for some people it should be there and for others it is not necessary.

@julrich1
Copy link

The proposed change makes use of the existing expirationChecker field from the Knex configuration object (which currently does nothing) and only uses it if it is set, unless users are setting that expirationChecker property, there would be no behavior change for them.

@kibertoad
Copy link
Collaborator

@julrich1 Sounds like a good approach then.

@amarsiingh
Copy link

Resurrecting this thread again.
Based on the discussion above it looks like the solution provided in this solution would be ideal fix.
can we please get this reviewed and merged to the master, if its not already fixed in some other PR?
This is indeed a blocker for a project I'm working on.

@kibertoad
Copy link
Collaborator

@amarsiingh could you create a PR with that approach?

@deepmetalpay
Copy link

Did someone find a way around this? We are having same issue with IAM token authentication (it expires every 15 mins). I've tried. Was the PR ever created/merged?

@ploth
Copy link

ploth commented Jun 15, 2023

We are still on knex 2.0.0 and the following is working like a charm:

return knex({
    client: "postgres",
    connection: async () => {
      const { token, expiresAt } = await getToken();

      return {
        host: `/cloudsql/${process.env.CLOUD_SQL_CONNECTION_NAME}`,
        database: process.env.CLOUD_SQL_DATABASE,
        user: account.split(".gserviceaccount.com")[0],
        password: token,
        expirationChecker: () => {
          const remainingTime = expiresAt.diff(DateTime.now());
          // Google caches tokens until they have less than 5 minutes of
          // remaining validity. So we keep tokens exactly that long:
          // https://cloud.google.com/compute/docs/access/create-enable-service-accounts-for-instances#applications
          return remainingTime <= Duration.fromObject({ minutes: 5 });
        },
      };
    },
    pool: {
      min: 5,
      max: 9, // our cloudSQL allows at most 19 connections
    },
  });

@deepmetalpay
Copy link

I have a similar implementation currently and its gives auth errors randomly. Basically whats happening is that the pool does not relieve the connections and keeps reusing the connection, however the token has expired for that connection and we get failures.

@ploth
Copy link

ploth commented Jun 15, 2023

I have a similar implementation currently and its gives auth errors randomly. Basically whats happening is that the pool does not relieve the connections and keeps reusing the connection, however the token has expired for that connection and we get failures.

Are you sure you are respecting the connection pool? Keep in mind that there are some "maintenance" connections held by google you are not able to use. E.g. 25 max connections => 6 reserved by google => 19 connections left for your service => with scale 2 your max should be 9

@deepmetalpay
Copy link

Are you sure you are respecting the connection pool? Keep in mind that there are some "maintenance" connections held by google you are not able to use. E.g. 25 max connections => 6 reserved by google => 19 connections left for your service => with scale 2 your max should be 9

Not really sure about that, i'll look into it. we're using AWS.

@tasdflkjweio
Copy link

tasdflkjweio commented Jul 12, 2023

I was trying to use this feature today to evaluate whether a JWT is still valid, but the functionality is very counterintuitive. I was expecting this to be evaluated when a connection obj is about to be consumed. Given the existing implementation...under times of heavy load, a pooled resource could expire but not be released which could lead to errors.

This could be greatly improved if you could explicitly expire resources based on time, leave it up to the library's consumer to cache the config object and always invoke the function to return the connection, or just have this expirationChecker function be evaluated right before the resource is consumed.

@tasdflkjweio
Copy link

tasdflkjweio commented Jul 13, 2023

@kibertoad same as others, this is a blocker for me too. i can do some hacky workarounds but i'd love for the lib to support oauth. what would it take to move this forward? i can make some time this week.

the path of least resistance seems to be to use the expirationChecker in the validate function, as was proposed before. another option for making the config dynamic would be to add a marker to the config fn to show that it should be re-computed.

// knexfile.js
return {
    ..., // other props
    connection: recompute(async () => {
        // make async calls
       return {...};
    }),
}

const shouldRecompute = Symbol();
const recompute = fn => {
  fn[shouldRecompute] = true;

  return fn;
}

you'd use that flag to decide whether to regenerate the config object, and the consumer could decide to cache the return object or not. not my fav, but it would be backward compatible.

if you're interested in the config solution, i may not have time to do all that asap but i could work on it. i'd definitely be up for getting the expiration working how everyone expects it to, though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests