Feature Request: Document a process for upgrading a webhook endpoint API version #1177

Open
Dretch opened this issue Jan 27, 2021 · 3 comments

Dretch commented Jan 27, 2021

On the subject of API upgrades, the Stripe docs (https://stripe.com/docs/upgrades#how-can-i-upgrade-my-api) say:

Update your webhook code to handle both the old and new version of each object.

However, as far as I can see, stripe-java does not support this. A particular version of stripe-java can decode webhook events of a single API version only - events with different versions will fail to decode.

I can therefore imagine upgrading a webhook endpoint by doing something like this:

  1. Set up a new webhook in Stripe with the same endpoint as the original, but with the new API version. Create this webhook disabled.
  2. Change the webhook code so it can decode webhook events with the signing secret from the new webhook, as well as the old one (i.e. try each signing secret until one works).
  3. Change the webhook code to ignore events that fail to deserialize (i.e. return HTTP 200).
  4. Enable the new webhook that was created in step 1.
  5. Actually change the version of stripe-java.
  6. Remove the original webhook (with the old API version), keep the new one (with the new API version).

After step 4, events will be sent to both webhook endpoints (one with the old API version, one with the new API version), but only one of them will actually be processed (which one depends on whether the new stripe-java version has been deployed yet).
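Step 2 above (trying each signing secret until one works) could be sketched roughly as follows. This is a standard-library illustration of Stripe's documented `t=...,v1=...` signature scheme so that it runs on its own; in real code you would more likely call stripe-java's `Webhook.constructEvent(payload, sigHeader, secret)` once per secret and catch `SignatureVerificationException`. The class and method names here are hypothetical, and the sketch skips the timestamp tolerance check.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper: finds which endpoint secret signed a webhook request.
final class MultiSecretVerifier {

    /** Hex-encoded HMAC-SHA256 of "<timestamp>.<payload>", keyed with the endpoint secret. */
    static String sign(String secret, String timestamp, String payload) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            byte[] digest = mac.doFinal((timestamp + "." + payload).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns the first secret whose signature matches a v1 entry in the header, if any. */
    static Optional<String> matchingSecret(String payload, String sigHeader, List<String> secrets) {
        String timestamp = null;
        List<String> v1Signatures = new ArrayList<>();
        for (String part : sigHeader.split(",")) {
            String[] kv = part.trim().split("=", 2);
            if (kv.length != 2) {
                continue;
            }
            if (kv[0].equals("t")) {
                timestamp = kv[1];
            } else if (kv[0].equals("v1")) {
                v1Signatures.add(kv[1]);
            }
        }
        if (timestamp == null) {
            return Optional.empty();
        }
        for (String secret : secrets) {
            byte[] expected = sign(secret, timestamp, payload).getBytes(StandardCharsets.UTF_8);
            for (String candidate : v1Signatures) {
                // Constant-time comparison, to avoid leaking how many characters matched.
                if (MessageDigest.isEqual(expected, candidate.getBytes(StandardCharsets.UTF_8))) {
                    return Optional.of(secret);
                }
            }
        }
        return Optional.empty();
    }
}
```

Knowing which secret matched also tells the handler which webhook (old or new API version) delivered the event.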

There is a complication during deployments where both old and new versions of stripe-java are deployed at the same time, so the code will need a mechanism to ensure events are only processed once. This is suggested as a best-practice by Stripe anyway.
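A minimal sketch of the "process each event only once" mechanism, assuming both webhooks deliver the same event ID for the same underlying event. The in-memory set and the `ProcessedEvents` name are purely illustrative; with several application instances running during a deployment, the set would need to be a shared store (for example a database table with a unique constraint on the event ID).

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory deduplicator keyed by Stripe event ID (e.g. "evt_...").
final class ProcessedEvents {
    private final Set<String> seen = ConcurrentHashMap.newKeySet();

    /** Runs the handler only the first time eventId is seen; returns whether it ran. */
    boolean processOnce(String eventId, Runnable handler) {
        if (!seen.add(eventId)) {
            return false; // already handled, e.g. via the other webhook
        }
        try {
            handler.run();
            return true;
        } catch (RuntimeException e) {
            seen.remove(eventId); // let a later retry attempt this event again
            throw e;
        }
    }
}
```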

I think what I have described above will work. I'm not 100% sure though - and even if it did work then I wonder if there is a simpler/safer approach. Therefore, I am requesting that a correct process for upgrading API versions be documented.

PS. There is a similar question on Stack Overflow, with no definitive answer: https://stackoverflow.com/questions/62316273/how-to-upgrade-stripe-api-version-with-webhooks

Thanks!

@remi-stripe remi-stripe self-assigned this Jan 28, 2021
@remi-stripe
Contributor

@Dretch Thanks for reaching out! We definitely should have more guidance on this and I'm flagging this internally to ensure this can be looked into.

In the meantime, here is how I would approach the upgrade if I were you. It is just a few tweaks to the plan you described:

  1. Set up a new webhook in Stripe with the same endpoint as the original, but with the new API version. Add a query param to that endpoint's URL so your code knows which version of the event it's processing.
  2. Change the webhook code so it verifies each event with the correct signing secret and checks that the event's API version matches the version of stripe-java the code is currently running. Do both based on the endpoint's query param (so that you don't need to deserialize the event itself to know which version it's for).
  3. Change the webhook code to ignore events that fail to deserialize. Either return an error (so that Stripe retries later) or return a 200 (but make sure to store the raw payload in case you need to roll back).
  4. Enable the new webhook that was created in step 1.
  5. Actually change the version of stripe-java in your code.
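The query-param routing in steps 1 and 2 might look something like the sketch below. Everything named here is an assumption for illustration: the `api_version` query param, the `EndpointRouter` class, and the example version strings. The version the running code expects is passed in as a plain string; in a real handler it would presumably come from stripe-java's `Stripe.API_VERSION` constant.

```java
import java.util.Map;

// Hypothetical router: picks the signing secret and decides whether to process
// an event, based only on the query param attached to the webhook URL.
final class EndpointRouter {
    enum Decision { PROCESS, SKIP_OTHER_VERSION, UNKNOWN_ENDPOINT }

    private final Map<String, String> secretByVersion;
    private final String libraryApiVersion;

    EndpointRouter(Map<String, String> secretByVersion, String libraryApiVersion) {
        this.secretByVersion = secretByVersion;
        this.libraryApiVersion = libraryApiVersion;
    }

    /** Signing secret for the webhook identified by the query param, or null if unknown. */
    String secretFor(String versionParam) {
        return versionParam == null ? null : secretByVersion.get(versionParam);
    }

    /** Decides what to do with an event without deserializing its payload. */
    Decision decide(String versionParam) {
        if (secretFor(versionParam) == null) {
            return Decision.UNKNOWN_ENDPOINT;
        }
        return versionParam.equals(libraryApiVersion)
                ? Decision.PROCESS
                : Decision.SKIP_OTHER_VERSION;
    }
}
```

For example, with webhook URLs ending in `?api_version=2019-12-03` and `?api_version=2020-08-27`, a deployment pinned to the newer version would process only requests arriving on the second URL and skip (or store) the rest.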

At this point, your production environment is running the newer stripe-java and getting events for the right version on the new endpoint and the wrong version on the old one. This is the moment to monitor event processing to make sure you didn't miss anything in your code. If you did, you can simply roll back to the old version of stripe-java, disable the new endpoint, and let Stripe retry all the events you might have missed, with the new API version.

You can repeat this operation until the rollout goes smoothly and no errors occur. At that point, you can delete the old webhook endpoint, as you won't need the event's JSON for the older API version.

As you can tell, it's really just a small variant of what you mentioned, mostly to focus on the "rollback period" which is quite important for statically typed libraries like stripe-java. But other than this, your intuition was definitely correct.

Let me know what you think!

@Dretch
Author

Dretch commented Jan 28, 2021

Thanks for considering this @remi-stripe

> 2. Change the webhook code so it verifies each event with the correct signing secret and checks that the event's API version matches the version of stripe-java the code is currently running. Do both based on the endpoint's query param (so that you don't need to deserialize the event itself to know which version it's for).

That is a neat solution 👍

> 3. Change the webhook code to ignore events that fail to deserialize. Either return an error (so that Stripe retries later) or return a 200 (but make sure to store the raw payload in case you need to roll back).

This seems a bit tricky. If an error is returned here then I guess that limits how long you have until you can no longer perform a rollback -- because Stripe will only keep retrying events for up to three days (and less in test mode). If 200 is returned then some other mechanism is needed for storing and replaying event payloads, which sounds rather non-trivial.

@remi-stripe
Contributor

> This seems a bit tricky. If an error is returned here then I guess that limits how long you have until you can no longer perform a rollback -- because Stripe will only keep retrying events for up to three days (and less in test mode). If 200 is returned then some other mechanism is needed for storing and replaying event payloads, which sounds rather non-trivial.

That's fair, though I would expect you to roll back as soon as you see errors, especially in the first few hours of the deploy. And as long as you store the raw JSON before parsing, recovery is a bit easier because you "just" need to ingest that data after the bug is fixed (though I know it's not as easy as I make it sound).
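To make the "store the raw JSON before parsing" idea a little more concrete, here is a rough sketch. It assumes a hypothetical `RawEventLog` that keeps payloads in memory and a handler that throws when it cannot deserialize or process an event; a real system would persist the payloads durably and would likely also record the endpoint or API version each one arrived with.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical store-then-parse log: payloads are recorded before handling,
// and anything that failed stays pending so it can be replayed after a fix.
final class RawEventLog {
    private final List<String> pending = new ArrayList<>();

    /** Stores the raw payload first, then tries to handle it. Returns true on success. */
    boolean ingest(String rawJson, Consumer<String> handler) {
        pending.add(rawJson); // recorded before any parsing is attempted
        return attempt(rawJson, handler);
    }

    /** Retries every pending payload with the (presumably fixed) handler; returns how many succeeded. */
    int replay(Consumer<String> handler) {
        int recovered = 0;
        for (String rawJson : new ArrayList<>(pending)) {
            if (attempt(rawJson, handler)) {
                recovered++;
            }
        }
        return recovered;
    }

    int pendingCount() {
        return pending.size();
    }

    private boolean attempt(String rawJson, Consumer<String> handler) {
        try {
            handler.accept(rawJson);
            pending.remove(rawJson); // handled, so no longer needs replaying
            return true;
        } catch (RuntimeException e) {
            return false; // keep the payload for a later replay
        }
    }
}
```

With this shape the webhook can return a 200 even when handling fails, since the payload is already stored, which sidesteps the retry window at the cost of owning the replay logic.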
