
Intermittent Error: write EPIPE when running stripe client in AWS Lambda #1040

Closed
hisham opened this issue Oct 13, 2020 · 14 comments · Fixed by #1336 or #1803


hisham commented Oct 13, 2020

We're using the Stripe Node client 8.71.0 in an AWS Lambda running Node 12.x. A stripe customers.list call is made first thing when the Lambda executes. About 33% of the time we get the error below on that call. It happens consistently, so it does not seem to be transient.

I did read #650, and setting maxNetworkRetries in stripe to 2 seems to resolve the issue. However, that seems to just mask the issue rather than fix it.
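
For reference, we're enabling the retries roughly like this (a minimal sketch; the env var name is ours and the value is just what we settled on):

```js
// Minimal sketch of how we configure the client; the env var name is ours.
const stripe = require('stripe')(process.env.STRIPE_SECRET_KEY, {
  maxNetworkRetries: 2, // one or two retries is enough to hide the EPIPE error
});
```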

Is this a Stripe issue or an AWS Lambda issue? Probably Lambda; I've submitted a support request with AWS. But I'm posting it here in case others run into it.

2020-10-13T12:02:58.032Z c184006d-fe96-490a-9bfe-696b8271769a ERROR StripeConnectionError: An error occurred with our connection to Stripe.
at /var/task/node_modules/stripe/lib/StripeResource.js:234:9
at ClientRequest.<anonymous> (/var/task/node_modules/stripe/lib/StripeResource.js:489:67)
at ClientRequest.emit (events.js:315:20)
at ClientRequest.EventEmitter.emit (domain.js:483:12)
at TLSSocket.socketErrorListener (_http_client.js:426:9)
at TLSSocket.emit (events.js:315:20)
at TLSSocket.EventEmitter.emit (domain.js:483:12)
at emitErrorNT (internal/streams/destroy.js:92:8)
at emitErrorAndCloseNT (internal/streams/destroy.js:60:3)
at processTicksAndRejections (internal/process/task_queues.js:84:21) {
type: 'StripeConnectionError',
raw: {
message: 'An error occurred with our connection to Stripe.',
detail: Error: write EPIPE
at WriteWrap.onWriteComplete [as oncomplete] (internal/stream_base_commons.js:92:16)
at writevGeneric (internal/stream_base_commons.js:132:26)
at TLSSocket.Socket._writeGeneric (net.js:784:11)
at TLSSocket.Socket._writev (net.js:793:8)
at doWrite (_stream_writable.js:401:12)
at clearBuffer (_stream_writable.js:519:5)
at TLSSocket.Writable.uncork (_stream_writable.js:338:7)
at ClientRequest.end (_http_outgoing.js:774:17)
at ClientRequest.<anonymous> (/var/task/node_modules/stripe/lib/StripeResource.js:506:15)
at Object.onceWrapper (events.js:422:26) {
errno: 'EPIPE',
code: 'EPIPE',
syscall: 'write'
}
},
rawType: undefined,
code: undefined,
doc_url: undefined,
param: undefined,
detail: Error: write EPIPE
at WriteWrap.onWriteComplete [as oncomplete] (internal/stream_base_commons.js:92:16)
at writevGeneric (internal/stream_base_commons.js:132:26)
at TLSSocket.Socket._writeGeneric (net.js:784:11)
at TLSSocket.Socket._writev (net.js:793:8)
at doWrite (_stream_writable.js:401:12)
at clearBuffer (_stream_writable.js:519:5)
at TLSSocket.Writable.uncork (_stream_writable.js:338:7)
at ClientRequest.end (_http_outgoing.js:774:17)
at ClientRequest.<anonymous> (/var/task/node_modules/stripe/lib/StripeResource.js:506:15)
at Object.onceWrapper (events.js:422:26) {
errno: 'EPIPE',
code: 'EPIPE',
syscall: 'write'
},
headers: undefined,
requestId: undefined,
statusCode: undefined,
charge: undefined,
decline_code: undefined,
payment_intent: undefined,
payment_method: undefined,
setup_intent: undefined,
source: undefined
}

paulasjes-stripe self-assigned this Oct 14, 2020

paulasjes-stripe commented Oct 14, 2020

We've seen this before with AWS Lambda and believe it's an issue/configuration setting on their end. Using maxNetworkRetries seems to do the trick in most cases, but as you correctly stated it's more masking the problem than solving it.

When you hear back from AWS would you mind updating this issue with your findings?


hisham commented Oct 14, 2020

Yeah, I have an AWS premium support subscription, so I should have a response soon.

I did find similar issues that people have reported with other libs:

So my latest theory is that it's something related to keep-alive and sockets expiring, but at this point I've added the retry and I'm waiting for AWS to get back to me.


hisham commented Oct 14, 2020

Hi @paulasjes-stripe - here's the response we got from AWS:

Starting with the error, "EPIPE" error [0] is generally caused when data is piped into closed streams [1]. In the case of the NodeJS Lambda function, the error might be caused when the NodeJS event loop didn't clean-up closed TCP connections from the HTTP connection pool and then the NodeJS runtime attempted to use the closed TCP connection.

To understand the error better, below is what happens behind the scenes:

  • AWS Lambda function runs in an isolated container and usually each Invoke starts a new Lambda function execution in a new container.
  • However, if delay between two requests is very small, then the container used by the previous Invoke might be reused to cater to the later request as well. This is known as container reuse [2].
  • While finishing execution, Lambda does not consider the state of active processes in background other than handler function. Thus, when the execution is finished, the active processes turn into frozen state.
  • When the next request is processed by the container, the previously frozen asynchronous processes are started again.
  • If any of the frozen processes has dependency on the piping/streaming, then that process fails to continue execution as it does not find the pipeline/connection/stream it used in previous request.

To avoid these errors the following is suggested:

  1. Revisit the function code and ensure that the processes (dependent on connection/stream) are finished before lambda completes execution.
  2. Use the retry which will create new connection/stream for new request.

I hope the above information gives an idea on EPIPE errors and why adding retries may help in resolving the EPIPE errors.

However, If there are any further queries/concerns please let me know and I will be happy to assist.

References:
[0] https://nodejs.org/api/errors.html
[1] EPIPE error - nodejs/node#947
[2] https://aws.amazon.com/blogs/compute/container-reuse-in-lambda/

So I'll just use Stripe's retry logic for now, as I don't seem to have control over Stripe's background processes. Is it the keep-alive connection that is causing this issue? Not sure.

Our Lambda is very simple; it basically just returns the results from this line:

await this.stripeClient.customers.list({ email })

It's a 2048 MB Lambda running Node.js 12. It is called via a GraphQL function transformer (https://docs.amplify.aws/cli/function), but I don't think those details matter much.

Interestingly, I have other Lambdas that also call the above REST API, but they have other network calls and more involved logic, and I've never run into the EPIPE issue with them before.
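
For context, the handler is roughly shaped like this (a simplified sketch; the event shape and names are illustrative, since the real wiring is generated by Amplify):

```js
const stripe = require('stripe')(process.env.STRIPE_SECRET_KEY, {
  maxNetworkRetries: 2,
});

// Simplified sketch only: the real handler is wired up by the Amplify
// GraphQL function transformer, so the event shape here is illustrative.
exports.handler = async (event) => {
  const { email } = event.arguments || event;
  // The only network call this Lambda makes.
  const customers = await stripe.customers.list({ email });
  return customers.data;
};
```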

@paulasjes-stripe
Contributor

Thanks @hisham! We're going to look into this to see if there's anything that can be done from our end, but it looks like maxNetworkRetries is a suitable workaround for now.


hisham commented Oct 16, 2020

Great. Yes, maxNetworkRetries does the job. AWS seems to agree with me that calling the destroy method on the HTTP agent before the Lambda exits will probably also resolve this issue:

It is mentioned in AWS Lambda Best Practices [1][2] to use a keep-alive directive to maintain persistent connections. Quoting from the documentation:
Lambda purges idle connections over time. Attempting to reuse an idle connection when invoking a function will result in a connection error. To maintain your persistent connection, use the keep-alive directive associated with your runtime.

However, in certain situations, depending upon the time difference between two Lambda invocations, there is a chance that an idle connection is still present, causing the error.

Therefore, it sounds right to use agent.destroy() before exiting the Lambda to destroy all connections. But it needs to be made sure that the code to close/destroy all connections is executed before the Lambda exits. This would ensure that the socket connections are not left hanging open.

As a workaround, retries, as you mentioned, have been found to work fine.

I hope this information helps. However, if there are any further queries please let me know.

[1] https://docs.aws.amazon.com/lambda/latest/dg/best-practices.html
[2] https://docs.aws.amazon.com/sdk-for-javascript/v2/developer-guide/node-reusing-connections.html
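
In code, AWS's suggestion boils down to something like this (a sketch of the idea, not what we actually shipped; we stayed with maxNetworkRetries):

```js
const https = require('https');

// One keep-alive agent shared across invocations of a warm container.
const agent = new https.Agent({ keepAlive: true });
const stripe = require('stripe')(process.env.STRIPE_SECRET_KEY, {
  httpAgent: agent,
});

// Event shape is illustrative.
exports.handler = async (event) => {
  try {
    const customers = await stripe.customers.list({ email: event.email });
    return customers.data;
  } finally {
    // AWS's suggestion: tear down every socket before the runtime freezes the
    // container, so the next invocation can never reuse a stale connection.
    agent.destroy();
  }
};
```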

@huntedman

@hisham Do you happen to know if I can wrap my Stripe method calls inside try/catch if I want to use maxNetworkRetries?
I'm also using AWS Lambdas, and I'm worried that it will prematurely exit in that case...


hisham commented Nov 8, 2020

@huntedman we are using maxNetworkRetries and are not wrapping calls in try/catch. Stripe seems to handle this stuff internally.
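
So the call sites look roughly like this (a sketch; names are illustrative). If the request still fails after the retries, the rejection just propagates and the invocation errors out:

```js
const stripe = require('stripe')(process.env.STRIPE_SECRET_KEY, {
  maxNetworkRetries: 2,
});

// No try/catch is needed just for the retries: stripe-node retries the request
// internally (up to maxNetworkRetries) and only rejects once those are spent.
// Wrapping the call in try/catch is still fine if you want a custom error
// response; it doesn't interfere with the retry behavior.
async function listCustomersByEmail(email) {
  const customers = await stripe.customers.list({ email });
  return customers.data;
}

module.exports = { listCustomersByEmail };
```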

@suz-stripe

Hi @hisham, sorry for the radio silence about this recently, but I'm checking in with a quick update. The response from AWS was very helpful to us (thank you!) and we're actively investigating this issue to provide a better fix than our suggested workaround. When we know more we'll definitely update you here again via this open issue. Thank you for your patience!


richardm-stripe commented Dec 19, 2020

I've spent some time experimenting with AWS Lambda, and have a better understanding of these errors. They are happening due to the interaction between

  1. how Lambda freezes/unfreezes processes, and
  2. how stripe-node (by default) uses a single HTTP agent with keep-alive enabled.

In case you're not familiar, keep-alive is a way for http clients like stripe-node to be more efficient when your application is making multiple requests to Stripe. Rather than making a new connection for each request, which has a performance cost, it keeps the connection to the server open after a request is finished, so that it can be reused on the next request.
In order for keep-alive to work, the open connection must ping the server every so often to let the server know that it is still active. If it doesn't, the server will assume the connection isn't active anymore and close the connection to make room for others.

The problem arises when Lambda freezes your Node process. While the process is frozen, the TCP connections can't ping the server to remain active, and the server closes them. When Lambda unfreezes your process, Node isn't aware that the connections have been closed, and it attempts to re-use them. As soon as it does, it gets EPIPE or ECONNRESET.

One option for eliminating these errors would be to disable keep-alive when you initialize stripe-node.

const https = require('https')
const stripe = require('stripe')('sk_live_xyz', {httpAgent: new https.Agent({keepAlive: false})})

This does mean sacrificing the benefits of keep-alive, but I expect that's an acceptable trade-off especially for low-traffic lambdas.

Another possibility would be initializing a new Stripe client with its own keep-alive-enabled agent inside the Lambda handler. This is roughly equivalent to Amazon's suggestion of calling .destroy on the http agent before exiting, but this isn't ideal either because it only allows you to re-use connections within each individual Lambda invocation, and not from one Lambda invocation to the next.
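
A rough sketch of that second approach (the handler shape and env var name are just for illustration):

```js
const https = require('https');
const Stripe = require('stripe');

// Handler and event shape are illustrative.
exports.handler = async (event) => {
  // Fresh agent and client per invocation: connections are reused for any
  // requests made during this invocation, but nothing is left behind to go
  // stale while the container is frozen.
  const agent = new https.Agent({ keepAlive: true });
  const stripe = Stripe(process.env.STRIPE_SECRET_KEY, { httpAgent: agent });
  try {
    const customers = await stripe.customers.list({ email: event.email });
    return customers.data;
  } finally {
    agent.destroy();
  }
};
```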

From my perspective, handling these errors by retrying is likely the proper approach, and shouldn't necessarily be viewed as a workaround or as masking an underlying issue. It is expected and unavoidable that these broken connections will come to exist, and there doesn't seem to be an obvious way of asking Node "how long has it been since the last keep-alive probe on this connection" besides writing to the connection and triggering the error.

At the same time, I think we should look into the possibility of making stripe-node handle errors like this by default/more transparently, so that users don't have to configure the retries themselves. That seems to be what Amazon started doing for errors like this for their own SDKs about a month ago (thank you @hisham for linking to that issue, by the way).

Anyway I hope this clarifies things and we'll keep you posted.


theoBLT commented Feb 9, 2021

Thank you for opening this thread! I had the same issue on a very low-traffic site (a side project). I used Stripe's Node library inside Netlify Functions and got 502 errors with the error message write EPIPE in the Netlify function logs.

I moved forward with the fix you recommended @richardm-stripe, but the syntax didn't work. The below worked though:

const https = require('https');
const stripe = require('stripe')('secret_key_xyz', {
  httpAgent: new https.Agent({keepAlive: false})
});

theoBLT added a commit to theoBLT/indonesian that referenced this issue Feb 9, 2021
@richardm-stripe
Contributor

Thanks @theoBLT, I've corrected the syntax in the original comment.

bpinto added a commit to bpinto/stripe-node that referenced this issue Jan 21, 2022
Requests that fail with closed connection errors (ECONNRESET, EPIPE) are
automatically retried.

- `ECONNRESET` (Connection reset by peer): A connection was forcibly
  closed by a peer. This normally results from a loss of the connection
  on the remote socket due to a timeout or reboot. Commonly encountered
  via the http and net modules.

- `EPIPE` (Broken pipe): A write on a pipe, socket, or FIFO for which
  there is no process to read the data. Commonly encountered at the net
  and http layers, indicative that the remote side of the stream being
  written to has been closed.

Fixes: stripe#1040
richardm-stripe pushed a commit that referenced this issue May 9, 2022
* feat(http-client): Retry requests that failed with closed connection

Fixes: #1040
@richardm-stripe
Contributor

Oof, #1336 claimed to fix this, so it auto-closed, but I disagree that it's entirely fixed until retries are enabled by default.

pakrym-stripe added a commit that referenced this issue Jun 8, 2022
* feat(http-client): retry closed connection errors (#1336)

Fixes: #1040

* Bump version to 9.0.0

FeliceGeracitano commented Jul 24, 2022

I also got this issue in v8; I upgraded to v9 and all looks good now.

Automatic retry is in place now for CONNECTION_CLOSED_ERROR_CODES --> 47776ef
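
Conceptually the check behaves something like this (an illustrative sketch, not the actual stripe-node source):

```js
// Illustrative sketch only, not the actual stripe-node implementation.
const CONNECTION_CLOSED_ERROR_CODES = ['ECONNRESET', 'EPIPE'];

function shouldRetryConnectionError(error, numRetries, maxNetworkRetries) {
  // Retry when the socket was closed underneath us and retries remain.
  return (
    error != null &&
    CONNECTION_CLOSED_ERROR_CODES.includes(error.code) &&
    numRetries < maxNetworkRetries
  );
}

// e.g. shouldRetryConnectionError({ code: 'EPIPE' }, 0, 1) === true
```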

@anniel-stripe
Contributor

Hello! maxNetworkRetries has been set to 1 by default with the release of stripe-node v13 today (enabled by this change). I'll be closing this issue, as the default behavior in v13 should prevent this error.
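
For anyone who wants a different number of retries than the new default, it can still be set explicitly (a sketch; the env var name is illustrative):

```js
const stripe = require('stripe')(process.env.STRIPE_SECRET_KEY, {
  // v13 already defaults to 1; override only if you want a different value.
  maxNetworkRetries: 3,
});
```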
