My application uses a fork of the kinsumer library, which uses this package to get data from Kinesis. We have run into 'too many open files' errors in production when reading from Kinesis. (In case it's relevant: we are running on ECS with the default limit of 1024 open files.)
From digging into the code, I believe this happens when we process events from the stream very quickly, leading to many pulls from Kinesis in a short period of time. It seems that this library doesn't close connections once a shard iterator is done pulling records from its shard, and when many of these calls are made before the connections expire, we run into this issue.
We are pulling data from multiple shards concurrently, but I believe the issue is caused not by concurrent requests but by many sequential ones.
I believe this occurs because:
We create a shard iterator
We call GetRecords and process much of the data very quickly
We then call GetRecords again on the next shard iterator, ad infinitum.
GetRecords calls Request.Send() under the hood. This opens a network connection, but does not seem to close it. (Also note that the comment on this method says "Send will not close the request.Request's body." - I have yet to ascertain whether this is relevant to my problem.)
At this point, it would be very useful to me if someone more familiar with the codebase could tell me whether the above explanation of the scenario seems valid, or suggest another explanation I could investigate.
So far I haven't found the time to produce a reproduction, but I can find the time to do so if I'm not barking up the wrong tree.
I would also be interested to hear opinions on how to solve or work around the issue in a sustainable way (beyond increasing the number of connections available on the box, which we are doing), assuming that it is valid and reproducible.
Expected Behavior
I expect not to run into "too many open files" errors when making many subsequent GetRecords requests.
Current Behavior
Error:
level=error msg="Failed to pull next Kinesis record from Kinsumer client: error performing initial leader actions:
error loading shard IDs from kinesis: RequestError: send request failed\ncaused by: Post
\"https://kinesis.eu-central-1.amazonaws.com/\": dial tcp: lookup kinesis.eu-central-1.amazonaws.com on
10.100.0.2:53: dial udp 10.100.0.2:53: socket: too many open files" error="Failed to pull next Kinesis record
from Kinsumer client: error performing initial leader actions: error loading shard IDs from kinesis: RequestError:
send request failed\ncaused by: Post \"https://kinesis.eu-central-1.amazonaws.com/\": dial tcp: lookup
kinesis.eu-central-1.amazonaws.com on 10.100.0.2:53: dial udp 10.100.0.2:53: socket: too many open files
Reproduction Steps
I have yet to find a full reproduction, but I believe it can be done as follows:
Create a client
Create a shard iterator for each shard of a kinesis stream which is populated with ~100k records (estimated)
Call GetRecords on each, immediately ack, and call GetRecords again immediately - do this in a loop
Run the above on a box with a cap on available ports
Possible Solution
I think this can be solved if the connection we open when making the request in GetRecords() is closed, either when the last record is acked or when we make a request on the next shard iterator.
Alternatively, if GetRecords() returned something which allows me to close the request manually, I could handle it in code.
Additional Information/Context
No response
SDK version used
v1.40.22 - the relevant code seems equivalent in the latest version
Environment details (Version of Go (go version)? OS name and version, etc.)
1.17