SDK seems to create too many HTTP Clients causing EMFILE errors in macOS #4067

Open
3 tasks done
pcolazurdo opened this issue Aug 20, 2021 · 5 comments
Labels
bug This issue is a bug. p3 This is a minor priority issue

Comments

@pcolazurdo

pcolazurdo commented Aug 20, 2021

Confirm by changing [ ] to [x] below to ensure that it's a bug:

Describe the bug
On macOS (Big Sur 11.5.1 and 11.5.2), a sample program that uses the SDK returns many EMFILE (too many open files) errors. I can't trace where they come from. I've tried changing the HTTP client with the code below, without much luck.

I've run the same code, which opens many goroutines, on Linux, and it showed no problem at all. I understand macOS has very low open-file limits by default, and that this may be the cause, but the problem is so hard to trace that I wonder how many users are hitting silent issues because of it. The error surfaces in other parts of my code: I open a few files in rapid succession (fewer than 50), and sometimes that file-opening operation fails with EMFILE. The SDK itself does not expose any errors at all.

I would like to know if it is possible to track when a new http client or connection is opened and why. Also, if there are good mechanisms to track/log these metrics.

It would also be really useful to know some safe defaults for different OSes.
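To compare defaults across platforms, one option is to have the process print its own open-file limit at startup. The following is a minimal sketch using only the standard library's `syscall.Getrlimit`; `openFileLimit` is a hypothetical helper name, not something provided by the SDK.

```go
package main

import (
	"fmt"
	"syscall"
)

// openFileLimit returns the soft and hard RLIMIT_NOFILE values of the
// current process. Hypothetical helper for illustration; Unix-like
// systems only (Linux, macOS).
func openFileLimit() (soft, hard uint64, err error) {
	var lim syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &lim); err != nil {
		return 0, 0, err
	}
	// The field types differ between platforms, so convert explicitly.
	return uint64(lim.Cur), uint64(lim.Max), nil
}

func main() {
	soft, hard, err := openFileLimit()
	if err != nil {
		fmt.Println("could not read RLIMIT_NOFILE:", err)
		return
	}
	fmt.Printf("open files: soft=%d hard=%d\n", soft, hard)
}
```

The soft limit is the one that triggers EMFILE. Logging it on both macOS and Linux would at least confirm whether the two environments really start from the same value.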

httpClient, err := NewHTTPClientWithSettings(HTTPClientSettings{
	Connect:          5 * time.Second,
	ExpectContinue:   1 * time.Second,
	IdleConn:         90 * time.Second,
	ConnKeepAlive:    30 * time.Second,
	MaxAllIdleConns:  10, // To avoid issues with EMFILE errors when too many idle connections are kept on macOS
	MaxHostIdleConns: 2,
	ResponseHeader:   5 * time.Second,
	TLSHandshake:     5 * time.Second,
}, obs)
if err != nil {
	fmt.Println("Got an error creating custom HTTP client:")
	fmt.Println(err)
	panic("Got an error creating custom HTTP client:")
}

sess := session.Must(session.NewSessionWithOptions(session.Options{
	Config: aws.Config{
		HTTPClient: httpClient,
	},
	SharedConfigState: session.SharedConfigEnable,
}))

Version of AWS SDK for Go?
AWS SDK Version: 1.40.16

Version of Go (go version)?
go version go1.16.6 darwin/amd64

To Reproduce (observed behavior)
Relatively large code sample here.

Expected behavior
First, I would expect safe defaults on each OS. Second, I would expect a mechanism to know and control how many HTTP clients exist at any given time (I'm assuming this is the problem, but I have no way to confirm it). Third, more eventing/logging about when new HTTP clients are created (it probably exists, but I don't know how to activate it).

Hope this helps,

@pcolazurdo pcolazurdo added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Aug 20, 2021
@rittneje
Contributor

rittneje commented Aug 30, 2021

@pcolazurdo It sounds like your issue is with the Go standard library's http.Client. You can use the httptrace package to hook into the lifecycle of the client pool's connections. https://pkg.go.dev/net/http/httptrace

@pcolazurdo
Author

I've tried running the same code on Linux with the open-file limit lowered to 256, the same default as on macOS, and the problem doesn't seem to appear. I did add some tracing but can't see anything obvious: the percentage of reused connections seems to be the same on both platforms.

@vudh1 vudh1 self-assigned this Apr 15, 2022
@vudh1
Contributor

vudh1 commented Apr 15, 2022

Hi, can you confirm whether this still persists with the latest version of the SDK?

@vudh1 vudh1 added response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. and removed needs-triage This issue or PR still needs to be triaged. labels Apr 15, 2022
@github-actions

This issue has not received a response in 1 week. If you want to keep this issue open, please just leave a comment below and auto-close will be canceled.

@github-actions github-actions bot added the closing-soon This issue will automatically close in 4 days unless further comments are made. label Apr 18, 2022
@pcolazurdo
Author

I recently upgraded to SDK v2 and ran some other tests, and I'm facing additional problems that seem to be related to this, but I can't be sure yet. I'm trying to create a clean reproduction, but I don't have a solid repro example yet. My understanding at the moment is that goroutines/CGO threads somehow start piling up at some point, and this creates a cascade effect that includes panics and running out of file handles.

@github-actions github-actions bot removed closing-soon This issue will automatically close in 4 days unless further comments are made. response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. labels Apr 20, 2022
@vudh1 vudh1 removed their assignment Aug 25, 2022
@RanVaknin RanVaknin added the p3 This is a minor priority issue label Mar 27, 2023