Operation failed: failed uploading filesystem: request failed: PUT, http2: timeout awaiting response headers #685

Open
itsavvy-ankur opened this issue Apr 24, 2023 · 3 comments


itsavvy-ankur commented Apr 24, 2023

I am consistently seeing the error below when trying to plan a workspace in Terraform Enterprise:

Operation failed: failed uploading filesystem: request failed: PUT https://<<REDACTED>>/_archivist/v1/object/dmF1bHQ6djE6bVFjWmVYcnZwYmRGN0hSanRGZS8vbjhZd3lWNXpsM1dVOWtweUtBNkphSFVkbE5aWlNjZS9LNiswWXFkNFN5RDlUZmI2ZytET0xlcjAwODNhSitScS9wM1R4TlFEWEFzWUtDUFJqYVV5dVNEc2ROMlEvZWtKTUZKOXRjNHV0YnhSeUlkRnNnakVqKzhmSFBXRUtRV25GcElCdndtaUJwaVQ5SXV4NnhPMC9IeW80aE9iQTdyK2VNWVIxdlVVYU9qdEVNZkpqODZFb29GVWV0VzcrUlFCSmZlbFFkbTFYMTkvTWgrY2M2QndhVWdVN2tnNjhMOFJ5aWZsdzlIRzR6Rnd6OW9ZdEFFMWs2Q0NEWU5PMHgrLzZiNmZ6ZWpjU25TaFRoZnF5anpISDJDeXFUMG1jM0IzZm92cmVjaTE5bmZkalVRUmo4U1Q1MGNsSVQ5YWdjPQ giving up after 16 attempt(s): Put "https://<<REDACTED>>_archivist/v1/object/dmF1bHQ6djE6bVFjWmVYcnZwYmRGN0hSanRGZS8vbjhZd3lWNXpsM1dVOWtweUtBNkphSFVkbE5aWlNjZS9LNiswWXFkNFN5RDlUZmI2ZytET0xlcjAwODNhSitScS9wM1R4TlFEWEFzWUtDUFJqYVV5dVNEc2ROMlEvZWtKTUZKOXRjNHV0YnhSeUlkRnNnakVqKzhmSFBXRUtRV25GcElCdndtaUJwaVQ5SXV4NnhPMC9IeW80aE9iQTdyK2VNWVIxdlVVYU9qdEVNZkpqODZFb29GVWV0VzcrUlFCSmZlbFFkbTFYMTkvTWgrY2M2QndhVWdVN2tnNjhMOFJ5aWZsdzlIRzR6Rnd6OW9ZdEFFMWs2Q0NEWU5PMHgrLzZiNmZ6ZWpjU25TaFRoZnF5anpISDJDeXFUMG1jM0IzZm92cmVjaTE5bmZkalVRUmo4U1Q1MGNsSVQ5YWdjPQ": http2: timeout awaiting response headers

Investigating further, this issue arises while creating a new configuration version (from the UI, it would seem all elements of the plan are successful):

https://github.com/hashicorp/go-tfe/blob/main/configuration_version.go#L277-L284

func (s *configurationVersions) UploadTarGzip(ctx context.Context, uploadURL string, archive io.Reader) error {
	req, err := s.client.NewRequest("PUT", uploadURL, archive)
	if err != nil {
		return err
	}

	return req.Do(ctx, nil)
}

The code above uses the go-tfe client, whose default values are defined in the snippet below. I am wondering whether these defaults are causing the calls to time out, and whether there is a way to increase them.

go-tfe/tfe.go

Lines 343 to 351 in 8623569

client.http = &retryablehttp.Client{
	Backoff:      client.retryHTTPBackoff,
	CheckRetry:   client.retryHTTPCheck,
	ErrorHandler: retryablehttp.PassthroughErrorHandler,
	HTTPClient:   config.HTTPClient,
	RetryWaitMin: 100 * time.Millisecond,
	RetryWaitMax: 400 * time.Millisecond,
	RetryMax:     30,
}

@uturunku1
Collaborator

Hi Ankur! I am part of the team that maintains this project.
It's possible that the configuration version upload is failing because of size issues (the size of the configuration and workspace) relative to the disk space you have available internally.
Can you check the bundle logs to see whether they contain any errors regarding disk space? Perhaps the disk was getting full at the time of this process.
If that is the case, increasing the disk size would be a helpful first step. The second thing I'd recommend is splitting the workspace/configuration into smaller, separate configurations.
Let me know if this recommendation works for you 🙏

@itsavvy-ankur
Author

Hello @uturunku1, you are correct: on investigating this further, the failure is due to the size of the workspace. However, there is enough disk capacity. We are looking to refactor the workspace, as it spins up quite a few modules, which increases its size and causes the operation to time out.
Is there a way to increase the timeout value?

@uturunku1
Collaborator

uturunku1 commented May 5, 2023

Thanks for the update, @itsavvy-ankur!

Our current response header timeout is 30 seconds, which is considered generous. So, for now, the team would not increase that timeout value.
I'm glad to hear, though, that your team will consider refactoring the workspace.

P.S. One of my colleagues had an interesting perspective regarding the timeout error.

My colleague mentioned that the timeout is not applicable to Archivist, the object storage. This service usually allows very large requests. So, if no timeout is imposed between go-tfe and Archivist, he is wondering whether there might be a proxy timeout in the network of your TFE environment.
