
S3Select EventStream closes prematurely #3202

Closed
garry-9590 opened this issue Mar 12, 2020 · 3 comments
Labels
service-api This issue is due to a problem in a service API, not the SDK implementation.

Comments

@garry-9590

Please fill out the sections below to help us address your issue.

Version of AWS SDK for Go?

1.25.17 & 1.29.22

Version of Go (go version)?

1.13.6

What issue did you see?

Please refer to the code below. The issue I am observing is that after processing some of the rows of the CSV, "ALL DONE" is printed while "End Event received" never appears in the output logs. This means the for loop over the stream of events exited without encountering an EndEvent.
As a result, the PipeWriter is closed partway through a record, so the PipeReader loop never exits: the json.Decoder keeps returning unexpectedEOF errors (rather than the io.EOF the loop breaks on) because the stream ended in the middle of a JSON value.

I am trying to ingest 100,000 records of 200 bytes each.

Is there a limitation on how much data EventStream can process?
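
To make the failure mode concrete, here is a minimal standalone sketch (an editor's illustration, not the repro itself) of why the reader loop spins: once the PipeWriter is closed partway through a JSON value, json.Decoder latches io.ErrUnexpectedEOF and returns it from every subsequent Decode call, so a loop that only breaks on err == io.EOF never exits:

	package main

	import (
		"encoding/json"
		"fmt"
		"io"
	)

	func main() {
		pr, pw := io.Pipe()
		go func() {
			// Write a truncated JSON value, then close the writer; reads
			// on pr now return io.EOF in the middle of the value.
			pw.Write([]byte(`{"name":"trunc`))
			pw.Close()
		}()

		dec := json.NewDecoder(pr)
		for i := 0; i < 3; i++ {
			var v map[string]interface{}
			err := dec.Decode(&v)
			// Prints "unexpected EOF false" on every iteration: err is
			// io.ErrUnexpectedEOF, never io.EOF.
			fmt.Println(err, err == io.EOF)
		}
	}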

Steps to reproduce

The same issue was reported on an earlier version (#2769).

If you have a runnable example, please include it.

	// selectObjectContentOutput is the result of an earlier call to
	// s3.SelectObjectContent (input elided); record is the struct that a
	// single CSV row is decoded into.
	records := make(chan record)

	results, resultWriter := io.Pipe()

	// Writer side: copy each RecordsEvent payload into the pipe.
	go func(resultWriter *io.PipeWriter, selectObjectContentOutput *s3.SelectObjectContentOutput) {
		defer selectObjectContentOutput.EventStream.Close()
		defer resultWriter.Close()
		for event := range selectObjectContentOutput.EventStream.Events() {
			switch e := event.(type) {
			case *s3.RecordsEvent:
				if _, err := resultWriter.Write(e.Payload); err != nil {
					fmt.Printf("Error writing payload: %v\n", err)
				}
			case *s3.EndEvent:
				fmt.Printf("End Event received\n")
			}
		}
		fmt.Printf("ALL DONE\n")
	}(resultWriter, selectObjectContentOutput)

	// Reader side: decode JSON records off the pipe. Note that this loop
	// only breaks on io.EOF, so an io.ErrUnexpectedEOF from a truncated
	// stream is printed forever.
	go func(records chan record, results *io.PipeReader) {
		defer close(records)
		resReader := json.NewDecoder(results)
		for {
			csvRecord := record{}
			err := resReader.Decode(&csvRecord)
			if err == io.EOF {
				break
			} else if err != nil {
				fmt.Printf("Error while decoding a row from S3. Error: %+v\n", err)
			}
			records <- csvRecord
		}
		if err := selectObjectContentOutput.EventStream.Err(); err != nil {
			fmt.Printf("Reading from S3 Select Event Stream Failed, %+v\n", err)
		}
	}(records, results)

	throttle := time.Tick(200 * time.Millisecond)
	for r := range records {
		<-throttle
		fmt.Printf("%v", r)
	}
}
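
For comparison, a hedged sketch of how the writer goroutine above could forward a premature stream termination to the reader instead of silently closing the pipe; this is an editor's sketch, not the reporter's code or a confirmed fix:

	// Same writer loop as above, but the stream's terminal error (nil on a
	// clean close) is forwarded into the pipe via CloseWithError, so the
	// reader can distinguish io.EOF from a premature termination.
	// CloseWithError(nil) behaves exactly like Close.
	go func(resultWriter *io.PipeWriter, selectObjectContentOutput *s3.SelectObjectContentOutput) {
		defer func() {
			resultWriter.CloseWithError(selectObjectContentOutput.EventStream.Err())
		}()
		defer selectObjectContentOutput.EventStream.Close()
		for event := range selectObjectContentOutput.EventStream.Events() {
			switch e := event.(type) {
			case *s3.RecordsEvent:
				if _, err := resultWriter.Write(e.Payload); err != nil {
					return // reader side was closed
				}
			case *s3.EndEvent:
				fmt.Printf("End Event received\n")
			}
		}
	}(resultWriter, selectObjectContentOutput)

The reader loop would also need to break on any non-nil Decode error, not just io.EOF, for this to terminate cleanly.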
@diehlaws diehlaws self-assigned this Mar 13, 2020
@diehlaws diehlaws added the service-api This issue is due to a problem in a service API, not the SDK implementation. label Mar 13, 2020
@diehlaws
Contributor

Hi @garry-9590, thanks for reaching out to us. This seems like an issue with the service ending the stream prematurely rather than something specific to the AWS SDK for Go. I'll be reaching out to the S3 team internally about this; once I have more information, I'll update the issue accordingly.

@garry-9590
Author

Hi @diehlaws, thank you for responding. Please do let me know when you have new information on this issue.

@diehlaws
Contributor

Unfortunately I have been unable to reproduce the described behavior on my end. I generated a CSV file with 200-byte records of random data and have consistently received an End Event before my code completes; however, I have not been able to print the data in the source CSV file using the provided code. Can you provide the SelectObjectContentInput you're using for this (with sensitive data removed)? Seeing what your record struct looks like, along with a sample file without sensitive data, would also help us reproduce and troubleshoot this more accurately.
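
For reference, a hypothetical SelectObjectContentInput of the shape being asked for might look like the following; the bucket, key, and SQL expression are placeholders (not the reporter's values), svc is assumed to be an *s3.S3 client, and CSV input with JSON output is assumed to match the json.NewDecoder usage in the repro code:

	// Hypothetical input -- placeholders throughout, for illustration only.
	input := &s3.SelectObjectContentInput{
		Bucket:         aws.String("example-bucket"),
		Key:            aws.String("example.csv"),
		Expression:     aws.String("SELECT * FROM S3Object s"),
		ExpressionType: aws.String(s3.ExpressionTypeSql),
		InputSerialization: &s3.InputSerialization{
			CSV: &s3.CSVInput{
				FileHeaderInfo: aws.String(s3.FileHeaderInfoUse),
			},
		},
		OutputSerialization: &s3.OutputSerialization{
			JSON: &s3.JSONOutput{},
		},
	}
	selectObjectContentOutput, err := svc.SelectObjectContent(input)
	if err != nil {
		// handle error
	}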

The S3 team has requested the following information as they have also been unable to reproduce this behavior using a 1GB CSV file.

  • Information from the request to S3:
    • requestId
    • Time of the request
    • Region of the request
  • Is it possible to provide a sample file without sensitive data?
  • The volume of traffic we can handle is not constant and depends on various factors:
    • The bucket being partitioned properly
    • The size of the file
    • The number of requests
    • The type of the file

They also noted that exceeding S3's limits should result in 503 responses rather than an unexpected EOF.

@diehlaws diehlaws added the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Mar 25, 2020
@diehlaws diehlaws added closing-soon This issue will automatically close in 4 days unless further comments are made. and removed response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. labels Apr 24, 2020
@github-actions github-actions bot removed the closing-soon This issue will automatically close in 4 days unless further comments are made. label Apr 25, 2020
@diehlaws diehlaws removed their assignment Aug 26, 2020