
stream decompression instead of buffering #2018

Merged: 4 commits into labstack:master on Dec 3, 2021

Conversation

@davidmdm (Contributor) commented Nov 4, 2021

The current implementation of decompress reads the entire request body into a bytes.Buffer and sets request.Body to that buffer. For very large payloads this creates latency and high memory usage.

For example, should the user not wish to process the request because it is invalid in some way (bad header, invalid query parameters, etc.), the current middleware will still read the whole body before giving control to the user's handler.

This fix instead wraps the gzip reader in a custom closer that the user reads from directly. Once the user or the native Go http library closes the request body, everything is cleaned up via the custom closeFunc.

If the user for any reason chooses not to read the body, the middleware will not have wasted time reading it in the first place, and should the user decide to write the stream of data to some other destination, no content will have been buffered in memory.
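For context, a minimal sketch of the streaming idea (not the PR's exact code; the type and helper names below are invented for illustration):

package decompressexample

import (
	"compress/gzip"
	"io"
)

// streamingBody wraps the decompressed stream so that closing the request
// body also closes the gzip reader and the original body.
type streamingBody struct {
	io.Reader
	closeFn func() error
}

func (b *streamingBody) Close() error { return b.closeFn() }

// wrapDecompressed replaces up-front buffering: the handler reads decompressed
// bytes directly from the gzip reader, and cleanup only happens on Close.
func wrapDecompressed(original io.ReadCloser) (io.ReadCloser, error) {
	gr, err := gzip.NewReader(original)
	if err != nil {
		return nil, err
	}
	return &streamingBody{
		Reader: gr,
		closeFn: func() error {
			gr.Close()              // release the gzip reader
			return original.Close() // and the underlying connection body
		},
	}, nil
}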

@aldas (Contributor) commented Nov 5, 2021

This needs some serious edge-case tests.

One concern: if some middleware somewhere after this one also swaps out the request body, this Close will never be called, so the pooled object is never put back into the pool and the original body is never closed. We could say it is the replacer's responsibility to call Close; just thinking out loud.
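To make the concern concrete, here is a hypothetical "replacer" middleware (bodyReplacer is made up for this example, not real Echo code). Without the deferred Close, the body installed by Decompress, and the pooled reader behind it, would leak:

package decompressexample

import (
	"io"
	"strings"

	"github.com/labstack/echo/v4"
)

// bodyReplacer swaps out the request body after Decompress has run.
// Closing the previous body is what lets the pooled gzip reader be released.
func bodyReplacer(next echo.HandlerFunc) echo.HandlerFunc {
	return func(c echo.Context) error {
		old := c.Request().Body
		defer old.Close() // without this, the body Decompress installed is never closed

		c.Request().Body = io.NopCloser(strings.NewReader("replaced"))
		return next(c)
	}
}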

@davidmdm (Contributor, Author) commented Nov 5, 2021

@aldas
If some future middleware after this one were to replace the body and forget to call Close, it would be the same issue as if it were the first middleware: resources would be leaked.

It is any replacer's responsibility to call Close on the underlying Body.

The current strategy, which consumes the entire req.Body before calling the next middleware, introduces serious latency.
Suppose a 50mb payload is sent. The entire body would be read from the connection, using a lot of memory and making the user's handler wait to be called. If the user then decided for some reason that the request was bad, we would have waited for nothing. Even if the request was good, the 50mb payload now has to be read a second time to process it (reading from memory is faster, but it is still completely unnecessary).
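To illustrate, a small hypothetical handler (the /upload route and id parameter are invented for this example): with streaming decompression the rejection path never touches the 50mb body, and the happy path decompresses it exactly once while reading.

package main

import (
	"fmt"
	"io"
	"log"
	"net/http"

	"github.com/labstack/echo/v4"
	"github.com/labstack/echo/v4/middleware"
)

func main() {
	e := echo.New()
	e.Use(middleware.Decompress())

	e.POST("/upload", func(c echo.Context) error {
		if c.QueryParam("id") == "" {
			// No bytes of the gzip body have been read or decompressed yet.
			return echo.NewHTTPError(http.StatusBadRequest, "missing id")
		}
		// Decompression happens lazily, only as the body is consumed here.
		n, err := io.Copy(io.Discard, c.Request().Body)
		if err != nil {
			return err
		}
		return c.String(http.StatusOK, fmt.Sprintf("read %d bytes", n))
	})

	log.Fatal(e.Start(":8080"))
}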

i := pool.Get()
gr, ok := i.(*gzip.Reader)
if !ok {
	return echo.NewHTTPError(http.StatusInternalServerError, i.(error).Error())
}
A Contributor commented on the snippet above:

Since you're trying to read the request body, wouldn't this be a 4xx?

@davidmdm (Contributor, Author) Nov 6, 2021

This was already there as-is; regardless, I believe it is correct.
Since sync.Pool returns an interface{}, we have no choice but to type-assert the value to the underlying type to work with it.
This middleware creates a sync.Pool to amortise allocation of gzip.Reader objects. If for some reason the pool returns something we aren't expecting, then something is wrong with the library, not the request. Hence the 500.

An argument could be made that this might even be a good spot to panic, but I don't want to introduce unneeded changes in this PR.
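For reference, an assumed sketch of where that interface{} comes from (illustrative shape, not necessarily Echo's exact code): the default constructor only ever yields *gzip.Reader, so a failed assertion points to a misconfigured, user-supplied pool rather than a bad request.

package decompressexample

import (
	"compress/gzip"
	"net/http"
	"sync"

	"github.com/labstack/echo/v4"
)

// newGzipReaderPool amortises gzip.Reader allocations across requests.
func newGzipReaderPool() *sync.Pool {
	return &sync.Pool{New: func() interface{} { return new(gzip.Reader) }}
}

// readerFromPool type-asserts the pooled value; anything other than a
// *gzip.Reader indicates a broken pool, hence a 500 rather than a 4xx.
func readerFromPool(pool *sync.Pool) (*gzip.Reader, error) {
	i := pool.Get()
	gr, ok := i.(*gzip.Reader)
	if !ok {
		return nil, echo.NewHTTPError(http.StatusInternalServerError, i.(error).Error())
	}
	return gr, nil
}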

@davidmdm (Contributor, Author) commented Nov 7, 2021

@aldas is there anything blocking this PR? I would like to get this through for use at my company.

@aldas (Contributor) commented Nov 7, 2021

I do not know. I have not tried it yet; I have only skimmed the code and written down my initial thoughts.

If it is so time-critical, you can copy the modified middleware code and use it today, then switch to the Echo-provided version when it gets merged and released.

@aldas (Contributor) commented Nov 14, 2021

@davidmdm I am struggling here. How did you test this? I cannot see readCloserCustom.Close() being executed at all. http.Server is supposed to close the body, but in my tests that does not happen.

This is the example I am using:

package main

import (
	"io/ioutil"
	"log"
	"net/http"
	"time"

	"github.com/labstack/echo/v4"
	"github.com/labstack/echo/v4/middleware"
)

func main() {
	e := echo.New()

	e.Use(middleware.Logger())
	e.Use(middleware.Recover())
	e.Use(middleware.Decompress())

	e.POST("/", func(c echo.Context) error {
		body, err := ioutil.ReadAll(c.Request().Body)
		if err != nil {
			return err
		}

		// sleep so we can have multiple request goroutines running at once to exercise pooling:
		// watch -n 0.1 curl -v -i http://localhost:8080 -H'Content-Encoding: gzip' --data-binary @51kb.csv.gz
		time.Sleep(500 * time.Millisecond)
		return c.String(http.StatusOK, string(body))
	})

	log.Fatal(e.Start(":8080"))
}

I'm debugging what happens to the request body, and as far as I can tell the http.Server will close the request body here:
https://github.com/golang/go/blob/5337e53dfa3f5fde73b8f505ec3a91c628e8f648/src/net/http/server.go#L1637

but that w.reqBody is the original body (as seen from the debugger), not the one we put in as a replacement, so in my tests the body that Decompress created is never closed.
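One hypothetical way to confirm this from the middleware side (loggingBody is made up for debugging, not part of the PR): wrap the replacement body and log when Close fires; with the closeFunc approach the log line never appears.

package decompressexample

import (
	"io"
	"log"
)

// loggingBody makes it visible whether anything ever calls Close on the
// body that the middleware installed as a replacement.
type loggingBody struct {
	io.ReadCloser
}

func (b loggingBody) Close() error {
	log.Println("replacement request body Close() called")
	return b.ReadCloser.Close()
}

// Usage inside the middleware, after building the streaming reader rc:
//	c.Request().Body = loggingBody{ReadCloser: rc}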

@aldas (Contributor) commented Nov 14, 2021

Why not just do this?

	defer func() {
		gr.Close()  // release the gzip reader
		pool.Put(i) // return the pooled object
		b.Close()   // close the original request body
	}()
	c.Request().Body = gr

	return next(c)
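Roughly how that slots into the middleware (a sketch using the same variable names as the snippet above: i is the pooled item, gr the *gzip.Reader, b the original body; not the exact merged code):

package decompressexample

import (
	"compress/gzip"
	"io"
	"net/http"
	"sync"

	"github.com/labstack/echo/v4"
)

func streamingDecompress(pool *sync.Pool) echo.MiddlewareFunc {
	return func(next echo.HandlerFunc) echo.HandlerFunc {
		return func(c echo.Context) error {
			if c.Request().Header.Get(echo.HeaderContentEncoding) != "gzip" {
				return next(c)
			}

			b := c.Request().Body
			i := pool.Get()
			gr, ok := i.(*gzip.Reader)
			if !ok {
				return echo.NewHTTPError(http.StatusInternalServerError, i.(error).Error())
			}
			if err := gr.Reset(b); err != nil {
				pool.Put(i)
				if err == io.EOF { // empty body: nothing to decompress
					return next(c)
				}
				return err
			}

			// Runs after the handler chain returns, i.e. once the handler
			// has finished reading the decompressed stream.
			defer func() {
				gr.Close()
				pool.Put(i)
				b.Close()
			}()
			c.Request().Body = gr

			return next(c)
		}
	}
}

A handler would register it as e.Use(streamingDecompress(pool)), with the pool supplying reusable gzip readers.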

@davidmdm (Contributor, Author) commented:

@aldas I hadn't realised that the Close performed by http.Server is not on the userland-exposed request body but on an underlying w.reqBody.

Your proposed simplification works perfectly.

codecov bot commented Nov 20, 2021

Codecov Report

Merging #2018 (77e02f0) into master (d604704) will increase coverage by 0.09%.
The diff coverage is 93.75%.


@@            Coverage Diff             @@
##           master    #2018      +/-   ##
==========================================
+ Coverage   91.23%   91.32%   +0.09%     
==========================================
  Files          33       33              
  Lines        2887     2871      -16     
==========================================
- Hits         2634     2622      -12     
+ Misses        161      159       -2     
+ Partials       92       90       -2     
Impacted Files Coverage Δ
middleware/decompress.go 96.29% <93.75%> (+7.92%) ⬆️
middleware/rate_limiter.go 100.00% <0.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d604704...77e02f0.

@lammel (Contributor) commented Nov 21, 2021

Haven't worked through the changes, but supporting streaming decompression is very welcome.

@davidmdm (Contributor, Author) commented Dec 2, 2021

@lammel Anything holding up this PR?

@aldas (Contributor) left a review comment:

LGTM

If someone needs to test this by hand:

package main

import (
	"io"
	"log"
	"net/http"

	"github.com/labstack/echo/v4"
	"github.com/labstack/echo/v4/middleware"
)

func main() {
	e := echo.New()

	e.Use(middleware.Logger())
	e.Use(middleware.Recover())
	e.Use(middleware.Decompress())

	e.POST("/", func(c echo.Context) error {
		body, err := io.ReadAll(c.Request().Body)
		if err != nil {
			return err
		}

		return c.String(http.StatusOK, string(body))
	})

	log.Fatal(e.Start(":8080"))
}
echo '{ "mydummy" : "json" }' | gzip > body.gz
curl -v http://localhost:8080 -H'Content-Encoding: gzip' --data-binary @body.gz

@lammel (Contributor) commented Dec 3, 2021

@davidmdm Thanks for your contribution!
Let's get that thing merged now!

@lammel merged commit b437ee3 into labstack:master on Dec 3, 2021