Improve the esleg bulk response parsing #36275
Conversation
This Go benchmark is going to serve as our baseline for future changes. Signed-off-by: Alexandros, Sapranidis <alexandros@elastic.co>
When we create a new connection, we also create a new `bytes.Buffer`, which we use to consume the HTTP response via `io.Copy`; this produces fewer memory allocations. Signed-off-by: Alexandros, Sapranidis <alexandros@elastic.co>
Signed-off-by: Alexandros, Sapranidis <alexandros@elastic.co>
Signed-off-by: Alexandros, Sapranidis <alexandros@elastic.co>
Signed-off-by: Alexandros, Sapranidis <alexandros@elastic.co>
This commit adds a new parameter in the output settings called `bulk_response_filtering`, which defaults to true. When true, this parameter appends the ResponseFiltering parameter to the bulk request the Beat makes. We are adding this parameter so that we can opt out if we want to diagnose a problem, or if someone wants to keep using the older way we execute bulk requests. Signed-off-by: Alexandros, Sapranidis <alexandros@elastic.co>
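Elasticsearch supports trimming bulk responses with the `filter_path` query parameter. A minimal sketch of how an opt-out flag like the one above could gate that parameter; `bulkURL` and the specific filter expression are illustrative assumptions, not the PR's code:

```go
package main

import (
	"fmt"
	"net/url"
)

// bulkURL builds the _bulk request URL. When filtering is enabled it
// appends Elasticsearch's filter_path parameter so the response only
// carries the error-related fields the client actually inspects.
// The filter expression below is an example, not the one from the PR.
func bulkURL(base string, filtering bool) string {
	if !filtering {
		return base + "/_bulk"
	}
	params := url.Values{}
	params.Set("filter_path", "errors,items.*.error,items.*.status")
	return base + "/_bulk?" + params.Encode()
}

func main() {
	fmt.Println(bulkURL("http://localhost:9200", true))
	fmt.Println(bulkURL("http://localhost:9200", false))
}
```

A smaller response body means less data to copy into the response buffer, which compounds with the allocation savings above.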
Signed-off-by: Alexandros, Sapranidis <alexandros@elastic.co>
This commit addresses the lint error for the unhandled error not being checked. Signed-off-by: Alexandros, Sapranidis <alexandros@elastic.co>
This is to address the linting rule that triggered the following violation:
```
should rewrite http.NewRequestWithContext or add (*Request).WithContext (noctx)
```
Signed-off-by: Alexandros, Sapranidis <alexandros@elastic.co>
Signed-off-by: Alexandros, Sapranidis <alexandros@elastic.co>
Signed-off-by: Alexandros, Sapranidis <alexandros@elastic.co>
@leehinman Will this improvement automatically benefit the shipper, or does the same change need to be made elsewhere for the shipper as well?

Should be automatic; when Filebeat is acting as an Elasticsearch shipper it is using the `elasticsearch` output.

A similar change needs to be made for other parts of the codebase that are using the
Signed-off-by: Alexandros, Sapranidis <alexandros@elastic.co>
Signed-off-by: Alexandros, Sapranidis <alexandros@elastic.co>
This completes the migration of the bulk filtering from the esleg package to the output package. Signed-off-by: Alexandros, Sapranidis <alexandros@elastic.co>
Since all tests before this change use the filtered response and were passing, there is no need for another flag. This change actually makes the tests reflect the real behaviour more closely. Signed-off-by: Alexandros, Sapranidis <alexandros@elastic.co>
```go
conn := Connection{
	ConnectionSettings: s,
	HTTP:               esClient,
	Encoder:            encoder,
	log:                logger,
	responseBuffer:     bytes.NewBuffer(nil),
}
```
I think this field needs to be protected by a mutex.

Let's say two goroutines, G1 and G2, are running concurrently, and both end up calling `conn.execHTTPRequest` at some point in their execution. Now imagine this sequence of events:

1. G1 copies `resp.Body` into `conn.responseBuffer` (line 458).
2. G2 calls `conn.responseBuffer.Reset()` (line 457).
3. G1 returns `conn.responseBuffer.Bytes()` (line 468).

The result will be that G1 will return an empty slice of bytes.
You bring up a valid concern here. Before going in and adding a mutex, which could defeat what we want to achieve and actually slow down the process:

I might be wrong, but the assumption I had is that each goroutine would have its own connection. One connection would not be shared between two goroutines; if two goroutines needed a connection to ES, they would each create a new connection client and thus a new buffer.

If we absolutely need to add a mutex, I would rather change the implementation, because if there is a case where two goroutines share the same connection, then adding a mutex would be similar to not having goroutines in the first place.
Yes, you're right that, as it stands today, the only way two goroutines could concurrently call the `execHTTPRequest` method is if they were sharing the same `Connection` object. And from what I can tell, that is not happening anywhere in the code.

Besides checking this by eyeballing the code paths that lead to the `execHTTPRequest` method, I also ran `go test -race ./libbeat/... -count 1 | grep -i 'data race' | wc -l` from the root folder of the Beats repo, first on the `main` branch and then on this PR's branch. In both cases I got the same results, which is a quick sanity check that the changes in this PR are most likely not introducing additional data races:
Running on `main`:
```
$ go test -race ./libbeat/... -count 1 | grep -i 'data race' | wc -l
22
```
Running on this PR's branch:
```
$ go test -race ./libbeat/... -count 1 | grep -i 'data race' | wc -l
22
```
Side note: for the data races that are being reported on `main`, I filed #36393.
However, in the future, code may be written such that two concurrently-executing goroutines share the same `Connection` object.

The safest thing to do here would be to add a mutex but, as you noted, this will likely cut into the performance gains we are seeing from the change in this PR.

The next best thing to do, IMO, would be to document clearly, in a comment above the `execHTTPRequest` method AND in comments above each of the constructor functions that return a new `Connection` (`NewConnection`, `NewClients`, and `NewConnectedClient`) AND in a comment above the `Connection` struct itself, that this method/struct is not threadsafe. You could add a link in your comments to this discussion here on GitHub in case someone wants to know all the details.
❤️ for the help here. I agree that race conditions are something to be looked at in different work streams.

> However, in the future, code may be written such that two concurrently-executing goroutines are sharing the same Connection object.

I hope that once we move away from esleg and to a different output client, this part of the code will not be used anymore.

I added some comments about the thread-safety, please take a look at f181725
Comments LGTM. Thanks @alexsapran!
Signed-off-by: Alexandros, Sapranidis <alexandros@elastic.co>
Signed-off-by: Alexandros, Sapranidis <alexandros@elastic.co>
LGTM. Nice performance improvements!
```go
conn.responseBuffer.Reset()
_, err = io.Copy(conn.responseBuffer, resp.Body)
```
@alexsapran I think we should not re-use the buffer here because the size is not static. This buffer will never release its allocated memory, and the size of this memory will be equal to the size of the biggest HTTP response ever processed. I'd prefer to release the memory and re-allocate smaller buffers when needed.

I think we should switch to elastic/elastic-agent-libs#183 instead and compare performance. That implementation should be the most optimized one.
Yes, you are correct @rdner about the use case you mention, but this might be just fine in this particular scenario, because the response size is related to the batching we do. So the response size is roughly the same for each bulk request we publish.

We can adjust this and make it so that we only allocate a buffer just for the consumption of that response. Since we are in the process of evaluating other usages of `ioutil.ReadAll`, it might make sense to adjust this as well.
Proposed commit message
Currently, I am raising this PR in order to start a discussion about potential improvements I found while reviewing some Beat profiles.

While doing some synthetic benchmarks I noticed that we are using `ioutil.ReadAll`, which is deprecated. Going down the path of replacing it, I took a deeper look into how we do bulk requests using the esleg package. I introduced a Go benchmark to baseline my changes; here is the comparison between `main` and the proposed changes.
We can see that by replacing `ioutil.ReadAll` with a reusable `bytes.Buffer` for consuming the response, we managed to save on memory allocations. Additionally, I will be pushing another commit reducing the overall response we get from the `_bulk` request, which will further reduce the amount of memory we need to copy/allocate for the response.

Checklist

- `CHANGELOG.next.asciidoc` or `CHANGELOG-developer.next.asciidoc`.
Signed-off-by: Alexandros, Sapranidis alexandros@elastic.co