Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kafka: fix producing with acks=0 #1630

Merged
merged 2 commits into from
May 21, 2024
Merged

Kafka: fix producing with acks=0 #1630

merged 2 commits into from
May 21, 2024

Conversation

rukai
Copy link
Member

@rukai rukai commented May 17, 2024

Discovery

This managed to avoid being hit because the test cases were running acks=0 last and never sending any more messages down those connections.
But while investigating #1588 this started showing up in the acks0 test case presumably because the java driver was reusing the connections it sent the acks=0 produce requests down.

Background info

Codecs are an optional helper mechanism provided by the tokio_util crate.
They make it easy to attach an implementation of a protocol to a TCP socket.
In shotover every database that we support has a unique codec implemented for its protocol.
There is no single Codec type, instead there are separate read and write codecs, called the Encoder and the Decoder.

The kafka protocol does not specify the message type of the responses sent by the kafka broker.
Instead it is up to the client to remember the message type of each request it sends out and then match up each response that comes in with its corresponding request so that the response can be correctly decoded.

In order to decode kafka responses, shotover needs the kafka outgoing encoder to instruct the outgoing decoder on how to decode each response.
This is done via an mpsc channel, specifically we use the mpsc channel from the standard library.

let (tx, rx) = match self.direction {
Direction::Source => (None, None),
Direction::Sink => {
let (tx, rx) = mpsc::channel();
(Some(tx), Some(rx))
}
};

The problem

However there is one exception to this that we were not handling!
The kafka produce request can set a field named acks to determine how many acks the leader broker must receive before responding to the request.
As a special case if acks = 0 then the leader broker will never respond to that request.
This is used if latency is the highest concern and lost messages are acceptable.

Shotover will never receive a response for a produce with acks=0.
So if the encoder tells the decoder to expect a produce response it will expect the next message to be a produce even though that is not the case. This results in attempting to decode responses as the wrong type and therefore failure to decode.

The fix

The correct fix here is to have the encoder skip sending message type information if acks = 0.
Specifically that is done by checking the result of the Message::response_is_dummy() method which checks if the request will not receive a response, including a check for acks = 0 on kafka messages.

Now that the issue is fixed we can enable the failing tests in kafka_int_tests/test_cases.rs.

Copy link

codspeed-hq bot commented May 17, 2024

CodSpeed Performance Report

Merging #1630 will improve performances by 16.51%

Comparing rukai:kafka_fix_acks0 (f0f528d) with main (3cdefda)

Summary

⚡ 1 improvements
✅ 36 untouched benchmarks

Benchmarks breakdown

Benchmark main rukai:kafka_fix_acks0 Change
encode_request_produce 25.9 µs 22.2 µs +16.51%

@rukai
Copy link
Member Author

rukai commented May 20, 2024

The encode_request_produce microbenchmark above slightly improved performance since acks=0 no longer sends the header info to the decoder.

@rukai rukai marked this pull request as ready for review May 20, 2024 22:54
@rukai rukai requested a review from conorbros May 20, 2024 23:19
@rukai rukai merged commit 4c0bf18 into shotover:main May 21, 2024
41 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants