Merge remote-tracking branch 'apache-kafka/trunk' into ccs-kafka/master #1225

Merged
merged 142 commits into from
May 15, 2024

Conversation

omkreddy
Member

@omkreddy omkreddy commented May 13, 2024

Minor conflict in build.gradle

mwesterby and others added 30 commits April 17, 2024 19:03
Reviewers: Igor Soarez <soarez@apple.com>, Federico Valeri <fedevaleri@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
…che#15662)

This patch introduces the conversion from a classic group to a consumer group when a member joins with the new consumer group protocol (epoch is 0) but only if the conversion is enabled.

Reviewers: David Jacot <djacot@confluent.io>
…core (apache#15684)

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Reviewers: Johnny Hsu <johnnyhsu@fb.com>, Chia-Ping Tsai <chia7712@gmail.com>
…ments in memory table (apache#15631)

Co-authored-by: hzh0425 <642256541@qq.com>

Reviewers: Luke Chen <showuon@gmail.com>, Jun Rao <junrao@gmail.com>
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Add the support for DescribeTopicPartitions API to AdminClient. For this initial implementation, we are simply loading all of the results into memory on the client side. 

Reviewers: Andrew Schofield <aschofield@confluent.io>, Kirk True <ktrue@confluent.io>, David Jacot <djacot@confluent.io>, Artem Livshits <alivshits@confluent.io>, David Arthur <mumrah@gmail.com>
…9) (apache#15681)

Implements KIP-1019, which exposes a method to check whether a metric is of type Measurable.
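
As a rough illustration of the idea, the sketch below models a metric that can report whether its value provider is measurable, so callers do not need reflection to find out. All type names here (MetricValueProviderSketch, MeasurableSketch, GaugeSketch, MetricSketch) are hypothetical stand-ins written for this example and are not the Kafka classes.

```java
// Hypothetical stand-ins for a metric and its value providers; not the Kafka classes.
interface MetricValueProviderSketch { }

interface MeasurableSketch extends MetricValueProviderSketch {
    double measure(long nowMs);
}

interface GaugeSketch extends MetricValueProviderSketch {
    Object value(long nowMs);
}

class MetricSketch {
    private final MetricValueProviderSketch provider;

    MetricSketch(MetricValueProviderSketch provider) {
        this.provider = provider;
    }

    // The kind of check the KIP describes: callers ask the metric directly
    // instead of reading a private field by reflection.
    boolean isMeasurable() {
        return provider instanceof MeasurableSketch;
    }

    // Only valid for measurable metrics; fails fast otherwise.
    double measurableValue(long nowMs) {
        if (!isMeasurable()) {
            throw new IllegalStateException("Metric is not backed by a measurable provider");
        }
        return ((MeasurableSketch) provider).measure(nowMs);
    }
}
```

A caller would check isMeasurable() first and only then read the numeric value, which avoids both reflection and exception-driven control flow.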

Reviewers: Andrew Schofield <aschofield@confluent.io>, Matthias J. Sax <matthias@confluent.io>
Kafka Streams DSL operators use internal wall-clock based throttling
parameters for performance reasons. These configs make TopologyTestDriver (TTD)
difficult to use: users need to advance the mocked wall-clock time in
their test code, or set these internal configs to zero.

To simplify testing, TTD should disable both configs automatically.

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
…ache#13824)

RemoteLogMetadataSerde references RemoteLogMetadataTransform in a Raw
form. Given that the class is parametrized we should make use of it.

Signed-off-by: Josep Prat <josep.prat@aiven.io>

Reviewers:  Matthew de Detrich <matthew.dedetrich@aiven.io>, Mickael Maison <mickael.maison@gmail.com>
…ocol (apache#15738)

Updating consumer system test that was failing with the new protocol, related to static membership behaviour. The behaviour regarding static consumers that join with conflicting group instance id is slightly different between the classic and new consumer protocol, so the expectations in the tests needed to be updated.

If static members join with same instance id:

Classic protocol: all members join the group with the same group instance id, and then the first one will eventually fail (receives a HB error with FencedInstanceIdException)

Consumer protocol: new member with an instance Id already in use is not able to join, and first member remains active (new member with same instance Id receives an UnreleasedInstanceIdException in the response to the HB to join the group)

This PR is keeping the single parametrized test that existed before, given that what's being tested and part of the test itself apply to all protocols. This is just updating the expectations that are different, based on the protocol parameter.
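
The difference described above can be pictured with a toy model. This is only a sketch of the two outcomes as stated in this commit message, not the coordinator implementation; the class, enum, and method names are invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of how a duplicate group instance id is handled under each protocol.
// Hypothetical names; this is not the group coordinator code.
class StaticMembershipSketch {
    enum Protocol { CLASSIC, CONSUMER }
    enum Outcome { JOINED, JOINED_FENCING_PREVIOUS, REJECTED_UNRELEASED_INSTANCE_ID }

    private final Protocol protocol;
    private final Map<String, String> memberByInstanceId = new HashMap<>();

    StaticMembershipSketch(Protocol protocol) {
        this.protocol = protocol;
    }

    Outcome join(String instanceId, String memberId) {
        String existing = memberByInstanceId.get(instanceId);
        if (existing == null) {
            memberByInstanceId.put(instanceId, memberId);
            return Outcome.JOINED;
        }
        if (protocol == Protocol.CLASSIC) {
            // Classic: the newcomer joins and the first member is fenced later.
            memberByInstanceId.put(instanceId, memberId);
            return Outcome.JOINED_FENCING_PREVIOUS;
        }
        // Consumer protocol: the instance id is still in use, so the join is refused.
        return Outcome.REJECTED_UNRELEASED_INSTANCE_ID;
    }

    // The member that remains active for the given instance id.
    String activeMember(String instanceId) {
        return memberByInstanceId.get(instanceId);
    }
}
```

A parametrized test can then keep one body and branch only on the expected outcome for the protocol under test.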

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>, Kirk True <ktrue@confluent.io>
Consumer Rolling Upgrade is meant to test the protocol upgrade for the old protocol. Therefore, I am removing old changes.

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
…ibuted.py (apache#15594)

Summary of the changes:

Parameterizes the tests to use new coordinator and pass in consumer group protocol. This would be applicable to sink connectors only.
Enhances the sink connector creation code in system tests to accept a new optional parameter for consumer group protocol to be used.
Sets the consumer group protocol via the consumer.override. config prefix when the new group coordinator is enabled.
Note about testing: There are 288 tests that need to be run, and running them locally takes a long time. I will try to post the test results once I have a full run.

Reviewers: Kirk True <ktrue@confluent.io>, Lucas Brutschy <lbrutschy@confluent.io>, Philip Nee <pnee@confluent.io>
…he#15682)

The PR leverages the changes defined in KIP-1019. It removes the reflection-based access to the KafkaMetric field and instead uses the method exposed by KIP-1019 to check metric measurability.

Reviewers: Andrew Schofield <aschofield@confluent.io>, Matthias J. Sax <matthias@confluent.io>
…che#15569)

Reviewers: Mickael Maison <mickael.maison@gmail.com>, Nikolay <nizhikov@apache.org>, Federico Valeri <fvaleri@redhat.com>, Chia-Ping Tsai <chia7712@gmail.com>
…e same directory (apache#15136)

It is observed that for scenario (3), i.e. a broker crashes while it
waits for the future replica to catch up for the second time and the
`dir1` is unavailable when the broker is restarted, the
broker tries to create the partition in `dir2` according to the metadata
in the controller. However, ReplicaManager also tries to resume the
stale future replica which was abandoned when the broker crashed. This
eventually causes the renaming of the future replica to fail because
the directory for the topic partition already exists in `dir2` and the
broker then marks `dir2` as offline.

This PR attempts to fix this behaviour by ignoring any future replicas
which are in the same directory as where the log exists. It further
marks the stale future replica for deletion.

Reviewers: Omnia Ibrahim <o.g.h.ibrahim@gmail.com>,  Igor Soarez <soarez@apple.com>, Proven Provenzano <pprovenzano@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
…n LogManager to speed up tests (apache#15719)

Reviewers: Luke Chen <showuon@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
…ME_DEFAULT and SASL_OAUTHBEARER_SUB_CLAIM_NAME_DEFAULT (apache#15760)

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Because the packages present when the jsa files were generated differ from those present when the docker image is generated, a warning is logged when the docker image starts.

[0.001s][warning][cds] The shared archive file has a bad magic number: 0

There is no functional impact; only the startup time is higher because of this issue.

This PR fixes the warning and improves startup performance of the docker image.

Reviewers:  Manikumar Reddy <anikumar.reddy@gmail.co
…itSync and close (apache#15613)

The javadoc for KafkaConsumer.commitSync says:

Note that asynchronous offset commits sent previously with the {@link #commitAsync(OffsetCommitCallback)}
(or similar) are guaranteed to have their callbacks invoked prior to completion of this method.

This is not always true in the async consumer, where there is no code at all to make sure that the callback is executed before commitSync returns.

Similarly, the async consumer is also missing logic to await callback execution in close. While the javadoc doesn't explicitly promise callback execution, it promises "completing commits", which one would reasonably expect to include callback execution. Also, the legacy consumer contains some code to execute callbacks before closing.

This change proposes a number of fixes to clean up the callback execution guarantees in the async consumer:

We keep track of the incomplete async commit
futures and wait for them to complete before returning from
commitSync or close (if there is time).
Since we need to block to make sure that our previous commits are
completed, we allow the consumer to wake up.
Some similar gaps are addressed in the legacy consumer, see apache#15693
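
A minimal sketch of the tracking idea, assuming each async commit is represented by a CompletableFuture. PendingCommitTracker and its methods are hypothetical names for this illustration, not the async consumer's actual code.

```java
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical tracker for in-flight async commits.
class PendingCommitTracker {
    private final Set<CompletableFuture<Void>> pending = ConcurrentHashMap.newKeySet();

    // Remember an in-flight commit; it removes itself once it completes.
    void track(CompletableFuture<Void> commit) {
        pending.add(commit);
        commit.whenComplete((result, error) -> pending.remove(commit));
    }

    // Wait for every tracked commit, up to timeoutMs.
    // Returns true if all of them completed (successfully or not) in time.
    boolean awaitPending(long timeoutMs) {
        CompletableFuture<?>[] snapshot = pending.toArray(new CompletableFuture<?>[0]);
        try {
            CompletableFuture.allOf(snapshot).get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (ExecutionException e) {
            // A failed commit has still completed, which is all the caller waits for.
            return true;
        } catch (TimeoutException e) {
            return false;
        } catch (InterruptedException e) {
            // Let the caller observe the interruption (a wakeup, in this sketch).
            Thread.currentThread().interrupt();
            return false;
        }
    }

    int pendingCount() {
        return pending.size();
    }
}
```

In this sketch a synchronous commit or a close would call awaitPending with whatever time budget remains before doing its own work.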

Testing
Two new integration tests and a couple of unit tests.

Reviewers: Bruno Cadonna <cadonna@apache.org>, Kirk True <ktrue@confluent.io>, Lianet Magrans <lianetmr@gmail.com>
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Kuan-Po (Cooper) Tseng <brandboat@gmail.com>, Viktor Somogyi-Vass <viktorsomogyi@gmail.com>
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
…nt to do is to find a single matched record from remote storage (apache#15765)

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
The implementation of KIP-773 deprecated the iotime-total and io-waittime-total metrics. It was not meant to mark io-ratio and io-wait-ratio as deprecated. However, they now have *Deprecated* in their descriptions. Here is the reason:

    register io-ratio (desc: *Deprecated* The fraction of time ...) -> registered
    register iotime-total (desc: *Deprecated* The total time ...) -> registered
    register io-ratio (desc: The fraction of time ...) -> skipped, the same name already exists in registry
    register io-time-ns-total (desc: The total time ...) -> registered

As a result, io-ratio has an incorrect description. The same applies to io-wait-ratio. This PR fixes these descriptions.
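
The registration order listed above can be replayed with a small sketch. It assumes a registry that keeps the first registration for a given name and skips later ones; MetricRegistrySketch is a hypothetical stand-in, not the Kafka metrics registry.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical registry that keeps the first registration per metric name.
class MetricRegistrySketch {
    private final Map<String, String> descriptionByName = new LinkedHashMap<>();

    // Returns true if the metric was registered, false if the name already existed.
    boolean register(String name, String description) {
        return descriptionByName.putIfAbsent(name, description) == null;
    }

    String description(String name) {
        return descriptionByName.get(name);
    }

    // Replays the four registrations from the commit message, in the same order.
    static MetricRegistrySketch replayRegistrationOrder() {
        MetricRegistrySketch registry = new MetricRegistrySketch();
        registry.register("io-ratio", "*Deprecated* The fraction of time ...");
        registry.register("iotime-total", "*Deprecated* The total time ...");
        registry.register("io-ratio", "The fraction of time ..."); // skipped: name exists
        registry.register("io-time-ns-total", "The total time ...");
        return registry;
    }
}
```

After the replay, looking up io-ratio returns the description that starts with *Deprecated*, because the later, correct registration was skipped.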

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
jeqo and others added 23 commits May 8, 2024 14:27
KAFKA-16685: Add parent exception to RLMTask warning logs

Reviewers: Josep Prat <josep.prat@aiven.io>
…#15892)

As part of apache@2e8d69b, we introduced the TransactionAbortableException in AK. On more detailed analysis, we found that the name of the enum SupportedOperation was somewhat misleading. It is therefore renamed to TransactionSupportedOperation to give a better-defined function signature.

Reviewers: Justine Olshan <jolshan@confluent.io>
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
…upCommandIntegrationTest (apache#15872)

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Reviewers: Chris Egerton <chrise@aiven.io>
… pluggable TaskAssignors. (apache#15887)

This is the first PR in a sequence to support custom task assignors in Kafka Streams, as described in KIP-924. It creates and exposes all of the interfaces that will need to be implemented during the refactor of the current task assignment logic.

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
…erConfigProperty (apache#15715)

Introduces a new field, id, in the ClusterConfigProperty annotation. The main purpose of the new field is to define a property for a specific broker or controller (KRaft). The default value is -1, which means the ClusterConfigProperty applies to all brokers and controllers.

Note that in Type.KRAFT mode, the controller id starts from 3000 and increments by one each time. In other modes, the broker/controller id starts from 0 and increments by one.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
…pache#15573)

MissingSourceTopicException should contain the name of the missing topic.
There is one corner case for which we don't have the topic name at hand, but we can log the topic
name somewhere else.

Reviewers: Bruno Cadonna <bruno@confluent.io>, Matthias J. Sax <matthias@confluent.io>
Configs default.windowed.value.serde.inner and default.windowed.key.serde.inner
were replaced with windowed.inner.class.serde. This PR updates the docs accordingly,
plus a few more side cleanups.

Reviewers: Matthias J. Sax <matthias@confluent.io>
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
…stsTest (apache#15907)

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
)

We observe some thread leaks in CI which point to the executor service
thread. This change tries to shut down the executor service using the
helper method in `ThreadUtils`.
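
For readers unfamiliar with the pattern, a generic shutdown helper might look like the sketch below. It is an assumption about what such a helper typically does, written with standard java.util.concurrent calls; ExecutorShutdownSketch and shutdownQuietly are hypothetical names and this is not the `ThreadUtils` method itself.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper showing a common two-phase executor shutdown.
class ExecutorShutdownSketch {
    // Returns true if the executor terminated within the allowed time.
    static boolean shutdownQuietly(ExecutorService executor, long timeoutMs) {
        if (executor == null) {
            return true;
        }
        executor.shutdown(); // stop accepting new tasks, let queued ones finish
        try {
            if (executor.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS)) {
                return true;
            }
            executor.shutdownNow(); // interrupt tasks that are still running
            return executor.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            executor.shutdownNow();
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }
}
```

Calling such a helper from test teardown keeps worker threads from outliving the test that created them.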

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Igor Soarez <soarez@apple.com>
…pache#15345)

Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Vedarth Sharma <142404391+VedarthConfluent@users.noreply.github.com>
… enableRequestProcessing (apache#14729)

Signed-off-by: Greg Harris <greg.harris@aiven.io>
Reviewers: Chris Egerton <chrise@aiven.io>, hudeqi <1217150961@qq.com>, Qichao Chu <qichao@uber.com>
KIP-227 introduced in-memory caching of FetchSessions. Brokers with a large number of Fetch requests suffer from contention on trying to acquire a lock on FetchSessionCache.

This change aims to reduce lock contention for FetchSessionCache by sharding the cache into multiple segments, each responsible for an equal range of sessionIds. Assuming Fetch requests have a uniform distribution of sessionIds, the probability of contention on a segment is reduced by a factor of the number of segments.

We ensure backwards compatibility by keeping the total number of cache entries the same as configured and by allocating sessionIds randomly.
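
The sharding scheme can be sketched as follows. This is an illustrative model only: it assumes non-negative int session ids, a total capacity that divides evenly across shards, and a shard that simply refuses new entries when full (a real cache would evict instead). ShardedSessionCacheSketch is a hypothetical name, not the broker's FetchSessionCache.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sharded cache: each shard owns an equal range of session ids
// and has its own lock, so requests for different ranges do not contend.
class ShardedSessionCacheSketch {
    private static final class Shard {
        private final Map<Integer, String> sessions = new HashMap<>();
        private final int capacity;

        Shard(int capacity) {
            this.capacity = capacity;
        }

        synchronized boolean put(int sessionId, String session) {
            if (!sessions.containsKey(sessionId) && sessions.size() >= capacity) {
                return false; // simplification: refuse instead of evicting
            }
            sessions.put(sessionId, session);
            return true;
        }

        synchronized String get(int sessionId) {
            return sessions.get(sessionId);
        }
    }

    private final List<Shard> shards = new ArrayList<>();
    private final int idsPerShard;

    ShardedSessionCacheSketch(int totalEntries, int numShards) {
        if (numShards <= 0 || totalEntries % numShards != 0) {
            throw new IllegalArgumentException("totalEntries must divide evenly across shards");
        }
        this.idsPerShard = Integer.MAX_VALUE / numShards;
        for (int i = 0; i < numShards; i++) {
            // The per-shard capacities sum to the configured total.
            shards.add(new Shard(totalEntries / numShards));
        }
    }

    // Maps a session id to the shard that owns its range.
    int shardIndex(int sessionId) {
        if (sessionId < 0) {
            throw new IllegalArgumentException("sessionId must be non-negative");
        }
        return Math.min(sessionId / idsPerShard, shards.size() - 1);
    }

    boolean put(int sessionId, String session) {
        return shards.get(shardIndex(sessionId)).put(sessionId, session);
    }

    String get(int sessionId) {
        return shards.get(shardIndex(sessionId)).get(sessionId);
    }
}
```

Because only the owning shard is locked, two requests contend only when their session ids fall into the same range.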

Reviewers: Igor Soarez <soarez@apple.com>, Chia-Ping Tsai <chia7712@gmail.com>
…etion (apache#15902)

Write events create and add a TimerTask to schedule the timeout operation. The issue is that we pile up the number of timer tasks which are essentially no-ops if replication was successful. They stay in memory for 15 seconds (default write timeout) and as the rate of write increases, the impact on memory usage increases.

Instead, cancel the corresponding write timeout task when the write event is committed to the log. This also applies to complete transaction events.
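
A small sketch of the cancel-on-commit idea, using a standard ScheduledThreadPoolExecutor in place of the coordinator's timer. WriteTimeoutSketch and its method names are hypothetical; the real runtime uses its own timer and event types.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical model: one timeout task per write event, cancelled on commit.
class WriteTimeoutSketch {
    private final ScheduledThreadPoolExecutor timer = new ScheduledThreadPoolExecutor(1);
    private final Map<Long, ScheduledFuture<?>> timeouts = new HashMap<>();

    WriteTimeoutSketch() {
        // Drop cancelled tasks from the queue right away instead of at expiry.
        timer.setRemoveOnCancelPolicy(true);
    }

    synchronized void onWriteScheduled(long eventId, long timeoutMs, Runnable onTimeout) {
        ScheduledFuture<?> task =
            timer.schedule(() -> expire(eventId, onTimeout), timeoutMs, TimeUnit.MILLISECONDS);
        timeouts.put(eventId, task);
    }

    // Called when the write is committed to the log: the timeout is now a no-op.
    synchronized void onWriteCommitted(long eventId) {
        ScheduledFuture<?> task = timeouts.remove(eventId);
        if (task != null) {
            task.cancel(false);
        }
    }

    private synchronized void expire(long eventId, Runnable onTimeout) {
        if (timeouts.remove(eventId) != null) {
            onTimeout.run();
        }
    }

    // Number of timeout tasks still held in memory by the timer.
    int pendingTimeouts() {
        return timer.getQueue().size();
    }

    void close() {
        timer.shutdownNow();
    }
}
```

With cancellation on commit, the number of queued timeout tasks tracks the writes still in flight rather than every write issued during the last timeout window.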

Reviewers: David Jacot <djacot@confluent.io>
@omkreddy omkreddy requested review from a team as code owners May 13, 2024 08:09

cla-assistant bot commented May 13, 2024

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
2 out of 32 committers have signed the CLA.

✅ AndrewJSchofield
✅ sjhajharia
❌ frankvicky
❌ mumrah
❌ dajac
❌ jeffkbkim
❌ dongnuo123
❌ lianetm
❌ mjsax
❌ C0urante
❌ yuz10
❌ jeqo
❌ kamalcph
❌ lucasbru
❌ AyoubOm
❌ FrankYang0529
❌ vamossagar12
❌ ivanyu
❌ apourchet
❌ cadonna
❌ clolov
❌ gharris1727
❌ Cerchie
❌ chickenchickenlove
❌ gaurav-narula
❌ johnnychhsu
❌ KevinZTW
❌ chia7712
❌ omkreddy
❌ mpareja
❌ sidyag
❌ brandboat

@omkreddy omkreddy merged commit 4c5686b into master May 15, 2024
1 of 4 checks passed
@omkreddy omkreddy deleted the omkreddy-fix-trunk branch May 15, 2024 12:15