retry producer creation upon error after successful topic lookup #1138

zzzming · 2023-11-24T16:44:31Z

Expected behavior

newPartitionProducer() should retry grabCnx() when it fails, similar to the grabCnx() retry logic in reconnectToBroker().

The Java producer already has this retry logic.

Actual behavior

During producer creation, grabCnx() in producer_partition.go first performs a topic lookup. If the lookup succeeds but a network issue occurs before the command to create the producer is sent, grabCnx() exits without retrying.

We saw frequent failures during initial producer creation.

Steps to reproduce

This is tricky to reproduce, but we observe the problem more frequently during the initialization stage of pods on Azure. After we added a grabCnx() retry to newPartitionProducer(), the problem went away. (A PR will follow.)

System configuration

Pulsar version: 2.10

zzzming linked a pull request Nov 24, 2023 that will close this issue

[fix] retry producer creation upon error after succssful topic lookup #1139

Open
