Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Discuss] Pulsar client : Connect Command add keep_alive_interval #15885

Open
Technoboy- opened this issue Jun 2, 2022 · 10 comments
Open

[Discuss] Pulsar client : Connect Command add keep_alive_interval #15885

Technoboy- opened this issue Jun 2, 2022 · 10 comments
Labels

Comments

@Technoboy-
Copy link
Contributor

Technoboy- commented Jun 2, 2022

Motivation

When investigating #13342, we found that both the client and the server have the keepAliveIntervalSeconds configuration, which is 30s by default. During the configured time, the channel will send ping/pong commands to maintain connection availability. If the pong command is not replied within the cycle, the channel will be closed. For the client side, the reconnect logic is triggered after the channel is closed. For the broker side, the broker will clear the producer information after the channel is inactive.
For the problem of #13342, it is because the user changed the configuration on the broker side to 100s. When the client determines that the connection has timed out and needs to disconnect the channel, since the client to the broker passes through the firewall, the close of the channel may not be sent, and then the client reconnects to the broker, and the reconnection succeeds. However, the timeout setting of the broker is relatively large. If the previous channel is not closed, the producer information will not be cleared. The reconnection of the producer will cause the broker to throw the exception that the producer already exists.
This is the cause of the #13342 issue, and by tweaking the code, the issue can be reproduced.

What I want to discuss is whether we can optimize this, configure this value only on the client-side, and pass it to the broker through the connect command. The advantage is that the server can cancel this configuration, using client-side value instead, and multiple clients can configure different values.

API Changes

Add keep_alive_interval in CommandConnect:

message CommandConnect {
     ...
     optional int32 keep_alive_interval = 11 [default = 30];
}

The original logic to check the keep-alive is in PulsarHandler#handleKeepAliveTimeout which begins at channel active.

For broker-side:

  • Now we do this in ServerCnx#completeConnect

For client-side:

  • Now we do this in ClientCnx#handleConnected

Compatibility

no compatibility issues

@lhotari
Copy link
Member

lhotari commented Jun 2, 2022

What I want to discuss is whether we can optimize this, configure this value only on the client-side, and pass it to the broker through the connect command. The advantage is that the server can cancel this configuration, and multiple clients can configure different values.

I think that this is unnecessary complexity. The broker side keep alive should be kept short enough and that should be independent of the client setting.

There were some bugs that caused the connection to stay open for longs durations.
#15382 Closes the connection if a ping or pong message cannot be sent .
#15366 Fixes a proxy connection leak when inbound connection closes while connecting is in progress
Both of the above fixes address issues where the symptom is "Producer with name xxx is already connect to topic".

I don't think that we need this PIP at all.

@lhotari
Copy link
Member

lhotari commented Jun 2, 2022

The steps for reproducing "Producer with name xxx is already connect to topic" in #13342 doesn't describe a bug. It's expected and correct behavior of the broker. I commented in #13342 (comment) about this. Therefore this PIP is attempting to solve a problem that isn't a problem at all.

@Technoboy-
Copy link
Contributor Author

The steps for reproducing "Producer with name xxx is already connect to topic" in #13342 doesn't describe a bug. It's expected and correct behavior of the broker. I commented in #13342 (comment) about this. Therefore this PIP is attempting to solve a problem that isn't a problem at all.

I don't think it's a problem either, only an optimization. The broker side does not need this configuration at all, and uses the client-side instead.

@lhotari
Copy link
Member

lhotari commented Jun 2, 2022

I don't think it's a problem either, only an optimization. The broker side does not need this configuration at all, and uses the client-side instead.

I don't see how this could be an optimization when it's not needed at all.

@Technoboy-
Copy link
Contributor Author

I don't think it's a problem either, only an optimization. The broker side does not need this configuration at all, and uses the client-side instead.

I don't see how this could be an optimization when it's not needed at all.

Broker doesn't need this configuration anymore, that's not an optimization?

@lhotari
Copy link
Member

lhotari commented Jun 2, 2022

I don't think it's a problem either, only an optimization. The broker side does not need this configuration at all, and uses the client-side instead.

I don't see how this could be an optimization when it's not needed at all.

Broker doesn't need this configuration anymore, that's not an optimization?

There is no need to make the broker's keep alive interval match the client's keep alive interval. They should be kept separate.

@Technoboy-
Copy link
Contributor Author

I don't think it's a problem either, only an optimization. The broker side does not need this configuration at all, and uses the client-side instead.

I don't see how this could be an optimization when it's not needed at all.

Broker doesn't need this configuration anymore, that's not an optimization?

There is no need to make the broker's keep alive interval match the client's keep alive interval. They should be kept separate.

Why should be separated?

@Technoboy-
Copy link
Contributor Author

I don't think it's a problem either, only an optimization. The broker side does not need this configuration at all, and uses the client-side instead.

I don't see how this could be an optimization when it's not needed at all.

Broker doesn't need this configuration anymore, that's not an optimization?

There is no need to make the broker's keep alive interval match the client's keep alive interval. They should be kept separate.

This value is based on the channel according to the client configuration, is there any problem? Why separate configuration?

@lhotari
Copy link
Member

lhotari commented Jun 2, 2022

Why should be separated?
This value is based on the channel according to the client configuration, is there any problem? Why separate configuration?

Since there isn't any problem that needs to be solved. Making the broker keep alive interval match the client's keep alive interval isn't solving a real problem. The broker side keep alive interval should be kept at 30 seconds and there's nothing to change.
We could simply document this behavior that if you increase the keep alive interval, there's a risk that connections are orphaned until the keep alive check runs.

@Technoboy- Technoboy- changed the title PIP-163: Pulsar client : Connect Command add keep_alive_interval [Discuss] Pulsar client : Connect Command add keep_alive_interval Jun 4, 2022
@github-actions
Copy link

github-actions bot commented Jul 5, 2022

The issue had no activity for 30 days, mark with Stale label.

@github-actions github-actions bot added the Stale label Jul 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants