Make remote worker connect to the RabbitMQ after restarting #211

Open
marktwtn opened this issue Dec 17, 2019 · 5 comments

@marktwtn
Collaborator

If the RabbitMQ broker is restarted, the remote worker is closed.
We should modify the remote worker so that it keeps trying to reconnect to RabbitMQ while the broker is unavailable.

@marktwtn marktwtn added the enhancement New feature or request label Dec 17, 2019
@marktwtn marktwtn self-assigned this Dec 17, 2019
@marktwtn
Collaborator Author

I have found a way to fix the problem: simply initialize again.

However, two points need attention:

  • The program execution flow needs to be handled carefully.
  • The condition for reinitialization.
    The current implementation only returns true or false when consuming a message.
    The wrapper function hides the error type, which makes it hard to know the exact problem.

@marktwtn
Collaborator Author

The wrapper function should be rewritten to reveal more error detail instead of returning only true or false.

The error type is recorded in the amqp_rpc_reply_t structure, and the possible errors are listed in amqp_status_enum.
Once we retrieve the amqp_rpc_reply_t structure, we can check the error type and recover from it.

For now we focus only on the error type caused by a restarted RabbitMQ broker, and recover the remote worker by reinitializing it.
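
As a rough illustration of that idea, the sketch below maps a reply to an action for the worker instead of collapsing it into true or false. Everything in it is hypothetical: the `sketch_*` types are stand-ins so the snippet compiles on its own, and real code would read the `reply_type` and `library_error` fields of `amqp_rpc_reply_t` from `<amqp.h>`. Which library errors actually show up after a broker restart is an assumption here and would need to be confirmed by testing.

```c
/* Hypothetical stand-ins so the sketch compiles without rabbitmq-c.
 * Real code would use amqp_rpc_reply_t and amqp_status_enum instead. */
typedef enum {
  SKETCH_REPLY_NORMAL,            /* the call succeeded */
  SKETCH_REPLY_SERVER_EXCEPTION,  /* the broker reported an error */
  SKETCH_REPLY_LIBRARY_EXCEPTION  /* the client library reported an error */
} sketch_reply_type_t;

typedef enum {
  SKETCH_STATUS_OK,
  SKETCH_STATUS_TIMEOUT,
  SKETCH_STATUS_CONNECTION_CLOSED,
  SKETCH_STATUS_SOCKET_ERROR,
  SKETCH_STATUS_OTHER
} sketch_status_t;

typedef struct {
  sketch_reply_type_t reply_type;
  sketch_status_t library_error; /* meaningful only for library exceptions */
} sketch_reply_t;

/* What the worker should do next, instead of a bare true/false. */
typedef enum { WORKER_CONTINUE, WORKER_REINITIALIZE, WORKER_ABORT } worker_action_t;

worker_action_t worker_action_for(sketch_reply_t reply) {
  switch (reply.reply_type) {
    case SKETCH_REPLY_NORMAL:
      return WORKER_CONTINUE;
    case SKETCH_REPLY_LIBRARY_EXCEPTION:
      switch (reply.library_error) {
        case SKETCH_STATUS_TIMEOUT:
          return WORKER_CONTINUE; /* no message arrived yet, keep waiting */
        case SKETCH_STATUS_CONNECTION_CLOSED:
        case SKETCH_STATUS_SOCKET_ERROR:
          /* Assumed to be what a restarted broker looks like to the client. */
          return WORKER_REINITIALIZE;
        default:
          return WORKER_ABORT;
      }
    default:
      return WORKER_ABORT; /* server exceptions are out of scope for now */
  }
}
```

The caller can then branch on `WORKER_REINITIALIZE` rather than guessing what a false return value meant.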

@marktwtn
Collaborator Author

marktwtn commented Jan 6, 2020

The reinitialization will be wrapped in an infinite loop that retries until it succeeds.
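
A minimal sketch of such a loop might look like the following. The callback type `init_fn`, the function `reinitialize_until_success`, and the demo callback `fail_n_times` are all hypothetical names; in the worker, the callback would wrap whatever connection setup it already performs. The doubling delay between attempts is an added assumption, not something decided in this issue.

```c
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical callback type: returns true once initialization succeeds. */
typedef bool (*init_fn)(void *ctx);

/* Calls init repeatedly until it succeeds, sleeping between attempts.
 * The delay doubles after each failure, capped at max_delay_s.
 * Returns the number of attempts that were made. */
unsigned reinitialize_until_success(init_fn init, void *ctx,
                                    unsigned delay_s, unsigned max_delay_s) {
  unsigned attempts = 0;
  for (;;) {
    attempts++;
    if (init(ctx)) {
      return attempts;
    }
    sleep(delay_s);
    if (delay_s < max_delay_s) {
      /* Double the delay without overflowing or exceeding the cap. */
      delay_s = (delay_s > max_delay_s / 2) ? max_delay_s : delay_s * 2;
    }
  }
}

/* Demo callback: fails until the counter behind ctx reaches zero. */
bool fail_n_times(void *ctx) {
  int *remaining = ctx;
  if (*remaining > 0) {
    (*remaining)--;
    return false;
  }
  return true;
}
```

For example, calling `reinitialize_until_success(fail_n_times, &remaining, 1, 30)` with `remaining` set to 2 fails twice and succeeds on the third attempt.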

In the remote worker, three RabbitMQ API calls need proper handling when an error occurs.

However, according to the documentation,

amqp_basic_ack says

this will not indicate failure if something goes wrong on the broker

and amqp_basic_publish says

error conditions that occur on the broker (such as publishing to a non-existent exchange)
will not be reflected in the return value of this function

I will test the behaviour of the last two APIs when the RabbitMQ broker is stopped or restarted.

@marktwtn
Collaborator Author

Even if I stop the RabbitMQ broker, amqp_basic_ack() and amqp_basic_publish() can return without error as long as the network sockets are not closed yet.

While the network sockets are still open, the first API call returns successfully.
After it returns, the sockets become closed.
Any API call made after that encounters a socket error and fails.

I am still trying to figure out a way to solve the problem.

@marktwtn
Collaborator Author

marktwtn commented Jan 16, 2020

Closing a TCP socket requires a four-way handshake.
When the RabbitMQ broker is shut down, its socket should wait for a response from the remote worker's socket, as shown in the picture:
[Image: tcp-close-state-flow]

However, when I stop the RabbitMQ broker, it does not wait; it just closes.
The remote worker's socket is left in the CLOSE-WAIT state.
amqp_basic_ack and amqp_basic_publish do not fail while the socket is in the CLOSE-WAIT state.

Other people have run into the same issue: alanxz/rabbitmq-c#461.
The client believes it is still connected to the server.

Another related issue: alanxz/rabbitmq-c#391.
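
One possible way to notice the CLOSE-WAIT situation before acking or publishing is to peek at the socket: once the peer has closed its side and no unread data remains, a non-blocking `recv` returns 0. The sketch below is only an assumption about how this could be approached, not the fix adopted here or in the linked issues, and `connection_looks_dead` is a hypothetical helper name. With rabbitmq-c, the descriptor could be obtained from `amqp_get_sockfd()`. If unread data is still buffered, the check cannot tell whether the peer has closed.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Returns true when the socket can no longer be used: either the peer has
 * closed its side (recv reports end of stream) or recv fails with a real
 * error. Returns false when the socket is open, with or without pending data.
 * MSG_PEEK leaves any pending bytes in the receive buffer. */
bool connection_looks_dead(int fd) {
  char byte;
  ssize_t n;
  do {
    n = recv(fd, &byte, 1, MSG_PEEK | MSG_DONTWAIT);
  } while (n < 0 && errno == EINTR);

  if (n == 0) {
    return true; /* orderly shutdown by the peer */
  }
  if (n < 0 && errno != EAGAIN && errno != EWOULDBLOCK) {
    return true; /* e.g. connection reset */
  }
  return false; /* data is pending, or nothing to read yet */
}
```

A worker could call this before each ack or publish and trigger reinitialization when it returns true, accepting that a small window remains between the check and the actual write.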
