Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Method amqp.Connection.Channel() is blocked for infinity while there is no Internet #225

Open
dr-begemot opened this issue Oct 5, 2023 · 13 comments
Assignees
Labels
bug Something isn't working
Milestone

Comments

@dr-begemot
Copy link

dr-begemot commented Oct 5, 2023

Describe the bug

Method amqp.Connection.Channel() is infinity blocked when internet turned off in the middle of DialConfig() and Channel():

	config := amqp.Config{Properties: amqp.NewConnectionProperties()}
	config.Properties.SetClientConnectionName(connName)
	config.Heartbeat = 10 * time.Second
	log.Debug().Msgf("dialing %q", amqpURI)
	conn, err := amqp.DialConfig(amqpURI, config)
	if err != nil {
		return nil, fmt.Errorf("Dial: %s", err)
	}

        // turn off internet here
        log.Debug().Msgf("turn off Internet!")
	time.Sleep(10 * time.Second)

	log.Debug().Msgf("got Connection")
	channel, err := conn.Channel() // infinity blocked while no internet
	if err != nil {
		return nil, fmt.Errorf("Channel: %s", err)
	}

Reproduction steps

  1. Dial to rabbitmq server and get Connection
  2. Turn off internet
  3. Call Connection.Channel() method

Expected behavior

Channel() method should return error after timeout.

Additional context

No response

@dr-begemot dr-begemot added the bug Something isn't working label Oct 5, 2023
@lukebakken lukebakken added this to the 2.0.0 milestone Oct 5, 2023
@lukebakken
Copy link
Contributor

Hello, thanks for using this library.

Could you be more specific about what "turn off internet" means? What exactly are you doing?

@lukebakken lukebakken self-assigned this Oct 6, 2023
@dr-begemot
Copy link
Author

Hello, thanks for using this library.

Could you be more specific about what "turn off internet" means? What exactly are you doing?

I just unplug the Ethernet cable))

@Danlock
Copy link

Danlock commented Nov 6, 2023

Yes, it's because this library doesn't take a context.Context in and actually use it to timeout AMQP call's. You can check out my wrapper which was created to solve this exact problem.

Hopefully #124 is implemented eventually.

@shendongyuxmxm
Copy link

I also encountered this issue. Is there any temporary solution available now? I can't wait for the 2.0.0 version. For example, can I downgrade the version?

@lukebakken
Copy link
Contributor

@shendongyuxmxm please see the comment just above yours - #225 (comment)

At some point we'll have time to investigate this issue. Personally I'm surprised there isn't a socket read/write timeout that catches this particular case.

@Danlock
Copy link

Danlock commented Nov 14, 2023

I wrote an example to demonstrate a similar problem I've had with amqp091-go (a connection hanging), with a similar solution (use contexts throughout amqp091-go). No ethernet unplugging required.

https://github.com/Danlock/rmq/blob/17f5efed0a2038993f1da091398b0a174812a02a/hang_int_test.go#L29

Unfortunately just calling the socket's SetDeadline is not enough to prevent this.

@Zerpet
Copy link
Contributor

Zerpet commented Nov 14, 2023

Thank you @Danlock for providing a repro for this issue, that's very helpful 👍 One challenge I found, when I started thinking about contexts, is the state machine used internally in this library, see #124 (comment) and #124 (comment)

Supporting contexts in almost all functions will likely come at a cost of "drastic" clean up after a context cancels a function i.e. closing the AMQP channel or connection. I guess that's more desirable than a blocked function, but I'm not sure since #124 hasn't got much traction since June.

@Danlock
Copy link

Danlock commented Nov 14, 2023 via email

@Ja7ad

This comment was marked as off-topic.

@Danlock

This comment was marked as off-topic.

@Zerpet
Copy link
Contributor

Zerpet commented Dec 5, 2023

I discussed this issue with some colleagues, and the general consensus is that closing a channel/connection after a timeout is a bit too drastic. The Java and .NET AMQP clients just reset the state machine, and continue as if nothing happened. We probably will do the same in this client, in order to maintain consistency between our client libraries. There are probably some edge cases we need to consider (e.g. subscription frame times out, arrives late and gets discarded, do we receive messages?)

@Zerpet Zerpet self-assigned this Dec 5, 2023
@Danlock
Copy link

Danlock commented Dec 5, 2023

That does seem better to me, and maintaining parity with other language implementations is important.

In regards to that specific edge case, if a subscription frame timed out, presumably the Consume() call also timed out, correct? Therefore Consume would have already returned a nil channel and timeout error to the end user.

In that case I don't think messages should be received at all. If messages were received without a successful accompanying Consume() channel, they probably be rejected so they can be placed back on the queue for any other healthy consumers.

@Zerpet
Copy link
Contributor

Zerpet commented Dec 11, 2023

In regards to that specific edge case, if a subscription frame timed out, presumably the Consume() call also timed out, correct?

Yes, that's correct. Handling the specific return of Consume() is as you described. The challenge/question in the scenario I described is that RabbitMQ server registers the basic.consume (from Consume()) request and sends messages to the client. In that case, our state machine will receive a basic.deliver (a message) for a subscription that should not exist (since Consume() timed out). The state machine can't quite reject the message because it doesn't have a channel at hand to send a basic.cancel or basic.nack. In that specific case (a delivery for non-existing subcription), it'll probably be ok to treat it as a channel exception and close the channel.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

6 participants