Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Solve the problem that the connection is closed but waiting will always wait #44

Closed
wants to merge 1 commit into from

Conversation

shifengbin
Copy link
Contributor

steps:
1.connected
2.select confirm
3.publishwithconfirm (network error, connection closed)
4.wait (always wait forever)

@lukebakken
Copy link
Contributor

Thank you for this contribution. I will try to reproduce the problem that this pull request solves. Do you have a way to cause the connection closed error?

@lukebakken lukebakken self-assigned this Feb 16, 2022
@lukebakken lukebakken modified the milestones: 2.0.0, 1.3.1 Feb 16, 2022
@lukebakken lukebakken added the bug Something isn't working label Feb 16, 2022
@shifengbin
Copy link
Contributor Author

you can use wifi and close it before wait confirm

@shifengbin
Copy link
Contributor Author

before received confirm
for example add broken point publish function and close wifi
continue

@shifengbin
Copy link
Contributor Author

        con, err := Dial("xxxxx")
	if err != nil {
		t.Fatal(err)
	}
	ch, err := con.Channel()
	if err != nil {
		t.Fatal(err)
	}
	err = ch.Confirm(false)
	if err != nil {
		t.Fatal(err)
	}
	closed := con.NotifyClose(make(chan *Error, 1))
	go func() {
		<-closed
		t.Log("connection is closed")
	}()


	//broken point at here or sleep 5 seconds;  close wifi
	confirm, err := ch.PublishWithDeferredConfirm("xxx", "xxx", false, false, Publishing{
		Body: []byte("abc"),
	})
	if err != nil {
		t.Fatal(err)
	}
	confirm.Wait()

@shifengbin
Copy link
Contributor Author

When programs are processed in parallel, one channel sends normal messages and the other channel sends a non-existent Exchange, also causing the connection to be closed

@shifengbin
Copy link
Contributor Author

Thank you for this contribution. I will try to reproduce the problem that this pull request solves. Do you have a way to cause the connection closed error?

can you reproduce the problem with my method?

@lukebakken
Copy link
Contributor

can you reproduce the problem with my method?

Please be patient. The RabbitMQ core team is small and has many different priorities to consider. I might be able to work on this issue next week.

@shifengbin
Copy link
Contributor Author

can you reproduce the problem with my method?

Please be patient. The RabbitMQ core team is small and has many different priorities to consider. I might be able to work on this issue next week.

excuse me! do you forget it?

I see some deadlock issues is the same reason

we use the package in production hope to resolve the issue
thanks

@SpencerTorres
Copy link
Contributor

SpencerTorres commented Mar 4, 2022

Just ran into this while testing some edge cases on an app of ours that demands publisher confirmations. I managed to trigger this deadlock by starting the app and then deleting the exchange through RabbitMQ's management UI. Similar to how it's described here, the exchange did not exist and somewhere along the way the client tries to close the connection.

The proposed change here is effectively the same conclusion that I and #46 came to; the wait groups need to be cleaned up when the confirmations are closed. This solution returns Ack: false which should already be checked with .Wait() as seen here:

// module code
func (d *DeferredConfirmation) Wait() bool {
    d.wg.Wait()
    return d.confirmation.Ack
}

// .   .   .

// somewhere in your app's publish() logic
if ok := dc.Wait(); !ok {
    // Ack was false
}

With this expectation in mind for apps that use .Wait(), I believe this PR is a decent solution.

I made some project-friendly adjustments to be added to this PR here: shifengbin/pull/1

Edit: I just tried my changes from there in my application and it successfully reconnects and ends the wait groups which allows me to catch the error. Now I can continue development. Thanks for reporting this! Hopefully it gets merged so it gets fixed for everyone else.

Copy link
Contributor

@DanielePalaia DanielePalaia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good to me. The error happens even when the connection get closed by the client before the confirm wait.

Can you add an integration test like this in order to approve:

// https://github.com/rabbitmq/amqp091-go/pull/44
func TestShouldNotWaitAfterConnectionClosedIssue44(t *testing.T) {
	con := integrationConnection(t, "TestShouldNotWaitAfterConnectionClosedIssue44")
	ch, err := con.Channel()
	if err != nil {
		t.Fatalf("channel error: %v", err)
	}
	err = ch.Confirm(false)
	if err != nil {
		t.Fatalf("confirm error: %v", err)
	}
	closed := con.NotifyClose(make(chan *Error, 1))
	go func() {
		<-closed
	}()

	//broken point at here or sleep 5 seconds;  close wifi
	confirm, err :=
		ch.PublishWithDeferredConfirm("xxx", "xxx", false, false, Publishing{
			Body: []byte("abc"),
		})
	if err != nil {
		t.Fatalf("PublishWithDeferredConfirm error: %v", err)
	}

	ch.Close()
	con.Close()

	ack := confirm.Wait()

	if ack != false {
		t.Fatalf("ack returned should be false %v", ack)
	}
}

@lukebakken
Copy link
Contributor

This PR is superseded by #47

@lukebakken lukebakken closed this Mar 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants