Data race condition #657

Open

xeanhort opened this issue Nov 7, 2023 · 2 comments

Labels: bug, Details Required (further information (logs etc) is required before this issue can be investigated)

Comments


xeanhort commented Nov 7, 2023

I'm using version 1.4.2 and I sporadically get a data race detection when using the package. As the trace only involves this library's internal functions, I don't think it's related to my code:

==================
WARNING: DATA RACE
Read at 0x00c000627bc1 by goroutine 307229:
  github.com/eclipse/paho.mqtt.golang/packets.(*FixedHeader).pack()
      /home/usr/go/pkg/mod/github.com/eclipse/paho.mqtt.golang@v1.4.2/packets/packets.go:267 +0x7a
  github.com/eclipse/paho.mqtt.golang/packets.(*PublishPacket).Write()
      /home/usr/go/pkg/mod/github.com/eclipse/paho.mqtt.golang@v1.4.2/packets/publish.go:47 +0x291
  github.com/eclipse/paho%2emqtt%2egolang.startOutgoingComms.func1()
      /home/usr/go/pkg/mod/github.com/eclipse/paho.mqtt.golang@v1.4.2/net.go:293 +0x1274
Previous write at 0x00c000627bc1 by goroutine 307196:
  github.com/eclipse/paho%2emqtt%2egolang.(*client).resume()
      /home/usr/go/pkg/mod/github.com/eclipse/paho.mqtt.golang@v1.4.2/client.go:1087 +0x145e
  github.com/eclipse/paho%2emqtt%2egolang.(*client).reconnect()
      /home/usr/go/pkg/mod/github.com/eclipse/paho.mqtt.golang@v1.4.2/client.go:339 +0x71d
  github.com/eclipse/paho%2emqtt%2egolang.(*client).internalConnLost.func1.1()
      /home/usr/go/pkg/mod/github.com/eclipse/paho.mqtt.golang@v1.4.2/client.go:560 +0x47
Goroutine 307229 (running) created at:
  github.com/eclipse/paho%2emqtt%2egolang.startOutgoingComms()
      /home/usr/go/pkg/mod/github.com/eclipse/paho.mqtt.golang@v1.4.2/net.go:264 +0x3e4
  github.com/eclipse/paho%2emqtt%2egolang.startComms()
      /home/usr/go/pkg/mod/github.com/eclipse/paho.mqtt.golang@v1.4.2/net.go:394 +0xfb
  github.com/eclipse/paho%2emqtt%2egolang.(*client).startCommsWorkers()
      /home/usr/go/pkg/mod/github.com/eclipse/paho.mqtt.golang@v1.4.2/client.go:656 +0xdd8
  github.com/eclipse/paho%2emqtt%2egolang.(*client).reconnect()
      /home/usr/go/pkg/mod/github.com/eclipse/paho.mqtt.golang@v1.4.2/client.go:338 +0x6e4
  github.com/eclipse/paho%2emqtt%2egolang.(*client).internalConnLost.func1.1()
      /home/usr/go/pkg/mod/github.com/eclipse/paho.mqtt.golang@v1.4.2/client.go:560 +0x47
Goroutine 307196 (running) created at:
  github.com/eclipse/paho%2emqtt%2egolang.(*client).internalConnLost.func1()
      /home/usr/go/pkg/mod/github.com/eclipse/paho.mqtt.golang@v1.4.2/client.go:560 +0x664
==================
MattBrittan (Contributor) commented

The read at paho.mqtt.golang@v1.4.2/packets/packets.go:267 is:


func (fh *FixedHeader) pack() bytes.Buffer {
	var header bytes.Buffer
	header.WriteByte(fh.MessageType<<4 | boolToByte(fh.Dup)<<3 | fh.Qos<<1 | boolToByte(fh.Retain)) // Line 267
	header.Write(encodeLength(fh.RemainingLength))
	return header
}

The previous write at paho.mqtt.golang@v1.4.2/client.go:1087 is:

if p.Qos != 0 { // spec: The DUP flag MUST be set to 0 for all QoS 0 messages
	p.Dup = true   // Previous write HERE
}

So it looks like this issue relates to the way the Dup flag is set on publish packets being resent upon reconnection. I'm guessing that you are using the memory store (the default)?
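
In other words, the race is an unsynchronized read (in FixedHeader.pack) and write (in client.resume) of the same packet's Dup flag. Below is a minimal stand-alone sketch of that pattern, using simplified stand-in types rather than the library's own; it produces the same shape of report when run with go run -race:

package main

import (
	"fmt"
	"sync"
)

// publishPacket is a simplified stand-in for packets.PublishPacket.
type publishPacket struct {
	Dup bool
	Qos byte
}

// pack mirrors the shape of FixedHeader.pack: it reads Dup with no synchronization.
func (p *publishPacket) pack() byte {
	var dup byte
	if p.Dup { // unsynchronized read
		dup = 1
	}
	return dup<<3 | p.Qos<<1
}

func main() {
	p := &publishPacket{Qos: 1}
	var wg sync.WaitGroup
	wg.Add(2)

	// Stand-in for the outgoing-comms goroutine packing the packet for the wire.
	go func() {
		defer wg.Done()
		_ = p.pack()
	}()

	// Stand-in for resume() marking the stored packet as a duplicate.
	go func() {
		defer wg.Done()
		if p.Qos != 0 {
			p.Dup = true // unsynchronized write
		}
	}()

	wg.Wait()
	fmt.Println("run with `go run -race` to see the report")
}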

What I think is happening (difficult to be sure without logs) is:

  1. The connection is up.
  2. Publish is called; it checks c.status.ConnectionStatus() and sends the message to c.obound (which blocks because other messages are being processed).
  3. The connection drops before the message is picked up from the channel.
  4. The connection comes back up and, as part of resume, we attempt to resend the publish packet (setting Dup).
  5. The old OutgoingComms goroutine attempts to send the Publish (this will fail because the old connection is down), packing the same packet object that resume is modifying.

If I'm correct then this is not a big concern; the OutgoingComms routine keeps running until it has emptied all of its input channels (this is deliberate, to avoid deadlocks), but it's attempting to send to a closed connection (so it will just get an error every time). As such, the packet with questionable integrity is effectively thrown away anyway. Generally I'd expect the old OutgoingComms goroutine to exit quickly when the connection has dropped (establishing a new connection should take longer); I'm not really sure why it's taking longer in your case. Potentially I could see it being due to:

  • Connecting with a lot of stored messages, immediately publishing a new message, and then the connection dropping (the library will attempt to deliver all of the stored messages first, which delays new publish packets).
  • Your code calling Publish from within a message handler (this will block things up, leading to unpredictable results, subject to the options in use; see the sketch below).
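
On the second bullet: a hedged sketch of one way to keep a message handler from blocking things up, by doing the publish (and the wait on its token) in a separate goroutine. The broker address and topics are placeholders:

package main

import (
	"log"

	mqtt "github.com/eclipse/paho.mqtt.golang"
)

func main() {
	opts := mqtt.NewClientOptions().AddBroker("tcp://broker.example.com:1883")

	// Republish from a new goroutine so the handler returns quickly and
	// doesn't back up the message router.
	opts.SetDefaultPublishHandler(func(c mqtt.Client, m mqtt.Message) {
		go func() {
			t := c.Publish("reply/"+m.Topic(), 1, false, m.Payload())
			t.Wait()
			if t.Error() != nil {
				log.Println("republish failed:", t.Error())
			}
		}()
	})

	client := mqtt.NewClient(opts)
	if t := client.Connect(); t.Wait() && t.Error() != nil {
		log.Fatal(t.Error())
	}
	if t := client.Subscribe("some/topic", 1, nil); t.Wait() && t.Error() != nil {
		log.Fatal(t.Error())
	}
	select {} // keep running
}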

Without the ability to replicate this (and/or logs) it's going to be difficult to fix. For historical reasons the library spins up quite a few goroutines, and the interactions between them are complex (so it's easy to create new issues when fixing something like this).

One option would be to modify Publish so that it fails more quickly when the connection drops. Unfortunately, as I'm not 100% sure I'm right about the cause, this may or may not address the issue!
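
Until then, a possible application-side mitigation (this is not the library change suggested above, and publishWithTimeout is a hypothetical helper) is to bound how long the caller waits on the publish token, so a dropped connection surfaces in application code instead of blocking:

package mqttutil // hypothetical helper package, not part of paho.mqtt.golang

import (
	"fmt"
	"time"

	mqtt "github.com/eclipse/paho.mqtt.golang"
)

// publishWithTimeout publishes at QoS 1 and stops waiting after five seconds
// rather than blocking until delivery completes. With QoS 1 and a session
// store the library can still resend the message on reconnect.
func publishWithTimeout(c mqtt.Client, topic string, payload []byte) error {
	t := c.Publish(topic, 1, false, payload)
	if !t.WaitTimeout(5 * time.Second) {
		return fmt.Errorf("publish to %s still pending after 5s (connection may be down)", topic)
	}
	return t.Error()
}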

xeanhort (Author) commented Nov 8, 2023

Thank you for your feedback.

The connection is unstable, but I've only seen this warning a couple of times. I'll add the log handlers and try to reproduce it.
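
For reference, the log handlers are the library's exported package-level loggers; enabling them looks something like this:

package main

import (
	"log"
	"os"

	mqtt "github.com/eclipse/paho.mqtt.golang"
)

func main() {
	// Route the library's internal loggers to stderr; DEBUG is the most
	// verbose and shows the reconnect/resume timing discussed above.
	mqtt.ERROR = log.New(os.Stderr, "[ERROR] ", log.LstdFlags)
	mqtt.CRITICAL = log.New(os.Stderr, "[CRIT]  ", log.LstdFlags)
	mqtt.WARN = log.New(os.Stderr, "[WARN]  ", log.LstdFlags)
	mqtt.DEBUG = log.New(os.Stderr, "[DEBUG] ", log.LstdFlags)

	// ... create the client options, connect, publish, etc. as usual ...
}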

MattBrittan added the bug and Details Required (further information (logs etc) is required before this issue can be investigated) labels on Jan 6, 2024