New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[network] Check the OriginID of a libp2p message corresponds to its authenticated source #1163
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1163 +/- ##
==========================================
- Coverage 55.71% 55.61% -0.10%
==========================================
Files 481 481
Lines 29387 29434 +47
==========================================
- Hits 16373 16370 -3
- Misses 10781 10830 +49
- Partials 2233 2234 +1
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
5572a95
to
4cd3b85
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, just a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left some comments, otherwise LGTM!
network/p2p/middleware.go
Outdated
|
||
originIdentity, found := identities[originID] | ||
if !found || !originIdentity.NetworkPubKey.Equals(originKey) { | ||
m.log.Warn().Msgf("message claiming to be from nodeID %v with key %x was actually signed by %x and dropped", originID, originIdentity.NetworkPubKey, originKey) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note, we may want to distinguish the error messages for the two cases above.
In particular, if !found
is true, in the worst case you may get a segfault by trying to do originIdentity.NetworkPubKey
on this line.
In the best case, originIdentity.NetworkPubKey
is probably null and the error log is confusing for the reader
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the segfault is possible given ||
shortcuts: https://play.golang.org/p/g7CGIVG7dQ2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant that even in the log message itself, you are using originIdentity.NetworkPubkey
because you print it out:
m.log.Warn().Msgf("message claiming to be from nodeID %v with key %x was actually signed by %x and dropped", originID, originIdentity.NetworkPubKey, originKey)
for i := 0; i < m.size; i++ { | ||
select { | ||
case <-m.obs: | ||
case <-time.After(2 * time.Second): | ||
assert.FailNow(m.T(), "could not receive pubsub tag indicating mesh formed") | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain when a pubsub tag is emmitted? Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Andy why getting m.size
of them indicates that a mesh has formed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Libp2p emits Protect
calls with tag arguments marked by their pubsub topic when establishing each peering in the mesh:
https://github.com/libp2p/go-libp2p-pubsub/blob/master/tag_tracer.go
You would usually wait for libp2p.GossipSubD
such links, but here there's only two nodes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to understand this a bit more.
From what I understand, for each connection both peers would get a Protect
call for each peering, right?
So if there are 5 peers in total, then we should expect to see 8 tags before we can be confident that a mesh has formed, correct? (5 peers, 5 - 1 = 4 edges required for a connected graph, 4 * 2 = 8 tags).
Here, you are only checking for 5 tags.
Maybe I'm not understanding how this works?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, are we sure that Protect
is only called once for each peer when establishing a peering? In otherwise, is counting the number of tags enough, or would it make sense to actually keep track of the peer IDs in case we get multiple tags for the same peer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So here's where the calls to Protect
are intercepted.
This is where they originate at the libp2p level, in reaction to a join.
I don't think you're incorrect in your analysis. However:
- when I said "You would usually wait for libp2p.GossipSubD such links" I misremembered, we actually wait for
pubsub.GossipSubD*count
in tests with count = 10 - the way I got to that number is by running the test many times at a log level that allowed me to witness the number of tags emitted,
- indeed, the value there is meant as a fast & loose heuristic to make sure we wait until after the
flow/networking
level that triggers theUpdatePeers
and then thelibp2p
mesh update downstream of that, NOT as a countdown to full mesh, - it was nowhere near the theoretical number we would obtain: if, when a node joins a topic, we get 2m graft messages when there are k nodes already on it (m = k if k < D, m = D otherwise, since we don't count pruning), then for 10 nodes we'd expect 90 messages, and we only witness around 65-ish,
- we don't need to finish mesh formation to ensure message delivery, as you noticed we only need a connected graph.
I'm happy to update to a value you'd find more suitable, but for count = 2 (in the test we're commenting on), I note we have, practically and theoretically, 2 graft messages, every time.
are we sure that Protect is only called once for each peer when establishing a peering?
Yep, the Protect calls otherwise made by our engines are all tagless.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation :)
fe7fc04
to
18cf175
Compare
bors merge |
1138: skip connecting to invalid identities in the identity table and log error instead of fatal r=vishalchangrani a=vishalchangrani Found one more place where it was `log.fatal` on a bad identity. 1163: [network] Check the OriginID of a libp2p message corresponds to its authenticated source r=huitseeker a=huitseeker Contributes to #1115. Follow-up of #1129. Edit: there was no problem with the integration tests, 🤦 However, TestMiddlewareTestSuit/TestUnsubscribe is still flaky based on the delay in the last step (sending the message once a mesh has formed). But it is problematic on master, see [[1]](https://github.com/onflow/flow-go/runs/3357056741) [[2]](https://github.com/onflow/flow-go/runs/3356508175), so this is not relevant to this PR per se. Co-authored-by: Vishal <1117327+vishalchangrani@users.noreply.github.com> Co-authored-by: François Garillot <francois.garillot@dapperlabs.com>
Build failed (retrying...): |
1028: adding secure-rpc-addr to access node systemd file r=vishalchangrani a=vishalchangrani 1163: [network] Check the OriginID of a libp2p message corresponds to its authenticated source r=huitseeker a=huitseeker Contributes to #1115. Follow-up of #1129. Edit: there was no problem with the integration tests, 🤦 However, TestMiddlewareTestSuit/TestUnsubscribe is still flaky based on the delay in the last step (sending the message once a mesh has formed). But it is problematic on master, see [[1]](https://github.com/onflow/flow-go/runs/3357056741) [[2]](https://github.com/onflow/flow-go/runs/3356508175), so this is not relevant to this PR per se. Co-authored-by: vishal <1117327+vishalchangrani@users.noreply.github.com> Co-authored-by: Simon Zhu <simon.zsiyan@gmail.com> Co-authored-by: Vishal <1117327+vishalchangrani@users.noreply.github.com> Co-authored-by: François Garillot <francois.garillot@dapperlabs.com>
18cf175
to
1e69998
Compare
Canceled. |
bors merge |
@@ -360,6 +361,32 @@ func (m *Middleware) Unsubscribe(channel network.Channel) error { | |||
return nil | |||
} | |||
|
|||
// processAuthenticatedMessage processes a message and a source (indicated by its PublicKey) and eventually passes it to the overlay | |||
func (m *Middleware) processAuthenticatedMessage(msg *message.Message, originKey crypto.PublicKey) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found still a bit confusing.
It sounds like the message has been authenticated, but actually we are going to check the authentication, right?
If so, would checkOriginAndProcess
be better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It sounds like the message has been authenticated,
It has, the message has a clear network origin, which is the signer of its envelope, which signature has been checked.
we are going to check the authentication
No, we are going to check the protocol-level authorship claim against the prior network-level authentication.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I see. would be great to add these knowledges into comments
1028: adding secure-rpc-addr to access node systemd file r=vishalchangrani a=vishalchangrani 1163: [network] Check the OriginID of a libp2p message corresponds to its authenticated source r=huitseeker a=huitseeker Contributes to #1115. Follow-up of #1129. Edit: there was no problem with the integration tests, 🤦 However, TestMiddlewareTestSuit/TestUnsubscribe is still flaky based on the delay in the last step (sending the message once a mesh has formed). But it is problematic on master, see [[1]](https://github.com/onflow/flow-go/runs/3357056741) [[2]](https://github.com/onflow/flow-go/runs/3356508175), so this is not relevant to this PR per se. Co-authored-by: vishal <1117327+vishalchangrani@users.noreply.github.com> Co-authored-by: Simon Zhu <simon.zsiyan@gmail.com> Co-authored-by: Vishal <1117327+vishalchangrani@users.noreply.github.com> Co-authored-by: François Garillot <francois.garillot@dapperlabs.com>
Build failed (retrying...): |
1163: [network] Check the OriginID of a libp2p message corresponds to its authenticated source r=huitseeker a=huitseeker Contributes to #1115. Follow-up of #1129. Edit: there was no problem with the integration tests, 🤦 However, TestMiddlewareTestSuit/TestUnsubscribe is still flaky based on the delay in the last step (sending the message once a mesh has formed). But it is problematic on master, see [[1]](https://github.com/onflow/flow-go/runs/3357056741) [[2]](https://github.com/onflow/flow-go/runs/3356508175), so this is not relevant to this PR per se. Co-authored-by: François Garillot <francois.garillot@dapperlabs.com>
Build failed: |
1e69998
to
f15333d
Compare
bors retry |
1163: [network] Check the OriginID of a libp2p message corresponds to its authenticated source r=huitseeker a=huitseeker Contributes to #1115. Follow-up of #1129. Edit: there was no problem with the integration tests, 🤦 However, TestMiddlewareTestSuit/TestUnsubscribe is still flaky based on the delay in the last step (sending the message once a mesh has formed). But it is problematic on master, see [[1]](https://github.com/onflow/flow-go/runs/3357056741) [[2]](https://github.com/onflow/flow-go/runs/3356508175), so this is not relevant to this PR per se. Co-authored-by: François Garillot <francois.garillot@dapperlabs.com>
Build failed: |
[network] Check the originID in sendDirect too
Contributes to #1115.
Follow-up of #1129.
Edit: there was no problem with the integration tests, 🤦
However, TestMiddlewareTestSuit/TestUnsubscribe is still flaky based on the delay in the last step (sending the message once a mesh has formed). But it is problematic on master, see [1] [2], so this is not relevant to this PR per se.