Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[network] Check the OriginID of a libp2p message corresponds to its authenticated source #1163

Merged
merged 5 commits into from Aug 20, 2021

Conversation

huitseeker
Copy link
Contributor

@huitseeker huitseeker commented Aug 18, 2021

Contributes to #1115.
Follow-up of #1129.

Edit: there was no problem with the integration tests, 🤦

However, TestMiddlewareTestSuit/TestUnsubscribe is still flaky based on the delay in the last step (sending the message once a mesh has formed). But it is problematic on master, see [1] [2], so this is not relevant to this PR per se.

@codecov-commenter
Copy link

codecov-commenter commented Aug 18, 2021

Codecov Report

Merging #1163 (3d91f99) into master (73d6864) will decrease coverage by 0.09%.
The diff coverage is 0.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1163      +/-   ##
==========================================
- Coverage   55.71%   55.61%   -0.10%     
==========================================
  Files         481      481              
  Lines       29387    29434      +47     
==========================================
- Hits        16373    16370       -3     
- Misses      10781    10830      +49     
- Partials     2233     2234       +1     
Flag Coverage Δ
unittests 55.61% <0.00%> (-0.10%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
network/p2p/middleware.go 0.00% <0.00%> (ø)
network/p2p/readConnection.go 0.00% <0.00%> (ø)
network/p2p/readSubscription.go 0.00% <0.00%> (ø)
module/mempool/epochs/transactions.go 94.73% <0.00%> (-5.27%) ⬇️
...sus/approvals/assignment_collector_statemachine.go 50.00% <0.00%> (-1.93%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 73d6864...3d91f99. Read the comment docs.

@huitseeker huitseeker force-pushed the huitseeker/1115-originID-check branch from 5572a95 to 4cd3b85 Compare August 18, 2021 20:12
@huitseeker huitseeker marked this pull request as ready for review August 18, 2021 20:38
@huitseeker huitseeker requested a review from synzhu August 18, 2021 20:39
Copy link
Member

@zhangchiqing zhangchiqing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just a comment

network/p2p/middleware.go Outdated Show resolved Hide resolved
Copy link
Contributor

@vishalchangrani vishalchangrani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left some comments, otherwise LGTM!


originIdentity, found := identities[originID]
if !found || !originIdentity.NetworkPubKey.Equals(originKey) {
m.log.Warn().Msgf("message claiming to be from nodeID %v with key %x was actually signed by %x and dropped", originID, originIdentity.NetworkPubKey, originKey)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note, we may want to distinguish the error messages for the two cases above.

In particular, if !found is true, in the worst case you may get a segfault by trying to do originIdentity.NetworkPubKey on this line.

In the best case, originIdentity.NetworkPubKey is probably null and the error log is confusing for the reader

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the segfault is possible given || shortcuts: https://play.golang.org/p/g7CGIVG7dQ2

Copy link
Contributor

@synzhu synzhu Aug 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant that even in the log message itself, you are using originIdentity.NetworkPubkey because you print it out:

m.log.Warn().Msgf("message claiming to be from nodeID %v with key %x was actually signed by %x and dropped", originID, originIdentity.NetworkPubKey, originKey)

network/p2p/readSubscription.go Outdated Show resolved Hide resolved
Comment on lines +287 to +297
for i := 0; i < m.size; i++ {
select {
case <-m.obs:
case <-time.After(2 * time.Second):
assert.FailNow(m.T(), "could not receive pubsub tag indicating mesh formed")
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain when a pubsub tag is emmitted? Thanks

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Andy why getting m.size of them indicates that a mesh has formed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Libp2p emits Protect calls with tag arguments marked by their pubsub topic when establishing each peering in the mesh:
https://github.com/libp2p/go-libp2p-pubsub/blob/master/tag_tracer.go

You would usually wait for libp2p.GossipSubD such links, but here there's only two nodes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to understand this a bit more.

From what I understand, for each connection both peers would get a Protect call for each peering, right?

So if there are 5 peers in total, then we should expect to see 8 tags before we can be confident that a mesh has formed, correct? (5 peers, 5 - 1 = 4 edges required for a connected graph, 4 * 2 = 8 tags).

Here, you are only checking for 5 tags.

Maybe I'm not understanding how this works?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, are we sure that Protect is only called once for each peer when establishing a peering? In otherwise, is counting the number of tags enough, or would it make sense to actually keep track of the peer IDs in case we get multiple tags for the same peer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So here's where the calls to Protect are intercepted.

This is where they originate at the libp2p level, in reaction to a join.

I don't think you're incorrect in your analysis. However:

  • when I said "You would usually wait for libp2p.GossipSubD such links" I misremembered, we actually wait for pubsub.GossipSubD*count in tests with count = 10
  • the way I got to that number is by running the test many times at a log level that allowed me to witness the number of tags emitted,
  • indeed, the value there is meant as a fast & loose heuristic to make sure we wait until after the flow/networking level that triggers the UpdatePeers and then the libp2p mesh update downstream of that, NOT as a countdown to full mesh,
  • it was nowhere near the theoretical number we would obtain: if, when a node joins a topic, we get 2m graft messages when there are k nodes already on it (m = k if k < D, m = D otherwise, since we don't count pruning), then for 10 nodes we'd expect 90 messages, and we only witness around 65-ish,
  • we don't need to finish mesh formation to ensure message delivery, as you noticed we only need a connected graph.

I'm happy to update to a value you'd find more suitable, but for count = 2 (in the test we're commenting on), I note we have, practically and theoretically, 2 graft messages, every time.

are we sure that Protect is only called once for each peer when establishing a peering?

Yep, the Protect calls otherwise made by our engines are all tagless.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation :)

network/test/middleware_test.go Outdated Show resolved Hide resolved
network/test/middleware_test.go Outdated Show resolved Hide resolved
@huitseeker huitseeker force-pushed the huitseeker/1115-originID-check branch 5 times, most recently from fe7fc04 to 18cf175 Compare August 19, 2021 15:09
@huitseeker
Copy link
Contributor Author

bors merge

bors bot added a commit that referenced this pull request Aug 19, 2021
1138: skip connecting to invalid identities in the identity table and log error instead of fatal r=vishalchangrani a=vishalchangrani

Found one more place where it was `log.fatal` on a bad identity.


1163: [network] Check the OriginID of a libp2p message corresponds to its authenticated source r=huitseeker a=huitseeker

Contributes to #1115.
Follow-up of #1129.

Edit: there was no problem with the integration tests, 🤦 

However, TestMiddlewareTestSuit/TestUnsubscribe is still flaky based on the delay in the last step (sending the message once a mesh has formed). But it is problematic on master, see [[1]](https://github.com/onflow/flow-go/runs/3357056741) [[2]](https://github.com/onflow/flow-go/runs/3356508175), so this is not relevant to this PR per se.

Co-authored-by: Vishal <1117327+vishalchangrani@users.noreply.github.com>
Co-authored-by: François Garillot <francois.garillot@dapperlabs.com>
@bors
Copy link
Contributor

bors bot commented Aug 19, 2021

Build failed (retrying...):

bors bot added a commit that referenced this pull request Aug 19, 2021
1028: adding secure-rpc-addr to access node systemd file r=vishalchangrani a=vishalchangrani



1163: [network] Check the OriginID of a libp2p message corresponds to its authenticated source r=huitseeker a=huitseeker

Contributes to #1115.
Follow-up of #1129.

Edit: there was no problem with the integration tests, 🤦 

However, TestMiddlewareTestSuit/TestUnsubscribe is still flaky based on the delay in the last step (sending the message once a mesh has formed). But it is problematic on master, see [[1]](https://github.com/onflow/flow-go/runs/3357056741) [[2]](https://github.com/onflow/flow-go/runs/3356508175), so this is not relevant to this PR per se.

Co-authored-by: vishal <1117327+vishalchangrani@users.noreply.github.com>
Co-authored-by: Simon Zhu <simon.zsiyan@gmail.com>
Co-authored-by: Vishal <1117327+vishalchangrani@users.noreply.github.com>
Co-authored-by: François Garillot <francois.garillot@dapperlabs.com>
@huitseeker huitseeker force-pushed the huitseeker/1115-originID-check branch from 18cf175 to 1e69998 Compare August 19, 2021 17:15
@bors
Copy link
Contributor

bors bot commented Aug 19, 2021

Canceled.

@huitseeker
Copy link
Contributor Author

bors merge

@@ -360,6 +361,32 @@ func (m *Middleware) Unsubscribe(channel network.Channel) error {
return nil
}

// processAuthenticatedMessage processes a message and a source (indicated by its PublicKey) and eventually passes it to the overlay
func (m *Middleware) processAuthenticatedMessage(msg *message.Message, originKey crypto.PublicKey) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found still a bit confusing.

It sounds like the message has been authenticated, but actually we are going to check the authentication, right?

If so, would checkOriginAndProcess be better?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sounds like the message has been authenticated,

It has, the message has a clear network origin, which is the signer of its envelope, which signature has been checked.

we are going to check the authentication

No, we are going to check the protocol-level authorship claim against the prior network-level authentication.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I see. would be great to add these knowledges into comments

bors bot added a commit that referenced this pull request Aug 19, 2021
1028: adding secure-rpc-addr to access node systemd file r=vishalchangrani a=vishalchangrani



1163: [network] Check the OriginID of a libp2p message corresponds to its authenticated source r=huitseeker a=huitseeker

Contributes to #1115.
Follow-up of #1129.

Edit: there was no problem with the integration tests, 🤦 

However, TestMiddlewareTestSuit/TestUnsubscribe is still flaky based on the delay in the last step (sending the message once a mesh has formed). But it is problematic on master, see [[1]](https://github.com/onflow/flow-go/runs/3357056741) [[2]](https://github.com/onflow/flow-go/runs/3356508175), so this is not relevant to this PR per se.

Co-authored-by: vishal <1117327+vishalchangrani@users.noreply.github.com>
Co-authored-by: Simon Zhu <simon.zsiyan@gmail.com>
Co-authored-by: Vishal <1117327+vishalchangrani@users.noreply.github.com>
Co-authored-by: François Garillot <francois.garillot@dapperlabs.com>
@bors
Copy link
Contributor

bors bot commented Aug 19, 2021

Build failed (retrying...):

bors bot added a commit that referenced this pull request Aug 19, 2021
1163: [network] Check the OriginID of a libp2p message corresponds to its authenticated source r=huitseeker a=huitseeker

Contributes to #1115.
Follow-up of #1129.

Edit: there was no problem with the integration tests, 🤦 

However, TestMiddlewareTestSuit/TestUnsubscribe is still flaky based on the delay in the last step (sending the message once a mesh has formed). But it is problematic on master, see [[1]](https://github.com/onflow/flow-go/runs/3357056741) [[2]](https://github.com/onflow/flow-go/runs/3356508175), so this is not relevant to this PR per se.

Co-authored-by: François Garillot <francois.garillot@dapperlabs.com>
@bors
Copy link
Contributor

bors bot commented Aug 19, 2021

Build failed:

@huitseeker huitseeker force-pushed the huitseeker/1115-originID-check branch from 1e69998 to f15333d Compare August 19, 2021 19:14
@huitseeker
Copy link
Contributor Author

bors retry

bors bot added a commit that referenced this pull request Aug 19, 2021
1163: [network] Check the OriginID of a libp2p message corresponds to its authenticated source r=huitseeker a=huitseeker

Contributes to #1115.
Follow-up of #1129.

Edit: there was no problem with the integration tests, 🤦 

However, TestMiddlewareTestSuit/TestUnsubscribe is still flaky based on the delay in the last step (sending the message once a mesh has formed). But it is problematic on master, see [[1]](https://github.com/onflow/flow-go/runs/3357056741) [[2]](https://github.com/onflow/flow-go/runs/3356508175), so this is not relevant to this PR per se.

Co-authored-by: François Garillot <francois.garillot@dapperlabs.com>
@bors
Copy link
Contributor

bors bot commented Aug 19, 2021

Build failed:

[network] Check the originID in sendDirect too
@huitseeker huitseeker merged commit 6caf5d3 into master Aug 20, 2021
@huitseeker huitseeker deleted the huitseeker/1115-originID-check branch August 20, 2021 01:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants