Disconnect invalid and inactive peers #431

Open
wants to merge 6 commits into
base: develop

@jmozah jmozah commented Feb 27, 2023

This PR adds checks to identify and ban peers that pass the P2P handshake and are accepted into the application protocol but have other application-level issues.

  • Some clients have valid caps (opera/63, fsnap/1) but invalid client names such as Efireal, go-corex, Geth, etc.
  • Progress messages are checked to confirm that the epoch increases within a nominal duration.
  • An application message must be received within a threshold.
  • Recurring application errors now result in banning the peer.

With these checks in place, peers that are valid and behave honestly get priority.

Depends on Fantom-foundation/go-ethereum#44

@jmozah jmozah self-assigned this Feb 27, 2023
@jmozah jmozah marked this pull request as ready for review March 16, 2023 11:15
@jmozah jmozah requested a review from uprendis March 16, 2023 11:16
useless = true

// Some clients have compatible caps and thus pass discovery checks and seep in to
// protocol handler. We should band these clients immediately.

nit: little typo ("band" should be "ban")

@@ -61,6 +61,17 @@ const (
// txChanSize is the size of channel listening to NewTxsNotify.
// The number is referenced from the size of tx pool.
txChanSize = 4096

// percentage of useless peer nodes to allow
uselessPeerPercentage = 20 // 20%

Nit: Why don't we just use a factor, e.g. 0.2, instead of having to calculate the percentage each time?
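A minimal sketch of the two ways to compute the limit, assuming h.maxPeers is an int (its type is not shown in this diff). The function names are hypothetical. It also illustrates that, with integer operands, dividing the percentage by 100 before multiplying truncates to zero, so the order of operations matters if the percentage form is kept.

```go
package main

import "fmt"

// uselessPeerPercentage mirrors the constant introduced in the diff.
const uselessPeerPercentage = 20 // 20%

// truncatedLimit reproduces the shape of the expression in the diff.
// Both operands of the division are integers, so 20/100 evaluates to 0
// before the multiplication happens.
func truncatedLimit(maxPeers int) int {
	return maxPeers * (uselessPeerPercentage / 100)
}

// percentLimit multiplies first and divides last, which keeps integer
// arithmetic but avoids truncating the ratio to zero.
func percentLimit(maxPeers int) int {
	return maxPeers * uselessPeerPercentage / 100
}

// factorLimit uses a floating-point factor, as suggested in the review.
// The conversion back to int truncates toward zero.
func factorLimit(maxPeers int, factor float64) int {
	return int(float64(maxPeers) * factor)
}

func main() {
	fmt.Println(truncatedLimit(50))   // 0
	fmt.Println(percentLimit(50))     // 10
	fmt.Println(factorLimit(50, 0.2)) // 10
}
```

With a limit of 0, a comparison such as UselessNum() >= limit holds for every peer, which is probably not what the percentage is meant to express.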


// A useless peer is the one which does not support protocols opera/63 & fsnap/1.
useless := !eligibleForSnap(p.Peer)
if !p.Peer.Info().Network.Trusted && useless && h.peers.UselessNum() >= (h.maxPeers*(uselessPeerPercentage/100)) {

Question: I am not yet familiar with this useless stuff, but why do we even allow a percentage of useless peers at all? Why don't we just disconnect them all?

Author

Well, the peer is useless in the context of sync, i.e. it doesn't support fsnap/1 and opera/63.
But old peers supporting opera/62 should still be allowed to participate.

Ah, so I assume useless then already implies that the peer is an opera/62 peer, not just any peer. That would make sense.

return err
// progress and application
progressWatchDogTimer := time.NewTimer(noProgressTime)
applicationWatchDogTimer := time.NewTimer(noAppMessageTime)

Aren't we recreating the timers on each iteration of the for loop here? If so, the Resets later are useless. It looks to me like we either have to create the timers outside the for loop and then Reset them as you do now, or recreate them in each iteration and just break where we currently Reset, although the latter results in a lot of garbage-collected timers. Or am I missing something?

Author

Oops... the timer should be outside the loop.
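A minimal sketch of the pattern discussed here: the watchdog timer is created once before the loop and only Reset inside it. The watch function, the progress channel and errNoProgress are hypothetical names for illustration, not the handler code from this PR.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errNoProgress is a hypothetical sentinel used only in this sketch.
var errNoProgress = errors.New("peer is not progressing")

// watch counts progress signals until none arrives within timeout.
// It returns the number of signals seen together with errNoProgress.
func watch(progress <-chan struct{}, timeout time.Duration) (int, error) {
	// Create the timer once, outside the loop.
	watchdog := time.NewTimer(timeout)
	defer watchdog.Stop()

	seen := 0
	for {
		select {
		case <-progress:
			seen++
			// Stop and drain before Reset so that a stale expiry
			// is never observed on the next iteration.
			if !watchdog.Stop() {
				select {
				case <-watchdog.C:
				default:
				}
			}
			watchdog.Reset(timeout)
		case <-watchdog.C:
			return seen, errNoProgress
		}
	}
}

// demo feeds three buffered progress signals and then lets the watchdog fire.
func demo() (int, error) {
	progress := make(chan struct{}, 3)
	for i := 0; i < 3; i++ {
		progress <- struct{}{}
	}
	return watch(progress, 200*time.Millisecond)
}

func main() {
	n, err := demo()
	fmt.Println(n, err)
}
```

Creating the timer once avoids allocating a new timer per iteration, and the Reset calls then actually extend the deadline that the select is waiting on.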

err := h.handleMsg(p)
if err != nil {
p.Log().Debug("Message handling failed", "err", err)
if strings.Contains(err.Error(), errorToString[ErrPeerNotProgressing]) {

Can we use errors.Is here instead of comparing strings?

Author

We can use errors.Is() only to compare errors. But in this place, the error is defined as a string.
If we want to change it, we should define all the errors as errors.New().

Yes, agreed. If there are more such string-based errors instead of errors.New()-based ones (which I believe would be better), then addressing them should go into a separate PR. So it is up to you whether you want to do anything about it in this PR.
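For reference, a minimal sketch of what the errors.New()-based approach could look like. It assumes a sentinel error and wrapping with %w; the handleMsg and shouldBan functions here are simplified, hypothetical stand-ins and do not reflect how errResp is implemented in this codebase.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrPeerNotProgressing is defined as a sentinel value in this sketch
// (an assumption; in the PR it is an error code mapped to a string).
var ErrPeerNotProgressing = errors.New("peer not progressing")

// handleMsg is a simplified stand-in that wraps the sentinel with context.
func handleMsg(progressing bool) error {
	if !progressing {
		return fmt.Errorf("%w: epoch did not advance", ErrPeerNotProgressing)
	}
	return nil
}

// shouldBan matches on the error value instead of comparing strings.
func shouldBan(err error) bool {
	return errors.Is(err, ErrPeerNotProgressing)
}

func main() {
	err := handleMsg(false)
	fmt.Println(err)            // peer not progressing: epoch did not advance
	fmt.Println(shouldBan(err)) // true
}
```

Because errors.Is follows the wrap chain, the check keeps working even if the message text around the sentinel changes later.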

@@ -1014,6 +1070,10 @@ func (h *handler) handleMsg(p *peer) error {
return errResp(ErrDecode, "%v: %v", msg, err)
}
p.SetProgress(progress)
// If peer has not progressed for noProgressTime minutes, then disconnect the peer.
if !p.IsPeerProgressing() {
return errResp(ErrPeerNotProgressing, "%v: %v %v", "epoch is not progressing for ", noProgressTime, "minutes")

Nit: As noProgressTime is a duration, this would print "epoch is not progressing for 3m0s minutes", I think
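A small sketch of the formatting behaviour, assuming noProgressTime is a time.Duration of 3 minutes (the value is inferred from the "3m0s" above and is not shown in the diff). Only the fmt part is reproduced; errResp itself is left out, and the function names are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// noProgressTime is an assumed value for illustration.
const noProgressTime = 3 * time.Minute

// asInDiff formats the message with the same verbs and arguments as the diff.
func asInDiff() string {
	return fmt.Sprintf("%v: %v %v", "epoch is not progressing for ", noProgressTime, "minutes")
}

// withDuration lets the Duration describe its own unit.
func withDuration() string {
	return fmt.Sprintf("epoch is not progressing for %v", noProgressTime)
}

// inMinutes converts to minutes explicitly when a fixed unit is wanted.
func inMinutes() string {
	return fmt.Sprintf("epoch is not progressing for %.0f minutes", noProgressTime.Minutes())
}

func main() {
	fmt.Println(asInDiff())     // epoch is not progressing for : 3m0s minutes
	fmt.Println(withDuration()) // epoch is not progressing for 3m0s
	fmt.Println(inMinutes())    // epoch is not progressing for 3 minutes
}
```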

@@ -1316,6 +1376,11 @@ func (h *handler) handleMsg(p *peer) error {
default:
return errResp(ErrInvalidMsgCode, "%v", msg.Code)
}

if msg.Code != ProgressMsg {

I am not yet familiar with all message codes, but is ProgressMsg the only message which signals that there is progress?

Author

Yes

p.progress = x
p.progressTime = time.Now()

Any specific reason why p.appMessageTime is locked, but p.progressTime isn't?

Author

It's locked in SetProgress() where setPeerAsProgressing() is called.
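A minimal sketch of the convention described in this reply, where the exported setter takes the lock and an unexported helper assumes it is already held. The peerState type and its fields are hypothetical; the actual peer struct and setPeerAsProgressing() are not shown in this excerpt.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// peerState is a hypothetical stand-in for the peer struct in the diff.
type peerState struct {
	mu           sync.Mutex
	epoch        uint64
	progressTime time.Time
	updates      int
}

// SetProgress is the entry point; it acquires the lock exactly once.
func (p *peerState) SetProgress(epoch uint64) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.setProgressLocked(epoch)
}

// setProgressLocked must only be called with p.mu held.
func (p *peerState) setProgressLocked(epoch uint64) {
	p.epoch = epoch
	p.progressTime = time.Now()
	p.updates++
}

// Updates reads the counter under the same lock.
func (p *peerState) Updates() int {
	p.mu.Lock()
	defer p.mu.Unlock()
	return p.updates
}

// demo calls SetProgress from 100 goroutines and returns the update count.
func demo() int {
	var p peerState
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func(epoch uint64) {
			defer wg.Done()
			p.SetProgress(epoch)
		}(uint64(i))
	}
	wg.Wait()
	return p.Updates()
}

func main() {
	fmt.Println(demo()) // 100
}
```

Naming the helper with a Locked suffix, or documenting the requirement in its comment, makes it easier for a reader to see that the field write is protected even though no lock appears next to it.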

newPeer := getPeer()
ep1 := PeerProgress{Epoch: 1}
newPeer.SetProgress(ep1)
time.Sleep(2 * time.Second) //set the threshold to 2 second

All these Sleep calls add up to 9 seconds, making test runs 9 seconds slower, as I understand it. Isn't there a different way to test this? Do we actually even need to sleep?
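One possible way to avoid sleeping, sketched under the assumption that the progress check can take its notion of "now" from an injected function. The progressTracker type is hypothetical and is not the peer type from this PR; a test can then advance a fake clock instead of waiting.

```go
package main

import (
	"fmt"
	"time"
)

// progressTracker is a hypothetical type that reads time through an
// injected clock (time.Now in production, a fake in tests).
type progressTracker struct {
	now         func() time.Time
	threshold   time.Duration
	epoch       uint64
	lastAdvance time.Time
}

func newTracker(now func() time.Time, threshold time.Duration) *progressTracker {
	return &progressTracker{now: now, threshold: threshold, lastAdvance: now()}
}

// SetEpoch records the time only when the epoch actually increases.
func (t *progressTracker) SetEpoch(epoch uint64) {
	if epoch > t.epoch {
		t.epoch = epoch
		t.lastAdvance = t.now()
	}
}

// IsProgressing reports whether the epoch advanced within the threshold.
func (t *progressTracker) IsProgressing() bool {
	return t.now().Sub(t.lastAdvance) < t.threshold
}

// demo drives the tracker with a fake clock and no real waiting.
func demo() (before, stalled, recovered bool) {
	current := time.Unix(0, 0)
	clock := func() time.Time { return current }

	tr := newTracker(clock, 2*time.Second)
	tr.SetEpoch(1)

	current = current.Add(1 * time.Second)
	before = tr.IsProgressing() // 1s elapsed, below the 2s threshold

	current = current.Add(2 * time.Second)
	stalled = !tr.IsProgressing() // 3s elapsed, threshold exceeded

	tr.SetEpoch(2)
	recovered = tr.IsProgressing() // a new epoch resets the window
	return before, stalled, recovered
}

func main() {
	fmt.Println(demo()) // true true true
}
```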

@holisticode holisticode left a comment

I am not sure if I should already be the only person approving, but I want to signal that this looks good to me now (at least).


2 participants