Connection Slot Exhaustion with Passive Nodes #29329
Comments
If an honest geth node connects to a malicious node and attempts to sync with it, the malicious node will be dropped because it cannot provide the correct data; see this code section. |
I am also looking into this problem. see #29327 |
@weiihann Dear weiihann, the situation you described is partially correct, but there are two scenarios where Passive nodes will still exist in large numbers.
|
I'm curious why this hasn't occurred long ago, if the attack cost is relatively low. Btw, just in case you missed it, I'd suggest submitting this to the bug bounty program. |
It was already submitted to the bug bounty program, and then got sent here. |
This problem may have first been raised around 2014 or earlier. However, as mentioned in #29327 (comment) and #29327 (comment) and #29034 (comment) by karalabe, the early developers ran into a very severe error where peers rejected each other because of a similar "improvement". Since then, they have tried not to make such changes and hold the view that the current behavior is acceptable. |
Thanks for sharing! |
The attack is mostly a theoretical one. However, it would still be nice to fix it. @lyciumlee if you have a fix in mind, I am open to discuss it! |
@weiihann Dear weiihann, I have already submitted the report to the Ethereum Bug Bounty Program before disclosing it to the community, and I have email correspondence records. Fredrik Svantes suggested that I post the report here to discuss the issue with everyone. |
@fjl Dear fjl, I am primarily dedicated to the research of blockchain network protocols and have a keen interest in this area. I am also very willing to participate in the activities of building the Ethereum community. To address this issue, a score-based Peer mechanism can be used. The reasons are as follows:
For the first reason, even if some nodes forget to upgrade, this mechanism will not affect the non-upgraded nodes, nor will it cause the network to be segmented or forked. Because this mechanism is entirely a passive behavior-marking mechanism, an honest node will transfer and broadcast many messages to the network through the ETH Wire Protocol and protocols based on the ETH Wire Protocol. Therefore, the upgraded nodes know that these non-upgraded nodes are honest nodes, which will not lead to the network being split.

For the second reason, Execution Layer nodes primarily use the ETH Wire Protocol for broadcasting transaction-related messages, meaning that when a new transaction enters the mempool of an honest node, it notifies other nodes that are unaware of the transaction. Therefore, upon receiving the message, upgraded nodes under the score-based Peer mechanism will recognize that the node is an honest node.

Nodes that only receive messages without broadcasting any are actually harmful to the network. Although they do nothing but accept all the network messages they receive, they occupy connection slots of other nodes. The score-based mechanism can identify nodes that haven't generated any beneficial ETH Wire Protocol messages for a long time. If these malicious nodes are forced to participate in the message forwarding process, we also achieve the purpose of preventing this malicious behavior. |
One problem with your solution suggestion is the existence of syncing nodes. While the node is syncing, it cannot relay transactions. |
@fjl Dear fjl, I understand the sync process you mentioned. |
I'm a bit scared that if we build a peer scoring system, we're entering a game which has no end. Every 6 months, a new whitepaper will be presented on how some researchers bypassed geth's peer scoring system. And at some point, a scoring-system will unintentionally cut off syncing nodes, or client X (besu / nethermind), or clients with limited tx pool capacity, or something else. |
@holiman Dear holiman, I understand the concerns of the community. It seems we need to find a balance between simplicity and security. |
haha, such an interesting insight. Many companies have the motivation and incentive to bypass geth's peer scoring system. I might even secure a job with a higher salary, as they would be willing to pay for expertise in countering the scoring system in the future😃. |
I think we should just add a system where we disconnect a random peer every so often. It doesn't need any score/rules. |
But the peer can/would just retry the connection immediately upon disconnection? |
The reason we haven't added a reputation system or an algorithmic disconnect is because they are too easy to game. Since Geth's code is public, it's trivial to see what the rules are and how to fake them. What you'll end up with is non-zero probability of unintended side effects dropping legitimate peers and close-to-zero probability of actual effect on malicious peers who just fake some traffic. As @holiman mentioned, it becomes a game of whack-a-mole. Whilst I agree that your concern is legitimate, IMO it's very hard to find a solution to a non-existing problem (as in not-actively exploited), because we just don't know what the problem would look like and what the actual solution would be. Fixing every possible attack scenario anyone can ever dream up in the future is a questionable effort (whilst noble). An alternative line of thought is what the probabilities and gains are for such an attack. Currently block propagation is handled by consensus nodes, so filling connection slots doesn't really block the network from functioning. Transaction propagation can be impacted, but there are many MEV private pools that could be injected into directly (which many do already), so it's not obvious what the gain would be to block one mechanism whilst the other is still going strong. My 2 cents are that we need resilience more than fool-proof-ness. For example, for the discovery protocol, we have two mechanisms: the DHT itself and the DNS discovery. Both could in theory be attacked, but doing it simultaneously is probably non-trivial and would be quickly detected. We've added 2 to have each be a backup/fallback in case the other has some issues. Sure, we want to make both as robust as meaningful, but neither needs to be absolutely, perfectly resilient. For the transactions, we again have the two mechanisms (txpools, mev pools) that act as one another's backup. Of course they are not serving the same purpose, but they do provide resiliency.
The sync code at some point was quite aggressive with dropping "useless" peers, so it should be kind of hard to eclipse syncing nodes off from the network - at least as far as EL is concerned. IMO it is more valuable to have robust monitoring to detect anomalous behavior and course correct (on top of the resilient mechanisms) rather than to cover all possible bases all the time, investing infinite resources. |
Thank you all for your opinions, and thanks to the Ethereum developers for their enthusiastic answers. I have benefited immensely from everyone's responses. |
System information
Geth version (geth version): 1.13.14
CL client & version: e.g. lighthouse/nimbus/prysm@v1.0.0
OS & Version: Windows/Linux/OSX
Commit hash : (if develop)
This issue has been reported to Fredrik Svantes, and Fredrik Svantes suggested that I open an issue here. @fredriksvantes
Short description:
The ETH Wire Protocol lacks a mechanism for periodically disconnecting passive nodes, i.e., nodes that receive messages but do not disseminate them to others. This allows an attacker to exhaust the inbound connection slots of all public nodes with few IP addresses and little bandwidth, and thus prevent new nodes from joining the network.
Attack scenario:
Connection exhaustion attacks have a long history in p2p systems. This attack is cheaper in Ethereum than in most other p2p networks because the Ethereum network does not evict passive nodes. These nodes run at extremely low cost, as they do not disseminate block or transaction messages to others, nor do they download historical blockchain data when joining the network.
The attacker deploys dozens of modified Geth nodes in the Ethereum network, which differ from ordinary Geth nodes in the following two aspects. First, these nodes have their outbound connection limit removed, and constantly try to establish connections with all known reachable Ethereum nodes. Optionally, these nodes have a large limit on inbound connections, allowing them to accept as many connections from newly joined nodes as possible. Second, they do not store or propagate any blockchain data. This significantly lowers the storage and bandwidth cost of the attack.
Under the current protocol, these nodes are considered benign by the network. Once they complete the ETH Wire Protocol handshake with honest nodes, they only receive ETH Wire Protocol messages without actively sending any themselves. This way, the attacker continually occupies the honest nodes’ connection slots, preventing new nodes from joining the network.
The process described above constitutes a low-cost DoS attack. In an ideal scenario, an attacker would only need the computational resources of 34 nodes—the number of inbound connection slots—to attack the entire Ethereum Mainnet network.
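As a rough check of where the figure of 34 inbound slots comes from, the sketch below assumes Geth's default of 50 total peers and a dial ratio of 3 (one third of the slots reserved for outbound dials). Neither value is stated in this report, and `inboundSlots` is a hypothetical helper that only imitates that split.

```go
package main

import "fmt"

// inboundSlots estimates how many inbound connection slots a node has,
// assuming outbound (dialed) connections get maxPeers/dialRatio slots
// (at least one) and inbound connections get the remainder.
// This is an illustrative helper, not Geth's own function.
func inboundSlots(maxPeers, dialRatio int) int {
	if maxPeers <= 0 || dialRatio <= 0 {
		return 0
	}
	outbound := maxPeers / dialRatio
	if outbound == 0 {
		outbound = 1
	}
	return maxPeers - outbound
}

func main() {
	// Assumed defaults: 50 peers, dial ratio 3, so 16 outbound and 34 inbound.
	fmt.Println(inboundSlots(50, 3)) // prints 34
}
```

Under these assumptions each honest node exposes 34 inbound slots, so an attacker needs at least that many distinct connections per victim to fill them.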
The root cause of this attack is that the ETH Wire Protocol has no challenge-response or reputation mechanism to verify whether peers are honest or passive nodes.
Impact:
Due to all inbound connections being occupied by attacking nodes, new nodes are unable to join the Ethereum Mainnet, and nodes that have dropped off cannot rejoin the network.
Components:
The Peer module under the eth protocol in Geth does not differentiate between active and passive nodes, so it is impossible to determine which nodes are good and which are bad.
After ETH 2.0, execution clients are only responsible for relaying transaction-related messages to each other, as can be seen in the source code file eth/protocols/eth/peer.go. The broadcastTransactions and announceTransactions methods of the peer struct are responsible for the forwarding and handling of new transactions.
It can be observed that within these two functions, and in the corresponding functions that handle them—handleNewPooledTransactionHashes, handleTransactions, and handlePooledTransactions—there are no checks performed to determine if the peer nodes are active.
Reproduction:
To implement an attacking node that does not forward any transaction messages, it is only necessary to remove the calls that forward transactions in broadcastTransactions and announceTransactions.
The following code is what the attacker would modify:
eth/handler.go
eth/protocols/eth/broadcast.go
The following change is an optimization that lets the exploit reach ideal conditions.
We need to modify the functions func (d *downloader.Downloader) RegisterPeer(id string, version uint, peer downloader.Peer) and func (s *snap.Syncer) Register(peer snap.SyncPeer) so that they directly return nil. These two functions are used to register services related to blockchain synchronization protocols.
According to data from ethernodes.org, there are approximately 7000 nodes in the entire network, so we need to set MaxPeers in node/defaults.go to 7000 * 3 = 21000.
Fix:
To fix this security vulnerability:
1. We can add a response reputation variable to the peer structure. Each time an ETH Wire Protocol message is received from a node, its score is increased. A certain number of points is then deducted periodically, for example every 5 minutes. If a node's score drops below a certain threshold within an hour, the node is marked as malicious.
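A minimal sketch of such a score, to make the proposal concrete. The type `scoreTable`, its methods, and the point values are all assumptions made up for illustration; none of them exist in Geth, and a real version would hook into the eth protocol handlers and a timer.

```go
package main

import (
	"fmt"
	"sort"
)

// scoreTable is a hypothetical reputation table keyed by peer ID.
type scoreTable struct {
	reward    int // points added per received protocol message (assumed value)
	decay     int // points deducted on every periodic tick (assumed value)
	threshold int // peers scoring below this are flagged (assumed value)
	scores    map[string]int
}

func newScoreTable(reward, decay, threshold int) *scoreTable {
	return &scoreTable{reward: reward, decay: decay, threshold: threshold, scores: make(map[string]int)}
}

// add registers a peer with an initial score, e.g. after the handshake.
func (s *scoreTable) add(id string, initial int) { s.scores[id] = initial }

// onMessage credits a known peer for sending a protocol message.
func (s *scoreTable) onMessage(id string) {
	if _, ok := s.scores[id]; ok {
		s.scores[id] += s.reward
	}
}

// tick applies the periodic deduction and returns, in sorted order,
// the peers whose score is now below the threshold.
func (s *scoreTable) tick() []string {
	var flagged []string
	for id := range s.scores {
		s.scores[id] -= s.decay
		if s.scores[id] < s.threshold {
			flagged = append(flagged, id)
		}
	}
	sort.Strings(flagged)
	return flagged
}

func main() {
	tbl := newScoreTable(10, 5, 0)
	tbl.add("active", 20)
	tbl.add("passive", 20)
	var flagged []string
	for i := 0; i < 5; i++ { // five simulated decay periods
		tbl.onMessage("active") // only the active peer sends anything
		flagged = tbl.tick()
	}
	fmt.Println(flagged) // prints [passive]
}
```

As written, a peer that is still syncing and therefore relays nothing would be flagged just like a passive attacker, so a real design would need an exemption or a different signal for that case.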
Or,
2. We can add challenge-response, which the current implementation of the ETH Wire Protocol lacks. Challenge-response means that honest nodes randomly request known messages from their peers to detect whether those peers are actively responsive.