
MerkleProof: Intermediate nodes can be reinterpreted as leaves #3091

Closed
hoytech opened this issue Jan 7, 2022 · 2 comments · Fixed by #3090
Comments

@hoytech
Contributor

hoytech commented Jan 7, 2022

It is possible to prove the presence of certain values that aren't actually leaves in the tree.

This is a very slight variant of a well-known problem sometimes called a "second preimage attack" on merkle trees, although in my opinion that is a misleading name. The issue is that you can concatenate the hashes of two internal nodes (yielding a 64-byte value) and provide this as a value to be proved. The smart contract will hash it and attempt to verify it as though it were a normal leaf, and if the proof has been shortened appropriately, verification will incorrectly succeed.

The very slight variation from the typical description is that in the OZ implementation internal nodes are sorted, although this doesn't change the impact of this issue (which is low in most cases).
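The mechanics can be sketched in a few lines. This is a minimal illustration, not the OZ code: it assumes sorted-pair hashing as described above, and uses Python's hashlib.sha256 as a stand-in for keccak256 (which is not in the Python standard library); the confusion works identically for any 32-byte hash.

```python
# Sketch of the attack, assuming OZ-style sorted-pair hashing.
# sha256 stands in for keccak256; the mechanics are hash-agnostic.
from hashlib import sha256

def hash_pair(a: bytes, b: bytes) -> bytes:
    """Hash two sibling nodes: sort, concatenate, hash."""
    lo, hi = sorted((a, b))
    return sha256(lo + hi).digest()

def verify(proof: list, root: bytes, leaf: bytes) -> bool:
    """Walk the proof from the leaf up and compare against the root."""
    node = leaf
    for sibling in proof:
        node = hash_pair(node, sibling)
    return node == root

# Four legitimate leaves, hashed naively as h(value).
leaves = [sha256(bytes([i]) * 32).digest() for i in range(4)]
n01 = hash_pair(leaves[0], leaves[1])   # internal node
n23 = hash_pair(leaves[2], leaves[3])   # internal node
root = hash_pair(n01, n23)

# A real leaf needs a two-element proof:
assert verify([leaves[1], n23], root, leaves[0])

# Attack: the 64-byte sorted concatenation of two sibling hashes was
# never a leaf, yet hashing it reproduces the internal node n01, so a
# proof shortened by one element "proves" it against the same root:
fake_value = b"".join(sorted((leaves[0], leaves[1])))
assert len(fake_value) == 64
assert verify([n23], root, sha256(fake_value).digest())
```

The key observation is that hashing the 64-byte fake value performs exactly the same computation as hashing an internal node pair, so the result lands on an existing intermediate node of the tree.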

I included a test case in #3090 that demonstrates this issue and adds some documentation warning developers about how to avoid it.

Impact

The real world impact of this bug is likely small. The first condition for a successful attack is that a contract accepts legitimate leaf values with a length of exactly 64 bytes. For instance, Uniswap's merkle distributor uses the following to hash a leaf:

  bytes32 node = keccak256(abi.encodePacked(index, account, amount));

This is an example of code that is not vulnerable. index and amount are uint256s and account is an address so the value passed to keccak256 is 32 + 20 + 32 = 84 bytes long.

On the other hand, suppose the code was written like so:

  bytes32 node = keccak256(abi.encodePacked(index, account, uint96(amount)));

In this case, the leaf value is 32 + 20 + 12 = 64 bytes long so it would potentially be vulnerable to a griefing/spamming attack where a large amount of tokens are sent to some random address, effectively burning them. Alternatively, if index was packed into some small type (maybe uint32) then it might be possible to burn innocent users' tokens by sending them to random addresses. The length condition could also be satisfied if variable-length values are used, for example: abi.encodePacked(address, bytes(...)).
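The length condition above is pure arithmetic over the packed widths of the Solidity types, so a hypothetical helper (not part of any library) makes the safe/risky distinction explicit:

```python
# Hypothetical helper: given the packed byte widths of the fields in a
# leaf, flag encodings that produce exactly 64 bytes -- the width of two
# concatenated 32-byte node hashes.
def packed_leaf_is_risky(*field_widths: int) -> bool:
    return sum(field_widths) == 64

# Uniswap-style leaf: uint256 (32) + address (20) + uint256 (32) = 84 bytes
assert not packed_leaf_is_risky(32, 20, 32)
# Packed variant: uint256 (32) + address (20) + uint96 (12) = 64 bytes
assert packed_leaf_is_risky(32, 20, 12)
```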

The second condition of a successful attack is to overcome the fact that the attacker has very little control over the invalid values. This is because they are the output of hash functions over data that presumably the attacker cannot influence. If attackers do have some influence over values in the tree (maybe they can sign up for an airdrop using arbitrary addresses) then they may be able to "grind" advantageous values to some degree.

However, in the typical case an attacker has to make do with a set of effectively random values. The number of usable inputs scales approximately with the number of leaf values in the tree. An attacker can loop over all possibilities and pick the best one, and there will be more chances of a "good" pair of hashes if the tree is larger. However, this is obviously dominated by the large output range of the 256-bit hash function so is not likely a problem in practice.

Recommendation

I originally submitted this to Immunefi but they weren't interested, so I'm just posting it here. It may be worthwhile determining whether any deployed contracts could be impacted by this (probably few, if any).

At the very least, users should be warned in the documentation about this issue. Contracts should ensure that there is a "domain separation" between hashing leaf values and intermediate nodes. This can be done either by making sure leaf values are never 64 bytes in length, or by using a hash function other than keccak256 when hashing leaves. Also, abi.encode() should be encouraged over abi.encodePacked() for several reasons.
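One common way to get this domain separation, sketched here in Python with sha256 again standing in for keccak256, is to prefix leaf hashes and internal-node hashes with distinct tag bytes; this is the scheme RFC 6962 (Certificate Transparency) uses, not the OZ API:

```python
# Domain separation via tag bytes, RFC 6962-style.
# sha256 stands in for keccak256.
from hashlib import sha256

LEAF_TAG, NODE_TAG = b"\x00", b"\x01"

def hash_leaf(value: bytes) -> bytes:
    return sha256(LEAF_TAG + value).digest()

def hash_node(a: bytes, b: bytes) -> bytes:
    lo, hi = sorted((a, b))
    return sha256(NODE_TAG + lo + hi).digest()

# The 64-byte concatenation of two hashes can no longer be confused with
# an internal node: as a leaf it is hashed under tag 0x00, while the real
# internal hash was computed under tag 0x01.
a, b = hash_leaf(b"left"), hash_leaf(b"right")
parent = hash_node(a, b)
fake = b"".join(sorted((a, b)))      # 64 bytes
assert hash_leaf(fake) != parent
```

With disjoint input domains for leaf and node hashing, any value at all (including 64-byte ones) is safe to use as a leaf.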

My pull request contains some suggested wording for this. Also feel free to include any part of this description in your documentation, if it helps.

There are some API-level changes that could be made to definitively prevent this. For instance, verify() could accept a user-defined value type (https://blog.soliditylang.org/2021/09/27/user-defined-value-types/) that wraps bytes32 and can only be created by a keccak256 wrapper that also checks that the input length is not 64 bytes. That said, I think leaving the responsibility of preventing this to the developer is fine, as long as there is appropriate documentation describing how to do so.
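The idea behind that API change can be sketched in Python as a wrapper type whose only constructor hashes the input and rejects 64-byte values; the class name and check are hypothetical, and sha256 again stands in for keccak256:

```python
# Hypothetical wrapper type: the only way to obtain a leaf hash is
# through a constructor that refuses 64-byte inputs, so concatenated
# node pairs can never be presented as leaves.
from hashlib import sha256   # stand-in for keccak256

class LeafHash:
    __slots__ = ("digest",)

    def __init__(self, value: bytes):
        if len(value) == 64:
            raise ValueError("64-byte values could be internal node pairs")
        self.digest = sha256(value).digest()

ok = LeafHash(b"\x01" * 84)          # 84-byte leaf: accepted
try:
    LeafHash(b"\x02" * 64)           # 64-byte value: rejected
    rejected = False
except ValueError:
    rejected = True
assert rejected
```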


@Amxx
Collaborator

Amxx commented Jan 7, 2022

Hello @hoytech,

We will come back to you with a longer response next week, but this is something we are aware of and have already discussed. AFAIK this is not an issue with our implementation but rather a property of merkle trees.

It is true that users must be careful with what they use as leaves. IMO the leaf is usually a hash that doesn't match the merkle tree's internal construction, but you are right that there is a risk of confusion. We were particularly concerned about this risk when discussing rebuilding a leaf index during verification.

Changing the verify() parameters would be a breaking change, and we want to avoid that whenever possible. That being said, I agree we could definitely mention this potential issue in our documentation.

@hoytech
Contributor Author

hoytech commented Jan 8, 2022

Hi @Amxx, thanks for the reply.

I don't believe it's really a property of merkle trees in general. Most merkle tree implementations prevent this by construction, by ensuring the input domains for constructing the leaf and node hashes are disjoint. They do this by prefixing/suffixing node type tags, using keyed hash functions, etc. In this way users can use any values at all for leaves.

It is "not an issue with the OZ implementation" only in the sense that the implementation is incomplete: it does not itself implement leaf hashing, and instead requires the user of the library to take precautions against this. Fortunately that is easy to do, by ensuring that the only allowed leaf values have a fixed length other than 64 bytes. Perhaps you're right that this is usually the case, but there are many not-unrealistic scenarios where it could be violated (I wrote a brief analysis of this in the issue description).

Again, because most merkle tree implementations do not have this problem, it's especially important to notify library users, and that probably should have been done as soon as OZ became aware of it. My pull request has some basic wording that may at least help as a starting point.
