Warn about hashmap randomization and propose solutions #35

Open
joonazan opened this issue Jul 27, 2023 · 3 comments
Comments

@joonazan

Almost all Rust code uses std::collections::HashMap with the default hasher, which is seeded randomly at runtime. This makes coverage instrumentation almost useless for fuzzing.

For example, one of my targets didn't manage to produce a single valid input after days of fuzzing. After I disabled HashMap randomization, the fuzzer found a crash within a few hours.
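To make the randomization concrete, here is a minimal sketch that hashes the same key with two separately created `RandomState` values, the default `BuildHasher` for `HashMap`. The helper name `hash_with_two_states` is made up for illustration; `hash_one` requires Rust 1.71 or newer.

```rust
use std::collections::hash_map::RandomState;
use std::hash::BuildHasher;

/// Hashes `key` with two independently created default hasher states.
/// (Hypothetical helper, only for illustration.)
fn hash_with_two_states(key: &str) -> (u64, u64) {
    // Each `RandomState::new()` picks its own keys, just like each
    // `HashMap::new()` does internally.
    let a = RandomState::new();
    let b = RandomState::new();
    (a.hash_one(key), b.hash_one(key))
}

fn main() {
    let (h1, h2) = hash_with_two_states("fuzz-input");
    println!("state A: {h1:#018x}");
    println!("state B: {h2:#018x}");
    println!("same hash: {}", h1 == h2);
}
```

The two values should differ on essentially every run, and they also change from one process to the next, so the same key lands in a different bucket each time.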

Ways to solve this

My solution: edit the standard library. Cross-platform, but not easy to package.

LLVM solution: do not instrument hash maps. It is unclear to me how to do this because the instructions are C++ specific.

Linux solution: define your own getrandom function.
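Not one of the three options above, but for maps in code you control, a narrower workaround is to give the map a hasher with fixed keys. The sketch below assumes that `BuildHasherDefault<DefaultHasher>` is acceptable for a fuzzing build; it does nothing for hash maps inside dependencies, and it gives up the HashDoS protection that random seeding provides. The names `DeterministicHashMap` and `fixed_hash` are made up for illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{BuildHasher, BuildHasherDefault};

/// A `HashMap` whose hasher is created via `DefaultHasher::default()`,
/// so it uses fixed keys instead of randomly chosen ones.
/// (Hypothetical alias, intended for fuzzing builds only.)
type DeterministicHashMap<K, V> = HashMap<K, V, BuildHasherDefault<DefaultHasher>>;

/// Hashes `key` the same way `DeterministicHashMap` would.
fn fixed_hash(key: &str) -> u64 {
    BuildHasherDefault::<DefaultHasher>::default().hash_one(key)
}

fn main() {
    let mut map: DeterministicHashMap<&str, u32> = DeterministicHashMap::default();
    map.insert("a", 1);
    map.insert("b", 2);
    assert_eq!(map.get("a"), Some(&1));

    // The same key hashes to the same value on every call and every run
    // (for a given Rust version; the algorithm itself is unspecified).
    println!("{:#018x}", fixed_hash("a"));
}
```

Because the type differs from the plain `HashMap<K, V>`, function signatures that name the map type have to change too, which is why this does not scale to "almost all Rust code".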

I also recommend trying AFL++ first, because it checks whether the same input produces the same trace and warns when it does not.

For a more wordy explanation, see this post: https://internals.rust-lang.org/t/support-turning-off-hashmap-randomness/19234

@fitzgen
Member

fitzgen commented Jul 27, 2023

I think this is missing a bit of context.

Fuzz targets should generally be deterministic, so if you are using hash maps, you shouldn't generally be iterating over them in a way that makes the randomness observable in the fuzz target.

Given a deterministic fuzz target that internally uses a hash map for whatever reason, I don't see how "instrumentation is almost useless". But maybe I am missing something?

@joonazan
Author

@fitzgen Even code that only gets and sets hash map entries is unstable. Among other things, hash collisions occur at different points depending on the random seed, and each such difference is recorded as a different path through the program.

> Given a deterministic fuzz target that internally uses a hash map for whatever reason, I don't see how "instrumentation is almost useless".

The fuzzer changes the input in an inconsequential way. Because the hash maps are seeded differently, the program trace is not the same, so the fuzzer concludes that it found a new interesting input. Actual new paths are very rare, so they are drowned out by the fake new paths that happen, say, 40% of the time.

@kornelski

kornelski commented Jul 31, 2023

@fitzgen there are fuzzers that use program instrumentation to discover "new" paths to explore by observing which branches were taken each time. Randomness in the hashmap makes it execute a different pattern of branching each time (due to different collisions that change how long the search for the right bucket takes). This makes the fuzzer assume it's discovering novel, previously untested code, while in reality it's just exercising the same non-deterministic hashmaps over and over.
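The effect described here can be sketched with a toy table. This is a simplified model, not how std's `HashMap` is implemented: it assumes plain linear probing and uses a made-up seeded mixer (`toy_hash`) in place of the real hasher. Each extra probe is one more trip around the `while` loop, which is exactly the kind of branch a coverage-guided fuzzer records.

```rust
/// Toy seeded hash (a SplitMix64-style mixer). A stand-in for the real,
/// randomly keyed hasher; hypothetical and only for illustration.
fn toy_hash(key: u64, seed: u64) -> u64 {
    let mut z = (key ^ seed).wrapping_add(0x9E37_79B9_7F4A_7C15);
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Inserts `keys` into a linear-probing table with `capacity` slots and
/// returns, for each key, how many occupied slots were skipped before a
/// free one was found.
fn probe_lengths(keys: &[u64], seed: u64, capacity: usize) -> Vec<usize> {
    assert!(keys.len() <= capacity, "table would overflow");
    let mut slots = vec![false; capacity];
    let mut lengths = Vec::with_capacity(keys.len());
    for &key in keys {
        let mut idx = (toy_hash(key, seed) % capacity as u64) as usize;
        let mut probes = 0;
        // Every iteration of this loop is an extra branch in the trace.
        while slots[idx] {
            idx = (idx + 1) % capacity;
            probes += 1;
        }
        slots[idx] = true;
        lengths.push(probes);
    }
    lengths
}

fn main() {
    // Same keys, same insertion order; only the seed changes.
    let keys: Vec<u64> = (0..24).collect();
    for seed in [1u64, 2, 3] {
        println!("seed {seed}: {:?}", probe_lengths(&keys, seed, 32));
    }
}
```

With a fixed seed the probe pattern is reproducible; with a different seed the same sequence of inserts will typically take a different number of loop iterations, even though the program's observable behavior is identical.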
