Warn about hashmap randomization and propose solutions #35

joonazan · 2023-07-27T15:34:10Z

Almost all Rust code uses std::collections::HashMap with the default, randomly seeded hasher. This makes coverage instrumentation almost useless.

For example, one of my targets didn't manage to produce even a single valid input after days of fuzzing. After I disabled HashMap randomization, fuzzing even found a crash within a few hours.
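A minimal sketch of what "randomization" refers to here, using only the standard library: `HashMap`'s default hasher state is `RandomState`, and the std docs state that hashers built from two different `RandomState` instances are unlikely to produce the same result for the same value. Every `DefaultHasher::new()`, by contrast, starts from the same keys. The helper `hash_with` and the alias `DetMap` are hypothetical names chosen for illustration. Such an alias only affects maps you construct yourself; maps created inside your dependencies still use `RandomState`.

```rust
use std::collections::hash_map::{DefaultHasher, RandomState};
use std::collections::HashMap;
use std::hash::{BuildHasher, BuildHasherDefault, Hash, Hasher};

/// Hypothetical alias: a HashMap whose hasher always starts from the same
/// fixed keys, so hashing (and therefore bucket placement) is reproducible.
type DetMap<K, V> = HashMap<K, V, BuildHasherDefault<DefaultHasher>>;

/// Hypothetical helper: hash `value` with a hasher built from `state`.
fn hash_with<S: BuildHasher, T: Hash + ?Sized>(state: &S, value: &T) -> u64 {
    let mut hasher = state.build_hasher();
    value.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    // Two randomly seeded states: the same key is unlikely to hash the same.
    let a = RandomState::new();
    let b = RandomState::new();
    println!("RandomState: {:#x} vs {:#x}", hash_with(&a, "key"), hash_with(&b, "key"));

    // Two fixed-key states: the same key always hashes the same.
    let c = BuildHasherDefault::<DefaultHasher>::default();
    let d = BuildHasherDefault::<DefaultHasher>::default();
    assert_eq!(hash_with(&c, "key"), hash_with(&d, "key"));

    // The alias is used like any other HashMap.
    let mut map: DetMap<&str, u32> = HashMap::default();
    map.insert("key", 1);
    assert_eq!(map.get("key"), Some(&1));
}
```

Swapping the hasher this way is the same idea as the `HashMap::with_hasher` / custom `BuildHasher` mechanism std already offers; it just cannot reach code you do not control.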

Ways to solve this

My solution: edit the standard library. Cross-platform, but not easy to package.

LLVM solution: do not instrument hash maps. It is unclear to me how to do this because the instructions are C++ specific.

Linux solution: define your own getrandom function.

I also recommend trying AFL++ first, because it tracks whether the same input produces the same trace and warns when it doesn't.

For a more wordy explanation, see this post: https://internals.rust-lang.org/t/support-turning-off-hashmap-randomness/19234


fitzgen · 2023-07-27T16:13:48Z

I think this is missing a bit of context.

Fuzz targets should generally be deterministic, so if you are using hash maps, you shouldn't generally be iterating over them in a way that makes the randomness observable in the fuzz target.

Given a deterministic fuzz target that internally uses a hash map for whatever reason, I don't see how "instrumentation is almost useless". But maybe I am missing something?

joonazan · 2023-07-27T21:43:46Z

@fitzgen Even code that only gets and sets HashMap entries is unstable. Among other things, hash collisions happen at different points depending on the random seed, and each different collision pattern is recorded as a different path through the program.

> Given a deterministic fuzz target that internally uses a hash map for whatever reason, I don't see how "instrumentation is almost useless".

The fuzzer changes the input in an inconsequential way. Because the hash maps are seeded differently, the program trace is not the same, so the fuzzer concludes that it has found a new interesting input. Genuinely new paths are very rare, so they are drowned out by these fake new paths, which show up, say, 40% of the time.

kornelski · 2023-07-31T00:49:13Z (edited)

@fitzgen there are fuzzers that use program instrumentation to discover "new" paths to explore by observing which branches were taken on each run. Randomness in the hash map makes it execute a different pattern of branches each time (different collisions change how long the search for the right bucket takes). This makes the fuzzer assume it is discovering novel, previously untested code, while in reality it is just exercising the same non-deterministic hash maps over and over.
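The collision effect described above can be illustrated with a toy model. This is a sketch under stated assumptions: Rust's `HashMap` actually uses a SwissTable-style design (via hashbrown), not the simple linear probing below, and the "seed" here is a hypothetical stand-in that is just mixed into the hashed value. The functions `seeded_hash` and `total_probes` are illustrative names, not a real API. The point is only that the same keys inserted under different seeds take a different number of probe-loop iterations, i.e. a different set of branch outcomes for instrumentation to record.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy stand-in for a per-process hash seed: mix `seed` into the hashed value.
/// (Real HashMap seeding works differently; this only models the effect.)
fn seeded_hash(seed: u64, key: u32) -> u64 {
    let mut h = DefaultHasher::new();
    (seed, key).hash(&mut h);
    h.finish()
}

/// Insert `keys` into a toy linear-probing table with `capacity` slots and
/// return the total number of extra probes caused by collisions. Each extra
/// probe is one more loop iteration, i.e. one more branch outcome that
/// coverage instrumentation would observe.
fn total_probes(seed: u64, keys: &[u32], capacity: usize) -> usize {
    assert!(keys.len() <= capacity, "table too small");
    let mut slots: Vec<Option<u32>> = vec![None; capacity];
    let mut probes = 0;
    for &key in keys {
        let mut i = (seeded_hash(seed, key) % capacity as u64) as usize;
        // Walk forward until a free slot is found; this is the "search for
        // the right bucket" whose length depends on the seed.
        while slots[i].is_some() {
            probes += 1;
            i = (i + 1) % capacity;
        }
        slots[i] = Some(key);
    }
    probes
}

fn main() {
    let keys = [1, 2, 3, 4, 5, 6, 7, 8];
    // Same input keys, different seeds: the collision pattern (and hence the
    // executed branches) typically differs from seed to seed.
    for seed in 0..4 {
        println!("seed {seed}: {} extra probes", total_probes(seed, &keys, 16));
    }
    // With the seed held fixed, the result is reproducible.
    assert_eq!(total_probes(42, &keys, 16), total_probes(42, &keys, 16));
}
```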
