Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a libFuzzer fuzzing harness #744

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

elichai
Copy link
Contributor

@elichai elichai commented Apr 18, 2020

Hi,
As a result of #739, I wrote a fuzzing harness so people can fuzz the library themselves if they want so.

This is my first time playing with libFuzzer so any feedback/criticism is very welcome.

For the autotools part it's mostly copied from bitcoin core with the caveat of automatically adding -fsanitize=fuzzer, because this is only libFuzzer so and that's required to run it anyway.

Currently it's fuzzing only the public API of the library, but in the future we can add fuzzing for internal functions, which is why I created a fuzz directory.

One thing I did that I know isn't quite standard is the Fuzz Garbage through parsing functions part, where I loop over the input byte by byte and feed it into the parsing functions, I think this is a good way to throw a lot of data into these functions and make sure that: A. they don't crash, B. they return only 1 or 0.

cc @practicalswift

@elichai
Copy link
Contributor Author

elichai commented Apr 19, 2020

Maybe I should add more "rainy day" scenarios? (ie if I can decode a pubkey+signature from the input, try to verify it and check that it returns 0)

@practicalswift
Copy link
Contributor

@elichai

This is an excellent initiative! Thanks a lot for doing this! ❤️

Strongest possible concept ACK from a fellow fuzzing enthusiast :)

Yesterday I setup a very basic fuzzing infrastructure which does continuous fuzzing of master using the code in this PR. It has been running for a few CPU hours by now and it seems to be chugging along fine. I hope to be able ramp up the number of cores dedicated to this and keep it running permanently (with periodic automatic re-builds against master) :)

I'd be happy to share all coverage increasing inputs I'm able to find. (In the extremely unlikely event that I find a crashing input (crashing with or without sanitizers enabled) it will be responsibly reported to the maintainers like I've done in the past when I've found issues in Bitcoin Core or any of its dependencies (CVE-2017-18350, CVE-2018-20586, CVE-2019-18936, etc.). I believe in responsible disclosure and the policy I'm following myself is that the project maintainer decides when to go public with any findings and ideally that the maintainer handles all such public disclosure to avoid having the researcher accidentally disclosing more than the maintainer signed off for due to miscommunication.)

A few comments/suggestions regarding the implementation that I thought of when I tinkered with your code while setting up the continuous fuzzing:

  • The configure option semantics are not matching the ones in Bitcoin Core. In Bitcoin Core --enable-fuzz does not automatically add -fsanitize=fuzzer so if you wantlibFuzzer with say ASAN/UBSAN you'll have to do --enable-fuzz --with-sanitizers=address,fuzzer,undefined. The rationale being that you might want to use another fuzzer such as afl-fuzz or honggfuzz. What do you think about matching how it works in Bitcoin Core?
  • What do you think about going from the current single-binary-with-many-fuzzing-harnesses to the one-binary-per-independent-fuzzing-harness model that Bitcoin Core uses? My experience is that the latter model works better for modern coverage-guided fuzzers. See also the comment in Offering my Bitcoin fuzzers bitcoin/bitcoin#11045 (comment) from @Dor1s.
  • Regarding adding more "rainy day" scenarios: yes, please! :)

Again: @elichai - this is a great initative. Thanks!

@elichai
Copy link
Contributor Author

elichai commented Apr 19, 2020

Thanks :)

  • The configure option semantics are not matching the ones in Bitcoin Core. In Bitcoin Core --enable-fuzz does not automatically add -fsanitize=fuzzer so if you wantlibFuzzer with say ASAN/UBSAN you'll have to do --enable-fuzz --with-sanitizers=address,fuzzer,undefined. The rationale being that you might want to use another fuzzer such as afl-fuzz or honggfuzz. What do you think about matching how it works in Bitcoin Core?

I've done it like that because I only added support for libFuzzer right now, didn't want to add the complexity Bitcoin Core has for AFL because I wasn't sure if anyone was using it and it can be done in a future PR, but if it's a common setup I can definitely adapt.
(FYI, clang doesn't care if you specify a sanitizer multiple times so you can still run with --enable-fuzz --with-sanitizers=address,fuzzer,undefined)

  • What do you think about going from the current single-binary-with-many-fuzzing-harnesses to the one-binary-per-independent-fuzzing-harness model that Bitcoin Core uses? My experience is that the latter model works better for modern coverage-guided fuzzers. See also the comment in bitcoin/bitcoin#11045 (comment) from @Dor1s.

I also thought about it, right now we have different benchmarks binary per module, but one test binary for all modules.
I used the same one for all because they share common characteristics (need 32 bytes, need to verify a seckey etc.) but if there's a practical advantage to separate binaries I can do that.

P.S. what do you think about the parsing loops, are they a good idea or that's not really how the fuzzing is designed to be used? (I'm iterating byte by byte over the whole input trying it as input)

@practicalswift
Copy link
Contributor

@elichai

Have you checked what parts of the code your fuzzer is able to reach deeply in to, and what parts that are only shallowly fuzzed when starting from an empty seed corpus?

Due to the nature of the library there are quite a few natural "fuzzing road blocks" in the code base that will need to be handled to get good/deep coverage when doing coverage-based fuzzing starting from an empty corpus. John Regehr has a great piece on writing fuzzable code which also covers "fuzzer blockers".

I highly recommend Regehr's blog if you're interested in modern fuzzing and modern fuzzing research. Here are some good reads:

Happy fuzzing! :)

@elichai
Copy link
Contributor Author

elichai commented Apr 30, 2020

I'll read, thank you.

Have you checked what parts of the code your fuzzer is able to reach deeply in to, and what parts that are only shallowly fuzzed when starting from an empty seed corpus?

How do I check that? And it should be able to reach pretty deeply in the "happy path"(because a valid seckey is pretty probable, and then I sign, verify etc with it)

@practicalswift
Copy link
Contributor

practicalswift commented Apr 30, 2020

How do I check that?

This is the type of output you would like to be looking at when searching for opportunities to tweak your fuzzer to reach deeper:

https://marcofalke.github.io/btc_cov/fuzz.coverage/index.html

It shows the current fuzzing coverage of Bitcoin Core. It is currently at 50% line coverage but I'm far from done. I'm aiming for 1.) beating the unit tests with regards to coverage (short term), 2.) beating the functional tests with regards to coverage (medium term) and 3.) in the longer term the final goal is full coverage of course :)

See if you can create such a page with lcov output for your secp256k1 fuzzer. That would be really nice to see!

I would suggest first fuzzing normally using say:

$ mkdir -p thin-air-corpus/
$ ./fuzz_public -use_value_profile=1 -max_len=8192 thin-air-corpus/

Let it run until you get bored waiting for entries marked NEW in the libFuzzer output: in other words when the increases in the cov and ft columns are infrequent.

Stop the fuzzer.

Now go build a coverage instrumented build of fuzz_public which is not using -fsanitize=fuzzer.

Call the resulting binary fuzz_public_cov_without_libfuzzer.

Do ...

for INPUT in thin-air-corpus/*; do
    ./fuzz_public_cov_without_libfuzzer < $INPUT
done

... to collect coverage data.

Use lcov, gcov, gcovr or your favourite coverage analysis tool to see the achieved coverage :)

Tweak your fuzzing harness to achieve even better coverage.

Repeat the whole process.

Again.

And again.

And again until you've reached a level of coverage you're pleased with :)

And it should be able to reach pretty deeply in the "happy path"(because a valid seckey is pretty probable, and then I sign, verify etc with it)

All paths are good to fuzz, but generally speaking if I'd have to choose between fuzzing only "happy paths" or only non-"happy paths" to find vulnerabilities I'd go for non-happy paths (paths likely to invoke error handling, etc.). Those are usually relatively under-tested and thus the places where you are most likely to find interesting stuff :)

@practicalswift
Copy link
Contributor

practicalswift commented Apr 30, 2020

@elichai

I've now created a repo (https://github.com/practicalswift/libsecp256k1-fuzzing-seed-corpus) where we can share coverage increasing inputs :)

I've added a first batch of the 2911 coverage increasing inputs I've found using a fuzzing farm that has been running non-stop for 12 days (calendar time, CPU time much much more of course).

Hope this first set of inputs can help you find opportunities where the fuzzing harness can be tweaked to reach deeper in to the code base and thus achieve better coverage :)

Really excited about your fuzzing work!

Quick start guide if anyone else wants to follow along on this fuzzing journey:

# check out https://github.com/bitcoin-core/secp256k1/pull/744
$ ./autogen.sh
$ CC=clang CXX=clang++ ./configure --enable-fuzz --with-asm=no --enable-experimental --enable-module-ecdh --enable-module-recovery
$ make
$ git clone https://github.com/practicalswift/libsecp256k1-fuzzing-seed-corpus
$ mkdir -p coverage-increasing-inputs-to-submit-upstream/
$ ./fuzz_public -use_value_profile=1 coverage-increasing-inputs-to-submit-upstream/ libsecp256k1-fuzzing-seed-corpus/seeds/
INFO: Seed: 747237488
INFO: Loaded 2 modules   (717 inline 8-bit counters): 625 [0x7f66c8ed90c0, 0x7f66c8ed9331), 92 [0x6f9120, 0x6f917c),
INFO: Loaded 2 PC tables (717 PCs): 625 [0x7f66c8ed9338,0x7f66c8edba48), 92 [0x4d2bc0,0x4d3180),
INFO:        0 files found in coverage-increasing-inputs-to-submit-upstream/
INFO:     2911 files found in libsecp256k1-fuzzing-seed-corpus/seeds/
INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 8191 bytes
INFO: seed corpus: files: 2911 min: 1b max: 8191b total: 259749b rss: 25Mb
#2048   pulse  cov: 425 ft: 13712 corp: 1712/54Kb exec/s: 682 rss: 26Mb
#2912   INITED cov: 445 ft: 14943 corp: 2533/239Kb exec/s: 582 rss: 27Mb
…
# Please submit any coverage increasing inputs to:
# * https://github.com/practicalswift/libsecp256k1-fuzzing-seed-corpus

@elichai
Copy link
Contributor Author

elichai commented May 3, 2020

I've now created a repo (https://github.com/practicalswift/libsecp256k1-fuzzing-seed-corpus) where we can share coverage increasing inputs :)

Thanks! This is awesome.
I think I'll reformat this into multiple seperate binaries per module as you and others have suggested, and then I'll manually add corpuses to increase coverage(and PR them into your repo), but it's already looking pretty good.

@practicalswift
Copy link
Contributor

@elichai Excellent! Let me know if I can help in any way :)

@elichai
Copy link
Contributor Author

elichai commented May 4, 2020

@elichai Excellent! Let me know if I can help in any way :)

There's one thing I'm trying to figure out
Is it ok to reuse the same fuzzing input on multiple functions or should I "pull" more data for each function I call? should I minimize the amount of fuzzing input I'm using or maximize it?

Right now I'm mostly reusing the same data for a lot of operations, and the whole "Fuzz Garbage through parsing functions" basically throws all the input into those functions.

But if I shouldn't use the same input for multiple functions(which if I look at FuzzedDataProvider https://github.com/llvm/llvm-project/blob/master/compiler-rt/include/fuzzer/FuzzedDataProvider.h it sure looks like it) then I should add an abstraction that keeps an index, and remove the loops that go over all the data.
And if I do that what should I do if I'm out of data? return 0;? loop to the beginning? something else?

(P.S. Are you on IRC or any chat platform?)

@practicalswift
Copy link
Contributor

practicalswift commented May 6, 2020

@elichai

Is it ok to reuse the same fuzzing input on multiple functions or should I "pull" more data for each function I call? should I minimize the amount of fuzzing input I'm using or maximize it?

Re-using the same parts of the input for independent parts of the fuzzing harness is likely to confuse the coverage-guided fuzzer and should be avoided if possible.

Example:

If you're fuzzing the f and g using input from fuzzer controlled stream s then …

f(s.read(10))
g(s.read(10))

… is much better than …

f(s.read(10))
s.reset()
g(s.read(10)) # re-use the same ten bytes used for f

… since the fuzzer in the first case can try to figure out the relation between s[0:10] vs coverage achieved in f, and s[10:20] vs coverage achieved in g independently.

Generally you want to make everything as dumb as possible for the coverage-guided fuzzer to work: no input re-use which may obscure the true interaction between input provided and coverage reached, and many small target binaries (testing a single unit of of functionality each) instead of one big target binary trying to cover all functionality.

In the example above then f and g should ideally be fuzzed in different binaries assuming they are unrelated/independent of each other.

Hope that helps :)

(P.S. Are you on IRC or any chat platform?)

I don't use IRC, but I'll send you an e-mail right away :)

@elichai
Copy link
Contributor Author

elichai commented May 7, 2020

I believe this is getting better.
Should I split fuzz_public into multiple binaries? (fuzz_ecdsa, fuzz_arithmetic, fuzz_sigs, fuzz_key_parsing)

@practicalswift
Copy link
Contributor

@elichai Yes, please split in multiple binaries: see the last point in these recommendations from the libFuzzer documentation :)

Some important things to remember about fuzz targets:

  • The fuzzing engine will execute the fuzz target many times with different inputs in the same process.
  • It must tolerate any kind of input (empty, huge, malformed, etc).
  • It must not exit() on any input.
  • It may use threads but ideally all threads should be joined at the end of the function.
  • It must be as deterministic as possible. Non-determinism (e.g. random decisions not based on the input bytes) will make fuzzing inefficient.
  • It must be fast. Try avoiding cubic or greater complexity, logging, or excessive memory consumption. Ideally, it should not modify any global state (although that’s not strict).
  • Usually, the narrower the target the better. E.g. if your target can parse several data formats, split it into several targets, one per format.

@practicalswift
Copy link
Contributor

Anyone wants to chime in with Concept ACK:s or Concept NACK:s?

(FWIW I'm super enthusiastic about your work @elichai, and if it doesn't end up in this repo I'd like to collaborate with you on this effort in another repo (say secp256k1-fuzz :))

@real-or-random
Copy link
Contributor

Anyone wants to chime in with Concept ACK:s or Concept NACK:s?

I rally don't know a lot about fuzzing so I doubt I'm qualified enough to judge this. What's the cost of maintaining this? Does it have advantages to keep this in this repo here (because it may be extended to the internal API) or can this live equally good in another repo?

@practicalswift
Copy link
Contributor

@elichai

I think a rebase is needed :)

Somewhat related to this PR:

@guidovranken is doing some really interesting differential fuzzing work which AFAIK haven't found any bugs in libsecp256k1, but in some other libraries like Trezor: trezor/trezor-firmware#1374.

Perhaps it could serve as inspiration for your fuzzing work @elichai, or you could join forces? :)

@elichai
Copy link
Contributor Author

elichai commented Nov 30, 2020

@guidovranken is doing some really interesting differential fuzzing work which AFAIK haven't found any bugs in libsecp256k1, but in some other libraries like Trezor: trezor/trezor-firmware#1374.

Perhaps it could serve as inspiration for your fuzzing work @elichai, or you could join forces? :)

Small note, I'm not sure it's that obvious that a zero message should pass ECDSA verification (see rust-bitcoin/rust-secp256k1#207)

but yeah I don't mind helping fuzzing / rebasing, and I should probably add schnorrsig coverage here at some point

@sipa
Copy link
Contributor

sipa commented May 9, 2023

Sorry for the late response, but Concept ACK.

While the general advice is indeed to have multiple fuzzing binaries for different tests (and specifically, not read the "which fuzz test to run" from the test input, as the fuzzing engine will try to mutate pointlessly between the tests), in Bitcoin Core we discovered this quickly becomes unwieldy when there are lots of tests. We instead settled for reading the test name from the environment (so, one binary, but it needs to be told through FUZZ="fuzz_test_name" which test to run). I think it'd make sense to do something similar here, as I can envision having many tests (I'll write ones myself too).

Going further than this, I think the most interesting types of test will be more low-level tests (field, scalar arithmetic, perhaps group operations), and higher-level tests for the exhaustive test field. Having this is a great start, though.

Rebase?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants