Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🧪 Add OSS-Fuzz set up #255

Merged
merged 8 commits into from May 31, 2023

Conversation

DavidKorczynski
Copy link
Contributor

This is a follow-up to #254 (comment)

@chrisjsewell I added some documentation which may be helpful to navigate the OSS-Fuzz infra. I hope this should make it easier to e.g. experiment and test with the fuzzing.

I'm the original author of the fuzzers and am happy to license them according to the MIT of this repository.

@welcome
Copy link

welcome bot commented Mar 15, 2023

Thanks for submitting your first pull request! You are awesome! 🤗

If you haven't done so already, check out EBP's Code of Conduct and our Contributing Guide, as this will greatly help the review process.

Welcome to the EBP community! 🎉

@codecov
Copy link

codecov bot commented Mar 25, 2023

Codecov Report

Patch and project coverage have no change.

Comparison is base (d1852a5) 96.07% compared to head (497fee8) 96.07%.

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #255   +/-   ##
=======================================
  Coverage   96.07%   96.07%           
=======================================
  Files          62       62           
  Lines        3236     3236           
=======================================
  Hits         3109     3109           
  Misses        127      127           
Flag Coverage Δ
pytests 96.07% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@chrisjsewell chrisjsewell changed the title Add OSS-Fuzz set up 🧪 Add OSS-Fuzz set up Mar 25, 2023
@chrisjsewell
Copy link
Member

Thanks @DavidKorczynski!
As you can see, I just added a workflow to get an idea of how it runs; obviously it looks like it takes too long to run for all PRs, and would so would just be run as a maybe weekly cron job

From the README.md, it is not quite clear what the difference is between running the files in tests/fuzz directly, e.g. python tests/fuzz/fuzz_markdown.py, and following the steps to run using the git clone https://github.com/google/oss-fuzz steps, e.g. python infra/helper.py run_fuzzer markdown-it-py fuzz_markdown
What is the difference?
Is the code in google/oss-fuzz more thorough?

For the actual code, I'd like to understand it a little more, the language support is a little lacking 😅:

image

  • Is there a simple way to explain what "instrumenting" actually does?
  • what can you actually pass with sys.argv, I tried doing things like --help and it didn't seem to actually do anything
  • can you restrict the fuzzing to only a certain module or something? or does it always have to run through everything?
    • is the way I run it in the workflow even right, or is it running through the whole of python's core code or something 😅

@chrisjsewell
Copy link
Member

so yeh, the fuzzing hit the 6 hours time limit, is that right!?

@DavidKorczynski
Copy link
Contributor Author

so yeh, the fuzzing hit the 6 hours time limit, is that right!?

When run as follows python3 ./fuzz_markdown.py the fuzzer will run forever or until a crash is found.

You can limit the fuzzer time using -max_total_time=X where X is the number of seconds to run. python3 ./fuzz_markdown.py -max_total_time=10 runs the fuzzer for 10 seconds where the timer starts post instrumentation.

what can you actually pass with sys.argv, I tried doing things like --help and it didn't seem to actually do anything

atheris uses libFuzzer under the hood, which means you can pass the commandline flags specified here: https://llvm.org/docs/LibFuzzer.html

Is there a simple way to explain what "instrumenting" actually does?

The instrumentation adds code that essentially correspond to counters throughout the Python code. These counters will be used by the fuzzer to trace what code was executed by a given input, and further uses this to check whether a input executes unique parts of markdown-it. This is needed because the fuzzer will find unique inputs that trigger unique parts of the code. Fuzzing is a "genetic mutational algorithm" and the instrumentation is used to separate which which seeds to save, I talk a small bit about here https://youtu.be/zIyIZxAZLzo?t=368 and from [06:08 - 10:30] approximately.

From the README.md, it is not quite clear what the difference is between running the files in tests/fuzz directly, e.g. python tests/fuzz/fuzz_markdown.py, and following the steps to run using the git clone https://github.com/google/oss-fuzz steps, e.g. python infra/helper.py run_fuzzer markdown-it-py fuzz_markdown
What is the difference?
Is the code in google/oss-fuzz more thorough?

No difference as such. It's the same program.

OSS-Fuzz will, however, wrap the code in a package using pyinstaller and offers some additional things e.g. code coverage visualisation to check how much of the target code is executed.

I would not necessarily recommend setting up a cron-job in the CI, as OSS-Fuzz will make sure the fuzzers run continuously. If anything, I'd advice to set it up in a way that tests if the fuzzers work (i.e. run for a short period of time) or use the CIFuzz job that comes accompanied with OSS-Fuzz: https://google.github.io/oss-fuzz/getting-started/continuous-integration/#integrating-into-your-repository

@chrisjsewell
Copy link
Member

Thanks for the detailed response!

ERROR: 100.0% of fuzz targets seem to be broken. See the list above for a detailed information: https://github.com/executablebooks/markdown-it-py/actions/runs/4543755476/jobs/8008855531?pr=255

I don't think that I've quite mastered this yet 😅 what did I do wrong?

@DavidKorczynski
Copy link
Contributor Author

I don't think that I've quite mastered this yet 😅 what did I do wrong?

I think it's the language attribute missing: https://google.github.io/oss-fuzz/getting-started/continuous-integration/#optional-configuration

If you set it to python I think it should work!

@chrisjsewell
Copy link
Member

thanks, that seems to have worked: https://github.com/executablebooks/markdown-it-py/actions/runs/4543950408/jobs/8009331193?pr=255

could you explain the result for me:

...
2023-03-28 14:15:20,013 - root - INFO - Trying to reproduce crash using: /tmp/tmp6zgwo3_i/crash-d9e621f7e53a457e0c6b9133b72f77d5e81943e1.
2023-03-28 14:16:50,748 - root - INFO - Reproduce command returned: 1. Reproducible on /github/workspace/cifuzz-prev-build/fuzz_markdown.
2023-03-28 14:16:50,799 - root - INFO - Not uploading crashes because on OSS-Fuzz.

it seems there was a crash, but it did not upload it, so I can't reporoduce it?
are crashes only failing/uploaded if they are caused specifically by the changed code in the PR?

@chrisjsewell
Copy link
Member

are crashes only failing/uploaded if they are caused specifically by the changed code in the PR?

oh yeh I should properly read the documentation lol: https://google.github.io/oss-fuzz/getting-started/continuous-integration/#how-it-works

@DavidKorczynski
Copy link
Contributor Author

it seems there was a crash, but it did not upload it, so I can't reporoduce it?
are crashes only failing/uploaded if they are caused specifically by the changed code in the PR?

Yes -- this given crash was visible before, it seems to be this one: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=55371

@chrisjsewell
Copy link
Member

ok great, so from your readme/explanation, it seems:

  1. the workflow could be helpful, for "early detection" of failures in PRs, although perhaps I don't want to wait 10 minutes for every PR commit 🤔
  2. The command: python3 infra/helper.py reproduce markdown-it-py {FUZZER_NAME} {PATH_TO_MINIMIZED_TESTCASE}, seems the most useful, for quickly reproducing bugs reported by you guys. Perhaps I can set up a "tox env" to git clone + run this

I'm not sure of the usefulness of tests/fuzz/fuzz_markdown.py and tests/fuzz/fuzz_markdown_extended.py though; I'm not sure how/if I would actually ever use them

@DavidKorczynski
Copy link
Contributor Author

  1. Many projects prefer to have a shorter runtime -- I personally have mine on 120sec, but definitely tailor it to your needs and what fits your workflow. OSS-Fuzz will run the fuzzers continuously which will catch bugs that are more complex to reach.

I'm not sure of the usefulness of tests/fuzz/fuzz_markdown.py and tests/fuzz/fuzz_markdown_extended.py though; I'm not sure how/if I would actually ever use them

You're not meant to use them as such. Having a folder with fuzzers in this repo that OSS-Fuzz then fetches to run continuously serves the purpose of you being able to add/modify the fuzzers without having to make PRs on the OSS-Fuzz repo.

@chrisjsewell
Copy link
Member

Hey @DavidKorczynski thanks for your patience 😅

In 2d46a43 and d1852a5 I separately added the CI workflow for PRs and a tox environment to reproduce crash files,

Then here, in 497fee8, I rewrote the README as I now understand the process.
If you could have a look and check that this makes sense, then I think this is good to merge!

@DavidKorczynski
Copy link
Contributor Author

Looks great @chrisjsewell!

@chrisjsewell chrisjsewell merged commit baa8658 into executablebooks:master May 31, 2023
15 checks passed
@welcome
Copy link

welcome bot commented May 31, 2023

Congrats on your first merged pull request in this project! 🎉
congrats

Thank you for contributing, we are very proud of you! ❤️

DavidKorczynski added a commit to google/oss-fuzz that referenced this pull request May 31, 2023
Ref: executablebooks/markdown-it-py#255

Signed-off-by: David Korczynski <david@adalogics.com>
AdamKorcz pushed a commit to google/oss-fuzz that referenced this pull request May 31, 2023
Ref: executablebooks/markdown-it-py#255

Signed-off-by: David Korczynski <david@adalogics.com>
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this pull request Jun 6, 2023
## 3.0.0 - 2023-06-03

⚠️ This release contains some minor breaking changes in the internal API and improvements to the parsing strictness.

**Full Changelog**: <executablebooks/markdown-it-py@v2.2.0...v3.0.0>

### ⬆️ UPGRADE: Drop support for Python 3.7

Also add testing for Python 3.11

### ⬆️ UPGRADE: Update from upstream markdown-it `12.2.0` to `13.0.0`

A key change is the addition of a new `Token` type, `text_special`, which is used to represent HTML entities and backslash escaped characters.
This ensures that (core) typographic transformation rules are not incorrectly applied to these texts.
The final core rule is now the new `text_join` rule, which joins adjacent `text`/`text_special` tokens,
and so no `text_special` tokens should be present in the final token stream.
Any custom typographic rules should be inserted before `text_join`.

A new `linkify` rule has also been added to the inline chain, which will linkify full URLs (e.g. `https://example.com`),
and fixes collision of emphasis and linkifier (so `http://example.org/foo._bar_-_baz` is now a single link, not emphasized).
Emails and fuzzy links are not affected by this.

* ♻️ Refactor backslash escape logic, add `text_special` [#276](executablebooks/markdown-it-py#276)
* ♻️ Parse entities to `text_special` token [#280](executablebooks/markdown-it-py#280)
* ♻️ Refactor: Add linkifier rule to inline chain for full links [#279](executablebooks/markdown-it-py#279)
* ‼️ Remove `(p)` => `§` replacement in typographer [#281](executablebooks/markdown-it-py#281)
* ‼️ Remove unused `silent` arg in `ParserBlock.tokenize` [#284](executablebooks/markdown-it-py#284)
* 🐛 FIX: numeric character reference passing [#272](executablebooks/markdown-it-py#272)
* 🐛 Fix: tab preventing paragraph continuation in lists [#274](executablebooks/markdown-it-py#274)
* 👌 Improve nested emphasis parsing [#273](executablebooks/markdown-it-py#273)
* 👌 fix possible ReDOS in newline rule [#275](executablebooks/markdown-it-py#275)
* 👌 Improve performance of `skipSpaces`/`skipChars` [#271](executablebooks/markdown-it-py#271)
* 👌 Show text of `text_special` in `tree.pretty` [#282](executablebooks/markdown-it-py#282)

### ♻️ REFACTOR: Replace most character code use with strings

The use of `StateBase.srcCharCode` is deprecated (with backward-compatibility), and all core uses are replaced by `StateBase.src`.

Conversion of source string characters to an integer representing the Unicode character is prevalent in the upstream JavaScript implementation, to improve performance.
However, it is unnecessary in Python and leads to harder to read code and performance deprecations (during the conversion in the `StateBase` initialisation).

See [#270](executablebooks/markdown-it-py#270), thanks to [@hukkinj1](https://github.com/hukkinj1).

### ♻️ Centralise indented code block tests

For CommonMark, the presence of indented code blocks prevent any other block element from having an indent of greater than 4 spaces.
Certain Markdown flavors and derivatives, such as mdx and djot, disable these code blocks though, since it is more common to use code fences and/or arbitrary indenting is desirable.
Previously, disabling code blocks did not remove the indent limitation, since most block elements had the 3 space limitation hard-coded.
This change centralised the logic of applying this limitation (in `StateBlock.is_code_block`), and only applies it when indented code blocks are enabled.

This allows for e.g.

```md
<div>
  <div>

    I can indent as much as I want here.

  <div>
<div>
```

See [#260](executablebooks/markdown-it-py#260)

### 🔧 Maintenance changes

Strict type annotation checking has been applied to the whole code base,
[ruff](https://github.com/charliermarsh/ruff) is now used for linting,
and fuzzing tests have been added to the CI, to integrate with Google [OSS-Fuzz](https://github.com/google/oss-fuzz/tree/master/projects/markdown-it-py) testing, thanks to [@DavidKorczynski](https://github.com/DavidKorczynski).

* 🔧 MAINTAIN: Make type checking strict [#](executablebooks/markdown-it-py#267)
* 🔧 Add typing of rule functions [#283](executablebooks/markdown-it-py#283)
* 🔧 Move linting from flake8 to ruff [#268](executablebooks/markdown-it-py#268)
* 🧪 CI: Add fuzzing workflow for PRs [#262](executablebooks/markdown-it-py#262)
* 🔧 Add tox env for fuzz testcase run [#263](executablebooks/markdown-it-py#263)
* 🧪 Add OSS-Fuzz set up by @DavidKorczynski in [#255](executablebooks/markdown-it-py#255)
* 🧪 Fix fuzzing test failures [#254](executablebooks/markdown-it-py#254)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants