Speed up JSON and reduce HTML formatter consumption #1569

Merged 8 commits on Oct 26, 2020

Commits on Oct 11, 2020

  1. Update the JSON-LD keyword list to match JSON-LD 1.1

    Changes in this patch:
    
    * Update the JSON-LD URL to HTTPS
    * Update the list of JSON-LD keywords
    * Make the JSON-LD parser less dependent on the JSON lexer implementation
    * Add unit tests for the JSON-LD lexer
    kurtmckee committed Oct 11, 2020
    61de3c3
  2. Add unit tests for the JSON parser

    This includes:
    
    * Testing valid literals
    * Testing valid string escapes
    * Testing that object keys are tokenized differently from string values
    kurtmckee committed Oct 11, 2020
    271be39
  3. Rewrite the JSON lexer

    Related to pygments#1425
    
    Included in this change:
    
    * The JSON lexer is rewritten
    * The JSON bare object lexer no longer requires additional code
    * `get_tokens_unprocessed()` returns as much as it can to reduce yields
      (for example, side-by-side punctuation is not returned separately)
    * The unit tests were updated
    * Add unit tests based on Hypothesis test results
    kurtmckee committed Oct 11, 2020
    03eeda6
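
The idea of returning as much text as possible per yield can be illustrated with a toy tokenizer. This is only a minimal sketch and not the Pygments implementation: `tokens_unbatched`, `tokens_batched`, the `PUNCTUATION` set, and the two token kinds are hypothetical names chosen for the example. It shows how merging side-by-side punctuation into one token lowers the number of yields without losing any input.

```python
from typing import Iterator, Optional, Tuple

# Hypothetical punctuation set for this sketch (not taken from Pygments).
PUNCTUATION = "{}[],:"

Token = Tuple[int, str, str]  # (start index, kind, text)


def tokens_unbatched(text: str) -> Iterator[Token]:
    """Yield one token per character: the costly baseline."""
    for index, char in enumerate(text):
        kind = "punctuation" if char in PUNCTUATION else "other"
        yield index, kind, char


def tokens_batched(text: str) -> Iterator[Token]:
    """Yield one token per run of same-kind characters."""
    start = 0
    run_kind: Optional[str] = None
    for index, char in enumerate(text):
        kind = "punctuation" if char in PUNCTUATION else "other"
        if kind != run_kind:
            # The kind changed, so flush the run collected so far.
            if run_kind is not None:
                yield start, run_kind, text[start:index]
            start, run_kind = index, kind
    # Flush the final run, if the input was not empty.
    if run_kind is not None:
        yield start, run_kind, text[start:]


sample = "[[{}]]"
unbatched = list(tokens_unbatched(sample))  # six single-character tokens
batched = list(tokens_batched(sample))      # one token covering all six
```

Concatenating the token texts from either generator reproduces the input, so a formatter consuming the batched stream sees the same content with fewer iterations.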
  4. Reduce HTML formatter memory consumption by ~33% and speed it up

    Related to pygments#1425
    
    Tested on a 118MB JSON file. Memory consumption tops out at ~3GB before
    this patch and drops to only ~2GB with this patch. These were the command
    lines used:
    
    python -m pygments -l json -f html -o .\new-code-classes.html .\jc-output.txt
    python -m pygments -l json -f html -O "noclasses" -o .\new-code-styles.html .\jc-output.txt
    kurtmckee committed Oct 11, 2020
    3abacaf
  5. Add an LRU cache to the HTML formatter's HTML-escaping and line-splitting
    
    For a 118MB JSON input file, this reduces memory consumption by ~500MB
    and reduces formatting time by ~15 seconds.
    kurtmckee committed Oct 11, 2020
    69e4882
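
Caching helps here because highlighted source repeats the same short token values many times. The following is a rough sketch of the technique using the standard library's `functools.lru_cache`. It is not the code from this commit: `escape_and_split`, `ESCAPE_TABLE`, and the cache size of 128 are assumptions made for illustration.

```python
import functools
from typing import Tuple

# Hypothetical escape table for this sketch.
ESCAPE_TABLE = str.maketrans({
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#39;",
})


@functools.lru_cache(maxsize=128)
def escape_and_split(value: str) -> Tuple[str, ...]:
    """HTML-escape a token value and split it into lines.

    A tuple is returned so that the cached result cannot be mutated
    by one caller and then observed by another.
    """
    return tuple(value.translate(ESCAPE_TABLE).split("\n"))


# Repeated token values, as in a large JSON file with recurring keys.
values = ['"name"', ":", '"a<b"', ",", '"name"', ":", '"a<b"']
parts = [escape_and_split(value) for value in values]
info = escape_and_split.cache_info()  # hit and miss counters for the cache
```

Only the four distinct values are escaped and split; the three repeats are served from the cache. A bounded `maxsize` keeps the cache itself from growing with the input.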

Commits on Oct 25, 2020

  1. a219f27

Commits on Oct 26, 2020

  1. 8800a92
  2. e086e48