
Comments and other decorations in regexes can bloat size of syntaxes #481

Open
CosmicHorrorDev opened this issue Jul 1, 2023 · 1 comment

Comments

@CosmicHorrorDev
Contributor

CosmicHorrorDev commented Jul 1, 2023

This isn't something I found to be very prevalent, so I don't think it's a pressing issue, but .sublime-syntax definitions can include a significant amount of whitespace and comments in regex strings by using the x flag. The only default definition I noticed that uses this heavily is Markdown.sublime-syntax. Manually stripping out the comments dropped the size of its serialized_lazy_contexts from 11.0 KiB to 7.9 KiB.

I'm not sure what existing library support looks like, but it may be possible to normalize the regexes when loading them to strip away these decorations. It's worth noting that Markdown.sublime-syntax is by no means the worst size offender: that would be PHP Source's serialized_lazy_contexts at ~30 KiB, seemingly because of the very long lists of keywords that it matches in regexes.

@CosmicHorrorDev
Contributor Author

CosmicHorrorDev commented May 2, 2024

I've had some more time to dig into this, and I do think we should be able to strip comments automatically as part of the pre-processing that happens while generating a syntax set. The gist would be to parse the regex with regex_syntax's Parser::parse_with_comments() and then strip the comments out of the original string based on the span that's stored with each of them.
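
A minimal sketch of the span-based stripping step, to make the idea concrete. It assumes the comment spans have already been collected as sorted, non-overlapping byte ranges (for example from the span stored with each comment returned by parse_with_comments()). The function name strip_spans and the plain (start, end) tuples are hypothetical and only keep the example self-contained. It also assumes the patterns in question parse with regex_syntax at all, which is not verified here.

```rust
/// Returns `pattern` with every byte range in `comment_spans` removed.
///
/// Assumptions (hypothetical helper, not an existing API):
/// - each span is a half-open `(start, end)` byte range into `pattern`
/// - spans are sorted by `start` and do not overlap
/// - span boundaries fall on UTF-8 character boundaries
fn strip_spans(pattern: &str, comment_spans: &[(usize, usize)]) -> String {
    let mut out = String::with_capacity(pattern.len());
    // Byte offset of the first character not yet copied or skipped.
    let mut cursor = 0;
    for &(start, end) in comment_spans {
        // Copy the text between the previous comment and this one.
        out.push_str(&pattern[cursor..start]);
        // Skip over the comment itself.
        cursor = end;
    }
    // Copy whatever follows the last comment.
    out.push_str(&pattern[cursor..]);
    out
}
```

Working from the original string rather than re-printing the parsed AST keeps everything outside the comments byte-for-byte identical, so the only change to the pattern is the removed comment text. Stripping the now-redundant whitespace would be a separate step.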

I'd be happy to give these changes a shot if they seem worth it
