[REQUEST] Rich Should Accept Highlights as re.compiled re.Patterns and Use them Internally #3345

Open
PyWoody opened this issue Apr 26, 2024 · 3 comments

Comments

@PyWoody

PyWoody commented Apr 26, 2024

Rich should take advantage of the potential speed increase from precompiling regular expressions with the re.compile function in the stdlib re module.

I have created a fork here: https://github.com/PyWoody/rich/tree/re_compiled that has the changes in place for demoing.

Using the EmailHighlighter example from the docs, a new Highlighter instance could be created like so:

import re

from rich.console import Console
from rich.highlighter import RegexHighlighter
from rich.theme import Theme

class EmailHighlighter(RegexHighlighter):
    """Apply style to anything that looks like an email."""

    base_style = "example."
    highlights = [re.compile(r"(?P<email>[\w-]+@([\w-]+\.)+[\w-]+)")]


theme = Theme({"example.email": "bold magenta"})
console = Console(highlighter=EmailHighlighter(), theme=theme)
console.print("Send funds to money@example.org")

Note that the above example already works in the default version, because re.finditer accepts either a string or a re.Pattern and resolves it to a re.Pattern internally, as shown here: https://github.com/python/cpython/blob/3.12/Lib/re/__init__.py#L219, but the result is not saved for re-use. The _compile function in re does some caching automatically, as shown here: https://github.com/python/cpython/blob/3.12/Lib/re/__init__.py#L280, but it is still called every single time rich.text.Text.highlight_regex is called, versus just saving the compiled version yourself.
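A minimal sketch of the two call paths described above, using only the stdlib. The function names (with_string, with_compiled, time_both) are illustrative and are not part of Rich or of the fork. Both paths return the same matches; the only difference is whether the pattern goes through the re module's internal lookup on every call.

```python
import re
import timeit

PATTERN = r"(?P<email>[\w-]+@([\w-]+\.)+[\w-]+)"
TEXT = "Send funds to money@example.org"

# Compiled once and kept around for re-use.
COMPILED = re.compile(PATTERN)


def with_string():
    # Module-level re.finditer resolves the string to a pattern on every call
    # (served from the re module's internal cache after the first call).
    return [m.group("email") for m in re.finditer(PATTERN, TEXT)]


def with_compiled():
    # Calls finditer directly on the saved pattern, skipping that lookup.
    return [m.group("email") for m in COMPILED.finditer(TEXT)]


def time_both(number=10_000):
    # Returns (seconds for string path, seconds for compiled path).
    # Absolute numbers depend on the machine and Python version.
    return (
        timeit.timeit(with_string, number=number),
        timeit.timeit(with_compiled, number=number),
    )
```

Calling time_both() gives a rough comparison; any difference should be measured on the target setup rather than assumed.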

The more regular expressions a Highlighter uses, the more patterns are stored precompiled, and the larger the potential speed increase. For instance, the updated rich.highlighter.ISO8601Highlighter, found here: https://github.com/PyWoody/rich/blob/re_compiled/rich/highlighter.py#L144, is considerably faster than the default version.

The major caveat is for custom Highlighters that use strings exclusively. There will be a marginal speed decrease in these situations, as each pattern will need an isinstance check and a call to re.compile on demand. This can be seen in the highlight_regex method of the rich.text.Text class, updated here: https://github.com/PyWoody/rich/blob/re_compiled/rich/text.py#L615. In my testing, the decrease was small enough that it was difficult to distinguish from the noise.
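The isinstance-then-compile step mentioned above might look roughly like the following. This is a standalone sketch, not the code from the fork or from Rich: iter_matches is a hypothetical helper name, and the real highlight_regex method does more than iterate matches.

```python
import re
from typing import Iterator, Union


def iter_matches(
    plain: str, re_highlight: Union[str, "re.Pattern[str]"]
) -> Iterator["re.Match[str]"]:
    # Hypothetical helper: accept either a pattern string or a precompiled
    # re.Pattern. Strings pay for the isinstance check plus a re.compile call
    # (a cache lookup after the first use); compiled patterns are used as-is.
    if isinstance(re_highlight, str):
        re_highlight = re.compile(re_highlight)
    return re_highlight.finditer(plain)
```

Existing highlighters that pass strings keep working unchanged, while highlighters that pass re.Pattern objects skip the compile step.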

The net result is that using re.compile for the default Highlighters is a free win, people who want to use re.compile in their custom highlighters get the speed boost, and existing Highlighters in the wild, or people who want to use strings exclusively, see only a marginal speed decrease.


Thank you for your issue. Give us a little time to review it.

PS. You might want to check the FAQ if you haven't done so already.

This is an automated reply, generated by FAQtory

@willmcgugan
Collaborator

You're only changing when the regexes are compiled. Either you do it the first time you use it, or you do it at import time. Once compiled, there are going to be negligible differences between the two approaches.

I wouldn't want the builtin highlighters to use the pre-compiling approach, because startup-time for CLIs is a concern. But if you want to PR the change to highlight_regex to allow custom highlighters to pre-compile, I would accept that...
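For illustration, the import-time versus first-use trade-off can be sketched with a small holder class. LazyHighlights is a hypothetical name and is not part of Rich; the sketch only assumes functools.cached_property from the stdlib (Python 3.8+). Nothing is compiled when the object is created, so import stays cheap, and the compiled patterns are stored after the first access.

```python
import re
from functools import cached_property


class LazyHighlights:
    """Hypothetical holder that compiles its patterns on first access."""

    def __init__(self, *patterns: str) -> None:
        # Only the raw strings are stored here; no regex work at creation.
        self._sources = patterns

    @cached_property
    def compiled(self):
        # Runs once, on first access; the resulting list is then stored on
        # the instance and returned directly on later accesses.
        return [re.compile(source) for source in self._sources]
```

A highlighter defined at module level with this pattern would pay the compile cost on its first highlight call instead of at import.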

@PyWoody
Author

PyWoody commented Apr 27, 2024

Hi Will,

Thanks for taking the time to review the issue and make a comment. The whole time I was doing the writeup I kept trying to figure out what I was missing and the startup for CLIs is definitely it. That makes complete sense.

I'll make the PR for highlight_regex when I have a chance. I'll add some basic comparison tests to see if it's worth it as well.
