Demojize performance #197

cvzi · 2021-11-30T17:59:04Z

This pull request improves the performance of demojize. I have replaced the big regular expression with a search tree. The method _get_search_tree() creates the tree once, on first use. I have put an example of a tree in the comment of _get_search_tree().

The get_emoji_regexp() still exists but it is unused now.
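
To illustrate the idea of a search tree replacing a large regular expression, here is a minimal sketch. It is not the code from this pull request: the table `EMOJI_NAMES`, the `_END` marker and the functions `get_search_tree()` and `demojize_sketch()` are hypothetical names chosen for the example, and the three-entry table stands in for the full emoji data. The tree is a nested dict keyed by character, built once and cached, and the scan keeps the longest emoji that matches at each position.

```python
# Hypothetical miniature emoji table (assumption: the real data is far larger).
EMOJI_NAMES = {
    "\U0001F44D": ":thumbs_up:",
    "\U0001F44D\U0001F3FD": ":thumbs_up_medium_skin_tone:",
    "\u2764\uFE0F": ":red_heart:",
}

# Multi-character key, so it can never collide with a single-character child key.
_END = "data"
_SEARCH_TREE = None


def get_search_tree():
    """Build the character tree on first use and reuse it afterwards."""
    global _SEARCH_TREE
    if _SEARCH_TREE is None:
        tree = {}
        for emj, name in EMOJI_NAMES.items():
            node = tree
            for ch in emj:
                node = node.setdefault(ch, {})
            node[_END] = name  # a complete emoji ends at this node
        _SEARCH_TREE = tree
    return _SEARCH_TREE


def demojize_sketch(text):
    """Replace every known emoji in text with its name (longest match wins)."""
    tree = get_search_tree()
    out = []
    i = 0
    while i < len(text):
        node = tree
        j = i
        match_end, match_name = None, None
        # Walk down the tree while the text keeps matching.
        while j < len(text) and text[j] in node:
            node = node[text[j]]
            j += 1
            if _END in node:
                match_end, match_name = j, node[_END]
        if match_name is not None:
            out.append(match_name)
            i = match_end
        else:
            out.append(text[i])  # no emoji starts here, copy the character
            i += 1
    return "".join(out)


print(demojize_sketch("ok \U0001F44D\U0001F3FD!"))  # ok :thumbs_up_medium_skin_tone:!
```

Each character of the input is looked up in a dict a bounded number of times, so the cost does not grow with the number of alternatives the way one very large alternation pattern can.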

demojize is faster on my machine for both a single emoji and a long text. The approximate speedups are:

| Python | 2.7 | 3.6 | 3.10 |
|---|---|---|---|
| Single emoji | 7x faster | 7x faster | 12x faster |
| Text [200k chars] | 1.5x faster | 7x faster | 23x faster |
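
Speedup figures of this kind can be measured with the standard `timeit` module. The sketch below is only an assumption about how such a comparison might be set up, not the benchmark used for the numbers in this pull request; `seconds_per_call()` and `speedup()` are hypothetical helpers that accept any two callables taking a string.

```python
import timeit


def seconds_per_call(func, text, number=1000, repeat=5):
    """Return the best observed time for one call of func(text), in seconds."""
    timings = timeit.repeat(lambda: func(text), number=number, repeat=repeat)
    return min(timings) / number


def speedup(old_func, new_func, text, **kwargs):
    """Return how many times faster new_func is than old_func on text."""
    old = seconds_per_call(old_func, text, **kwargs)
    new = seconds_per_call(new_func, text, **kwargs)
    return old / new
```

For example, `speedup(old_demojize, new_demojize, sample_text)` would compare two implementations on the same input, where `old_demojize`, `new_demojize` and `sample_text` are placeholders for the versions and text being tested. Taking the minimum over several repeats reduces noise from other processes.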

I have also added emojize(language='alias') as an alternative to emojize(use_aliases=True).
When someone uses emojize(language='de', use_aliases=True), or any language other than 'en', a warning is shown:

warnings.warn("use_aliases=True is only supported for language='en'. "
              "It is recommended to use emojize(string, language='alias') instead")

(I could remove the aliases change; it is not connected to the performance work at all.)
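
The interaction between `language` and `use_aliases` described above could look roughly like the following. This is a sketch under assumptions, not the library's actual implementation: `resolve_language()` is a hypothetical helper, and falling back to 'en' after the warning is inferred from the commit message saying that use_aliases=True overrides the language.

```python
import warnings


def resolve_language(language="en", use_aliases=False):
    """Return the (language, use_aliases) pair that would actually be used."""
    if language == "alias":
        # language='alias' is treated as the English names plus aliases.
        return "en", True
    if use_aliases and language != "en":
        warnings.warn("use_aliases=True is only supported for language='en'. "
                      "It is recommended to use emojize(string, language='alias') instead")
        # Assumption: use_aliases wins and the language falls back to 'en'.
        return "en", True
    return language, use_aliases


print(resolve_language("alias"))  # ('en', True)
print(resolve_language("es"))     # ('es', False)
```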

TahirJalilov · 2021-12-06T11:02:06Z

Thank you for your PR @cvzi

cvzi and others added 5 commits November 18, 2021 16:11

Improve performance of demojize

cfe81a4

More tests

bd9d89d

Code style

148d531

use_aliases=True overrides language='...' (this restores the behaviour of previous versions) Bugfix: Default delimiters were used instead of the custom delimiters in emojize() when an unknown emoji was found.

424acc5

small fixes

b3b0dc6

TahirJalilov merged commit e35fc45 into carpedm20:master Dec 6, 2021
