Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactoring for potential performance improvements in loops #113

Merged
merged 19 commits into from Sep 24, 2021
Merged

Refactoring for potential performance improvements in loops #113

merged 19 commits into from Sep 24, 2021

Conversation

adbar
Copy link
Contributor

@adbar adbar commented Sep 21, 2021

Experiments with ideas to potentially improve performance or code consistency without impacting readability (#111).

This PR:

  1. defines caches and sets in cd.py
  2. uses list comprehensions for language associations in cd.py
  3. refactors duplicate code in md.py

Close #111

@codecov-commenter
Copy link

codecov-commenter commented Sep 21, 2021

Codecov Report

Merging #113 (59d22e0) into master (59e48eb) will increase coverage by 0.04%.
The diff coverage is 91.30%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #113      +/-   ##
==========================================
+ Coverage   86.19%   86.24%   +0.04%     
==========================================
  Files          11       11              
  Lines        1188     1185       -3     
==========================================
- Hits         1024     1022       -2     
+ Misses        164      163       -1     
Impacted Files Coverage Δ
charset_normalizer/models.py 86.66% <83.33%> (+0.16%) ⬆️
charset_normalizer/md.py 87.82% <85.71%> (-0.04%) ⬇️
charset_normalizer/cd.py 95.91% <100.00%> (-0.03%) ⬇️
charset_normalizer/constant.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 59e48eb...59d22e0. Read the comment docs.

@Ousret Ousret self-requested a review September 21, 2021 15:33
Copy link
Owner

@Ousret Ousret left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In quick tests, I did see a minor performance improvement. About 2 ms in average, that not bad.
There some lint errors to fix.

charset_normalizer/cd.py Outdated Show resolved Hide resolved
charset_normalizer/cd.py Outdated Show resolved Hide resolved
charset_normalizer/cd.py Outdated Show resolved Hide resolved
charset_normalizer/md.py Outdated Show resolved Hide resolved
charset_normalizer/md.py Show resolved Hide resolved
charset_normalizer/cd.py Outdated Show resolved Hide resolved
@Ousret
Copy link
Owner

Ousret commented Sep 21, 2021

Plus the tests show a warning.

=============================== warnings summary ===============================
tests/test_cli.py: 2767815 warnings
  /home/runner/work/charset_normalizer/charset_normalizer/charset_normalizer/models.py:231: DeprecationWarning: NotImplemented should not be used in a boolean context
    self._unicode_ranges = sorted(filter(None.__ne__, detected_ranges))

-- Docs: https://docs.pytest.org/en/stable/warnings.html

@adbar
Copy link
Contributor Author

adbar commented Sep 22, 2021

Hi @Ousret, I did the changes as requested, I'm not sure what the warning has to do with DeprecationWarning: NotImplemented.

@akx
Copy link
Contributor

akx commented Sep 22, 2021

Did you measure that this makes things faster? For one, a common micro-optimization in Python is to alias global variables into locals within functions, since local variable access is faster than global variable access.

charset_normalizer/md.py Outdated Show resolved Hide resolved
charset_normalizer/models.py Outdated Show resolved Hide resolved
charset_normalizer/md.py Outdated Show resolved Hide resolved
charset_normalizer/models.py Outdated Show resolved Hide resolved
charset_normalizer/md.py Outdated Show resolved Hide resolved
charset_normalizer/cd.py Outdated Show resolved Hide resolved
charset_normalizer/cd.py Show resolved Hide resolved
charset_normalizer/cd.py Outdated Show resolved Hide resolved
@Ousret
Copy link
Owner

Ousret commented Sep 23, 2021

Thanks for the adjustments @adbar , remain a few remarks, but I think this is about it.

Co-authored-by: TAHRI Ahmed R. <Ousret@users.noreply.github.com>
@adbar
Copy link
Contributor Author

adbar commented Sep 23, 2021

@Ousret The linter throws an error since 8f46e15, I guess that List[str] was correct.

@Ousret
Copy link
Owner

Ousret commented Sep 23, 2021

More a limitation of mypy.

charset_normalizer/models.py:229: error: List comprehension has incompatible type List[Optional[str]]; expected List[str]

Don't get that you filtered out the None. type: ignore is more suited then. Or I missed something..?

@adbar
Copy link
Contributor Author

adbar commented Sep 23, 2021

I'm not so proficient with mypy but type: ignore could do the trick yes.

@adbar
Copy link
Contributor Author

adbar commented Sep 23, 2021

@Ousret @akx Thanks for the feedback, I think that's all for now.

@Ousret Could you please decide on the mypy issue?

charset_normalizer/md.py Outdated Show resolved Hide resolved
Copy link
Owner

@Ousret Ousret left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Ousret Ousret merged commit 42a7d3d into Ousret:master Sep 24, 2021
@Ousret Ousret mentioned this pull request Sep 27, 2021
Ousret added a commit that referenced this pull request Oct 11, 2021
* 📝 Update claims

* 🔖 Bump version to 2.0.7

* 🐛 Fix regression from PR #113 List instead of Set for alphabets property

* fix type output in alphabets property

* ✔️ Add test case to ensure non-regression upon 28c3ae1

* ✔️ Add tests and ignore old legacy methods cover

* ❇️ Add autofix script for black and isort linters

* 📝 Update contrib.md

* 🔧 Python 3.10 (using public release) tests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Proposal] Performance improvements in loops
4 participants