Re-use decoded buffer for short texts #175

Merged: Ousret merged 9 commits into Ousret:master on Jun 18, 2022

nijel (Contributor) commented Mar 24, 2022

This avoids issues with detecting string boundaries while improving
performance (avoids multiple decoding of the sequence).

Fixes #174
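The description above is terse, so here is a minimal Python sketch of the general idea, written under stated assumptions. It is not the code merged in this PR. The helper names `chunks_by_decoding_slices` and `chunks_from_decoded_buffer` are hypothetical, as are the `SHORT_TEXT_LIMIT` threshold and its value. When each byte slice is decoded on its own, a slice boundary can land inside a multibyte character. A strict decode then fails even though the whole payload is valid, and every byte also gets decoded once per chunk. Decoding a short payload once and slicing the resulting `str` avoids both problems.

```python
"""Sketch: chunk a payload by slicing one decoded buffer instead of
decoding each byte slice separately.

All names here are hypothetical; this is not charset_normalizer's API.
"""
from typing import List

# Hypothetical cut-off below which a payload counts as a "short text".
SHORT_TEXT_LIMIT = 1_000_000  # bytes (arbitrary illustrative value)


def chunks_by_decoding_slices(payload: bytes, encoding: str, chunk_size: int) -> List[str]:
    """Naive approach: decode every byte slice independently.

    chunk_size is measured in bytes. A slice may end in the middle of a
    multibyte character, so a strict decode can raise UnicodeDecodeError
    even when the whole payload is valid in `encoding`.
    """
    return [
        payload[i : i + chunk_size].decode(encoding)
        for i in range(0, len(payload), chunk_size)
    ]


def chunks_from_decoded_buffer(payload: bytes, encoding: str, chunk_size: int) -> List[str]:
    """Decode a short payload once, then slice the decoded str.

    chunk_size is measured in characters here. Slicing a str never splits
    a character, so every chunk boundary is valid. Payloads above the
    hypothetical limit fall back to per-slice decoding.
    """
    if len(payload) > SHORT_TEXT_LIMIT:
        return chunks_by_decoding_slices(payload, encoding, chunk_size)
    decoded = payload.decode(encoding)  # the only decode for short texts
    return [decoded[i : i + chunk_size] for i in range(0, len(decoded), chunk_size)]


if __name__ == "__main__":
    # Polish sample: several characters take two bytes in UTF-8.
    text = "zażółć gęślą jaźń"
    payload = text.encode("utf-8")
    try:
        chunks_by_decoding_slices(payload, "utf-8", 3)
    except UnicodeDecodeError as exc:
        print("per-slice decoding failed:", exc.reason)
    print(chunks_from_decoded_buffer(payload, "utf-8", 3))
```

With a 3-byte slice the first chunk ends halfway through "ż", so the naive version raises. The decoded-buffer version returns chunks that join back to the original text. The fallback for large payloads reflects the "short texts" scope in the PR title, on the assumption that decoding a very large payload up front costs too much memory.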

Ousret (Owner) left a comment

Thanks for the proposal, some initial quick thoughts.

charset_normalizer/api.py (outdated, resolved)
charset_normalizer/api.py (outdated, resolved)
charset_normalizer/api.py (resolved)
data/sample-polish.txt (resolved)
codecov-commenter commented Jun 18, 2022

Codecov Report

Merging #175 (bca1033) into master (7cbd7fc) will increase coverage by 0.07%.
The diff coverage is 85.00%.

@@            Coverage Diff             @@
##           master     #175      +/-   ##
==========================================
+ Coverage   89.79%   89.86%   +0.07%     
==========================================
  Files          11       11              
  Lines        1205     1214       +9     
==========================================
+ Hits         1082     1091       +9     
  Misses        123      123              
Impacted Files Coverage Δ
charset_normalizer/api.py 86.82% <66.66%> (-0.11%) ⬇️
charset_normalizer/utils.py 86.17% <92.59%> (+0.83%) ⬆️
charset_normalizer/version.py 100.00% <100.00%> (ø)


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 7cbd7fc...bca1033.

Ousret (Owner) commented Jun 18, 2022

This PR improves the overall quality and performance of the project and fixes an unexpected issue (in CPython).
👌

Ousret (Owner) left a comment

LGTM

Ousret merged commit 4846792 into Ousret:master Jun 18, 2022
Ousret mentioned this pull request Jun 19, 2022
Linked issue: [BUG] utf-8 misdetected as cp1256
3 participants