❇️ Various detection improvement (MD+CD) #117

Merged
merged 7 commits into from Sep 26, 2021


Ousret
Owner

@Ousret Ousret commented Sep 25, 2021

This PR aims to improve the detection coverage by 1%.

  • Improve the Turkish language coherence detection
  • Test languages in an improved order based on the source alphabet.
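
To illustrate the second point, here is a minimal sketch of ordering candidate languages by how well their alphabet matches the characters found in the text. It is not the code from this PR: the `TOY_ALPHABETS` table, the `order_languages` function and the scoring rule are all hypothetical, chosen only to show the general idea.

```python
# Hypothetical illustration only: NOT charset_normalizer's actual implementation.
# The alphabets below are abbreviated toy data, not the library's tables.
TOY_ALPHABETS = {
    "English": set("abcdefghijklmnopqrstuvwxyz"),
    "Turkish": set("abcçdefgğhıijklmnoöprsştuüvyz"),
    "French": set("abcdefghijklmnopqrstuvwxyzàâçéèêëîïôùûü"),
}


def order_languages(text, alphabets=TOY_ALPHABETS):
    """Return language names, most plausible candidate first."""
    # Distinct letters observed in the decoded text.
    seen = {ch for ch in text.lower() if ch.isalpha()}

    def score(name):
        alphabet = alphabets[name]
        uncovered = len(seen - alphabet)  # letters the alphabet cannot explain
        covered = len(seen & alphabet)    # letters the alphabet does explain
        # Fewer unexplained letters first, then more explained letters,
        # then the name so that ties are resolved deterministically.
        return (uncovered, -covered, name)

    return sorted(alphabets, key=score)


print(order_languages("ığ şö"))
```

With such an ordering, the most plausible language is tried first, so a coherence check can stop early on a good match instead of walking the candidates in an arbitrary order.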

@Ousret Ousret added enhancement New feature or request detection Related to the charset detection mechanism, chaos/mess/coherence labels Sep 25, 2021
@codecov-commenter

codecov-commenter commented Sep 25, 2021

Codecov Report

Merging #117 (9c63ebc) into master (42a7d3d) will increase coverage by 0.07%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master     #117      +/-   ##
==========================================
+ Coverage   86.24%   86.31%   +0.07%     
==========================================
  Files          11       11              
  Lines        1185     1206      +21     
==========================================
+ Hits         1022     1041      +19     
- Misses        163      165       +2     
Impacted Files                  Coverage Δ
charset_normalizer/cd.py        96.40% <100.00%> (+0.48%) ⬆️
charset_normalizer/md.py        87.82% <100.00%> (ø)
charset_normalizer/models.py    85.71% <100.00%> (-0.96%) ⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Because I dared to use dict vanilla.. instead of OrderedDict
@Ousret
Owner Author

Ousret commented Sep 25, 2021

Backward-Compatibility with Chardet
Total EST BC = 88.0 % (413 / 467 files)

Coverage (With Preemptive)
Total EST coverage = 98.0 % (457 / 467 files)

@Ousret
Owner Author

Ousret commented Sep 26, 2021

This PR revealed a minor inconsistency between py35 and other versions.
It may be what #102 pointed out.
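
One plausible source of such a difference, given the commit message above about using a plain dict instead of OrderedDict: Python 3.5 does not preserve dict insertion order, whereas CPython 3.6 does as an implementation detail and Python 3.7+ guarantees it. The thread does not confirm that this is the exact cause, so the sketch below is only an assumption-based illustration, and `rank_by_score` is a hypothetical helper rather than a function from the library.

```python
from collections import OrderedDict


def rank_by_score(scores):
    """Return an OrderedDict sorted by descending score, ties broken by name.

    Hypothetical helper: OrderedDict keeps insertion order on every
    Python 3 version, including 3.5 where a plain dict does not.
    """
    ordered = OrderedDict()
    # Sort explicitly so the result never depends on the input dict's order.
    for name, value in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0])):
        ordered[name] = value
    return ordered


ranking = rank_by_score({"Turkish": 0.91, "English": 0.42, "French": 0.42})
print(list(ranking))  # ['Turkish', 'English', 'French']
```

Sorting with an explicit tie-breaker matters as much as the container: without it, two languages with equal scores could still come out in a different order depending on how the input was iterated.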

@Ousret Ousret merged commit cf1f76a into master Sep 26, 2021
@Ousret Ousret deleted the patch-detect-improvement branch September 26, 2021 15:12
@Ousret Ousret mentioned this pull request Sep 27, 2021