Version 3.0 #209

Merged
merged 68 commits into from Oct 18, 2022

@Ousret (Owner) commented Aug 15, 2022

3.0.0 (2022-10-18)

Added

  • normalizer --version now specifies whether the current build provides the extra speedup (meaning a mypyc-compiled wheel)
  • Extended the capability of explain=True: when cp_isolation contains one or two entries, the mess-detector results are logged in detail
  • Support for alternative language frequency set in charset_normalizer.assets.FREQUENCIES
  • Add parameter language_threshold in from_bytes, from_path and from_fp to adjust the minimum expected coherence ratio

Changed

  • Optional: module md.py can be compiled with mypyc for an extra speedup, up to 4x faster than v2.1
  • Build with static metadata using 'build' frontend
  • Make language detection stricter
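
To check from Python whether the optional compiled md module is in use, something like the following heuristic may help. It assumes that a pure-Python install exposes md as a .py file while a mypyc build ships it as a compiled extension; that file-suffix test is an assumption of this sketch, not a documented API.

```python
import charset_normalizer
import charset_normalizer.md as md_module

# Heuristic (assumption): a ".py" path means the pure-Python module is loaded,
# any other suffix (.so / .pyd) suggests the mypyc-compiled extension.
md_file = getattr(md_module, "__file__", "") or ""
speedup_enabled = bool(md_file) and not md_file.lower().endswith(".py")

print(
    "charset-normalizer",
    charset_normalizer.__version__,
    "speedup:",
    "ON" if speedup_enabled else "OFF",
)
```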

Removed

  • Breaking: Class aliases CharsetDetector, CharsetDoctor, CharsetNormalizerMatch and CharsetNormalizerMatches
  • Breaking: Top-level function normalize
  • Breaking: Properties chaos_secondary_pass, coherence_non_latin and w_counter from CharsetMatch
  • Support for the backport unicodedata2
  • Breaking: Methods first() and best() from CharsetMatch
  • UTF-7 will no longer appear as "detected" without a recognized SIG/mark (it is unreliable and conflicts with ASCII)
  • Coherence detector no longer returns 'Simple English'; it returns 'English' instead
  • Coherence detector no longer returns 'Classical Chinese'; it returns 'Chinese' instead
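
For code that relied on the removed aliases or on the top-level normalize function, a minimal migration sketch could look like this. It only uses from_bytes, CharsetMatches.best(), str() and CharsetMatch.output(); the sample text is an arbitrary illustration.

```python
from charset_normalizer import from_bytes

# Illustrative payload (an assumption for this sketch): French text as UTF-8.
text = "Bonjour, le café est déjà servi sur la table de la cuisine."
payload = text.encode("utf_8")

# Instead of the removed class aliases, call from_bytes directly; it returns
# a CharsetMatches collection.
matches = from_bytes(payload)

# first() and best() are gone from CharsetMatch, so pick the result from the
# CharsetMatches collection. best() returns None when nothing plausible was found.
best = matches.best()

if best is not None:
    decoded = str(best)            # decoded text
    utf8_bytes = best.output()     # payload re-encoded as UTF-8 bytes
    print(best.encoding, len(utf8_bytes), decoded[:20])
```

Where normalize used to write a converted copy of a file, writing best.output() to a path of your choosing is one way to get an equivalent result.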

Fixed

  • Sphinx warnings when generating the documentation
  • CLI with option --normalize fails when using a full path for files
  • TooManyAccentuatedPlugin induces false positives in the mess detection when too few alphabetic characters have been fed to it

@Ousret added the enhancement (New feature or request) and do-not-merge (not yet ready for prime time) labels Aug 15, 2022
@codecov-commenter commented Aug 15, 2022

Codecov Report

Merging #209 (6367d53) into master (6602dae) will decrease coverage by 0.01%.
The diff coverage is 90.74%.

@@            Coverage Diff             @@
##           master     #209      +/-   ##
==========================================
- Coverage   89.90%   89.89%   -0.02%     
==========================================
  Files          11       10       -1     
  Lines        1218     1187      -31     
==========================================
- Hits         1095     1067      -28     
+ Misses        123      120       -3     

Impacted Files                           Coverage Δ
charset_normalizer/assets/__init__.py    100.00% <ø> (ø)
charset_normalizer/constant.py           100.00% <ø> (ø)
charset_normalizer/md.py                 95.00% <37.50%> (-1.72%) ⬇️
charset_normalizer/api.py                86.66% <100.00%> (-0.23%) ⬇️
charset_normalizer/cd.py                 96.29% <100.00%> (-0.05%) ⬇️
charset_normalizer/cli/normalizer.py     75.00% <100.00%> (ø)
charset_normalizer/legacy.py             92.85% <100.00%> (-3.92%) ⬇️
charset_normalizer/models.py             89.65% <100.00%> (+2.98%) ⬆️
charset_normalizer/utils.py              85.98% <100.00%> (-0.20%) ⬇️
charset_normalizer/version.py            100.00% <100.00%> (ø)
... and 2 more


@Ousret

This comment was marked as resolved.

Ousret and others added 28 commits August 21, 2022 20:53
… most two entries, will log in details the Mess-detector results
+ 🔥 Coherence detector no longer return 'Simple English' instead return 'English'
…d `from_fp` to adjust the minimum expected coherence ratio
Improve the condition on issue #200
…n when too few alpha character have been fed to it
(i) ensure build are reproductible (ii) still support python 3.6
@Ousret Ousret merged commit 544595d into master Oct 18, 2022
@Ousret Ousret deleted the 3.0 branch October 20, 2022 09:32