Version 3.0.0 #223

Merged
merged 9 commits into from Oct 20, 2022

Conversation

Ousret
Owner

@Ousret Ousret commented Oct 18, 2022

3.0.0 (2022-10-20)

Added

  • Extend the capability of explain=True: when cp_isolation contains one or two entries, the mess-detector results are logged in detail
  • Support for alternative language frequency set in charset_normalizer.assets.FREQUENCIES
  • Add parameter language_threshold in from_bytes, from_path and from_fp to adjust the minimum expected coherence ratio
  • normalizer --version now specifies whether the current version provides the extra speedup (meaning a mypyc-compiled wheel)

Changed

  • Build with static metadata (not pyproject.toml yet)
  • Make language detection stricter
  • Optional: Module md.py can be compiled using Mypyc for an extra speedup, up to 4x faster than v2.1

Fixed

  • CLI with option --normalize fails when using a full path for files
  • TooManyAccentuatedPlugin induces false positives in the mess detection when too few alpha characters have been fed to it
  • Sphinx warnings when generating the documentation

Removed

  • Coherence detector no longer returns 'Simple English'; it returns 'English' instead
  • Coherence detector no longer returns 'Classical Chinese'; it returns 'Chinese' instead
  • Breaking: Method first() and best() from CharsetMatch
  • UTF-7 will no longer appear as "detected" without a recognized SIG/mark (it is unreliable and conflicts with ASCII)
  • Breaking: Class aliases CharsetDetector, CharsetDoctor, CharsetNormalizerMatch and CharsetNormalizerMatches
  • Breaking: Top-level function normalize
  • Breaking: Properties chaos_secondary_pass, coherence_non_latin and w_counter from CharsetMatch
  • Support for the backport unicodedata2
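
For code that relied on the removed aliases or the top-level normalize function, one possible migration path is to call from_bytes and take best() from the returned collection. The sketch below is only an assumption about how such a migration could look; the helper names decode_best_guess and to_utf8 are hypothetical and not part of the library.

```python
from typing import Optional

from charset_normalizer import from_bytes


def decode_best_guess(raw: bytes) -> Optional[str]:
    """Hypothetical helper: return the most plausible decoding of raw, or None."""
    # best() is called on the collection returned by from_bytes,
    # not on an individual CharsetMatch.
    match = from_bytes(raw).best()
    return None if match is None else str(match)


def to_utf8(raw: bytes) -> Optional[bytes]:
    """Hypothetical helper: re-encode raw as UTF-8 using the best match."""
    match = from_bytes(raw).best()
    # output() re-encodes the decoded text; it targets utf_8 by default.
    return None if match is None else match.output()
```

Callers that previously wrote the normalized result to disk could write the bytes returned by to_utf8 themselves.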

@codecov-commenter

codecov-commenter commented Oct 18, 2022

Codecov Report

Merging #223 (ef42849) into master (db134f3) will not change coverage.
The diff coverage is 100.00%.

❗ Current head ef42849 differs from pull request most recent head 28a8c6f. Consider uploading reports for the commit 28a8c6f to get more accurate results

@@           Coverage Diff           @@
##           master     #223   +/-   ##
=======================================
  Coverage   89.89%   89.89%           
=======================================
  Files          10       10           
  Lines        1187     1187           
=======================================
  Hits         1067     1067           
  Misses        120      120           
Impacted Files Coverage Δ
charset_normalizer/version.py 100.00% <100.00%> (ø)


@Ousret Ousret merged commit 0ec52ef into master Oct 20, 2022
@Ousret Ousret deleted the release-3.0 branch October 20, 2022 08:40

2 participants