Releases · jawah/charset_normalizer

15 Aug 16:17

Ousret

3.0.0b1

09402e6

Version 3.0.0b1 Pre-release


3.0.0b1 (2022-08-15)

Changed

Optional: Module md.py can be compiled with mypyc for an extra speedup, up to 4x faster than v2.1

Removed

Breaking: Class aliases CharsetDetector, CharsetDoctor, CharsetNormalizerMatch and CharsetNormalizerMatches
Breaking: Top-level function normalize
Breaking: Properties chaos_secondary_pass, coherence_non_latin and w_counter from CharsetMatch
Support for the backport unicodedata2


19 Jun 21:56

Ousret

2.1.0

cb2dbde

Version 2.1.0

2.1.0 (2022-06-19)

Added

Output the Unicode table version when running the CLI with --version (PR #194)
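
For context, the Unicode table version bundled with a given interpreter is exposed by the standard library as `unicodedata.unidata_version`. The sketch below shows one way to report it; the helper name and the output format are assumptions made for illustration and do not reproduce the CLI's actual `--version` output.

```python
import sys
import unicodedata


def describe_unicode_support() -> str:
    """Return a one-line summary of the interpreter and its Unicode database version."""
    # Hypothetical format, chosen for this example only.
    python_version = ".".join(str(part) for part in sys.version_info[:3])
    return f"Python {python_version} - Unicode {unicodedata.unidata_version}"


print(describe_unicode_support())
```

The reported Unicode version depends on the Python build in use, which is why it is useful when comparing detection results across environments.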

Changed

Re-use decoded buffer for single byte character sets from @nijel (PR #175)
Fixing some performance bottlenecks from @deedy5 (PR #183)

Fixed

Work around a potential bug in CPython where Zero Width No-Break Space, located in Arabic Presentation Forms-B (Unicode 1.1), is not acknowledged as a space (PR #175)
CLI default threshold aligned with the API threshold from @oleksandr-kuzmenko (PR #181)
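
The Zero Width No-Break Space behavior mentioned above can be observed with the standard library alone. The snippet is a minimal sketch: `is_space_like` is a hypothetical helper written for this example, not the library's actual workaround.

```python
import unicodedata

# U+FEFF ZERO WIDTH NO-BREAK SPACE, the last code point of Arabic Presentation Forms-B.
ZWNBSP = "\ufeff"


def is_space_like(character: str) -> bool:
    """Hypothetical helper: accept U+FEFF as a space in addition to str.isspace()."""
    return character.isspace() or character == ZWNBSP


print(unicodedata.name(ZWNBSP))      # official character name
print(unicodedata.category(ZWNBSP))  # general category: format character, not a separator
print(ZWNBSP.isspace())              # str.isspace() does not treat it as whitespace
print(is_space_like(ZWNBSP))         # the helper does
```

Because the character belongs to the format category (Cf) rather than a separator category, code that relies only on `str.isspace()` would treat it as a regular printable symbol.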

Removed

Support for Python 3.5 (PR #192)

Deprecated

Use of the unicodedata backport from unicodedata2, as Python is quickly catching up; scheduled for removal in 3.0 (PR #194)


12 Feb 14:25

Ousret

2.0.12

a5f4348

Version 2.0.12

2.0.12 (2022-02-12)

Fixed

ASCII misdetection in rare cases (PR #170)


30 Jan 18:26

Ousret

2.0.11

f256c3e

Version 2.0.11

2.0.11 (2022-01-30)

Added

Explicit support for Python 3.11 (PR #164)

Changed

The logging behavior has been completely reviewed, now using only TRACE and DEBUG levels (PR #163 #165)
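
Python's `logging` module has no built-in TRACE level, so a level of that name has to be registered by the code that uses it. The sketch below shows the general technique; the numeric value 5, the logger name, and the logged messages are assumptions for illustration and are not taken from the library's source.

```python
import logging

# Assumed value: any integer below logging.DEBUG (10) behaves the same way here.
TRACE = 5
logging.addLevelName(TRACE, "TRACE")

logger = logging.getLogger("example_detector")  # hypothetical logger name
logger.setLevel(TRACE)

handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("%(levelname)s | %(message)s"))
logger.addHandler(handler)

# Very verbose, per-candidate details go to TRACE; summaries go to DEBUG.
logger.log(TRACE, "candidate %s rejected", "cp1252")
logger.debug("best guess so far: %s", "utf_8")
```

An application that only enables DEBUG would see the summary line but not the per-candidate detail, which keeps the default output small.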


04 Jan 20:15

Ousret

2.0.10

de25562

Version 2.0.10

2.0.10 (2022-01-04)

Fixed

Fallback match entries might lead to UnicodeDecodeError for large byte sequences (PR #154)

Changed

Skipping the language-detection (CD) on ASCII (PR #155)


03 Dec 19:27

Ousret

2.0.9

3874edb

Version 2.0.9

2.0.9 (2021-12-03)

Changed

Moderating the logging impact (since 2.0.8) for specific environments (PR #147)

Fixed

Wrong logging level applied when setting kwarg explain to True (PR #146)


24 Nov 19:45

Ousret

2.0.8

8913e21

Version 2.0.8

Changed

Improvement over Vietnamese detection (PR #126)
MD improvement on trailing data and long foreign (non-pure latin) data (PR #124)
Efficiency improvements in cd/alphabet_languages from @adbar (PR #122)
Call sum() without an intermediary list, following PEP 289 recommendations, from @adbar (PR #129)
Code style as refactored by Sourcery-AI (PR #131)
Minor adjustment on the MD around European words (PR #133)
Remove and replace SRTs from assets / tests (PR #139)
Initialize the library logger with a NullHandler by default from @nmaynes (PR #135)
Setting the kwarg explain to True temporarily adds a specific stream handler, bounded to the function's lifespan (PR #135)
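
The `sum()` item in the list above refers to generator expressions (PEP 289). As a minimal illustration, the two hypothetical functions below return the same count, but only the second one builds a temporary list first; neither is taken from the library.

```python
def count_alphabetic(text: str) -> int:
    # Generator expression: sum() consumes values one at a time,
    # so no temporary list is allocated.
    return sum(1 for character in text if character.isalpha())


def count_alphabetic_with_list(text: str) -> int:
    # Same result, but the whole list is built before sum() starts.
    return sum([1 for character in text if character.isalpha()])


sample = "abc 123 é"
print(count_alphabetic(sample), count_alphabetic_with_list(sample))
```

The difference is negligible for short strings and matters mostly for long inputs processed in tight loops.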

Fixed

Fix large (misleading) sequence giving UnicodeDecodeError (PR #137)
Avoid using chunks that are too insignificant (PR #137)

Added

Add and expose function set_logging_handler to configure a specific StreamHandler from @nmaynes (PR #135)
Add CHANGELOG.md entries, format is based on Keep a Changelog (PR #141)
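
The logging changes in this release follow a common pattern for libraries: stay silent by default, and attach a handler only when the caller asks for it. The sketch below reproduces that pattern with the standard library only; the logger name and `run_with_explanation` are hypothetical and stand in for the library's own logger and its `explain` kwarg.

```python
import logging
from typing import Callable, TypeVar

T = TypeVar("T")

LOGGER = logging.getLogger("example_library")  # hypothetical library logger
LOGGER.addHandler(logging.NullHandler())  # silent unless the application opts in


def run_with_explanation(task: Callable[[], T], explain: bool = False) -> T:
    """Run `task`; when `explain` is True, log to stderr only for the call's duration."""
    if not explain:
        return task()
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter("%(asctime)s | %(levelname)s | %(message)s"))
    previous_level = LOGGER.level
    LOGGER.addHandler(handler)
    LOGGER.setLevel(logging.DEBUG)
    try:
        return task()
    finally:
        # Restore the logger so the handler does not outlive the call.
        LOGGER.removeHandler(handler)
        LOGGER.setLevel(previous_level)
```

Removing the handler in a `finally` block keeps the logger's configuration unchanged for the rest of the application, even if the task raises.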


11 Oct 21:27

Ousret

2.0.7

ea44bd7

Version 2.0.7

We have reached a fairly stable state.

Changes:

Addition: 🍱 Add support for Kazakh (Cyrillic) language detection #109
Improvement: ❇️ Further improve inferring the language from a given code page (single-byte) #112
Removed: 🔥 Remove redundant logging entry about detected language(s) #115
Miscellaneous: 🔧 Trying to leverage PEP263 when PEP3120 is not supported #116
- I do not expect this change (#116) to actually fix anything; rather, those trying to install this package on an unsupported Python version will get a SyntaxError that is not about an ASCII decoding error
Improvement: ⚡ Refactoring for potential performance improvements in loops #113 @adbar
Improvement: ✨ Various detection improvement (MD+CD) #117
Bugfix: 🐛 Fix a minor inconsistency between Python 3.5 and other versions regarding language detection #117 #102
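
For readers unfamiliar with the two PEPs mentioned above: PEP 263 defines the `# -*- coding: ... -*-` declaration at the top of a source file, and PEP 3120 makes UTF-8 the default source encoding when no declaration is present. The sketch below uses the standard `tokenize.detect_encoding` function to show both cases; `source_encoding` is a helper written for this example.

```python
import io
import tokenize


def source_encoding(source: bytes) -> str:
    """Return the encoding Python would use to read `source`."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding


with_declaration = b"# -*- coding: latin-1 -*-\nname = 'caf\xe9'\n"
without_declaration = b"name = 'cafe'\n"

print(source_encoding(with_declaration))     # PEP 263: taken from the declaration
print(source_encoding(without_declaration))  # PEP 3120: falls back to UTF-8
```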

This version pushes the detection coverage to 98%! https://github.com/Ousret/charset_normalizer/runs/3863881150
With the current dataset, the ceiling (the score that cannot be exceeded) should be 99%, which is a target for future releases.

Contributors

adbar


17 Sep 22:21

Ousret

2.0.6

bf70e9c

Version 2.0.6

Changes:

Bugfix: 🐛 Unforeseen regression that broke backward compatibility with some older Python 3.5.x minor releases #100
Bugfix: 🐛 Fix CLI crash when using --minimal output in certain cases #103
Improvement: ✨ Minor improvement to the detection efficiency (less than 1%) #106 #101


14 Sep 19:39

Ousret

2.0.5

2404237

Version 2.0.5

Changes:

Internal: 🎨 The project now complies with flake8, mypy, isort and black to ensure better overall quality #81
Internal: 🎨 The MANIFEST.in was not exhaustive #78
Improvement: ✨ The BC-support with v1.x was improved, the old staticmethods are restored #82
Remove: 🔥 The project no longer raises a warning on tiny content given for detection; it is simply logged as a warning instead #92
Improvement: ✨ The Unicode detection is slightly improved, see #93
Bugfix: 🐛 In some rare cases, the chunk extractor could cut in the middle of a multi-byte character and mislead the mess detection #95
Bugfix: 🐛 Some rare 'space' characters could trip up the UnprintablePlugin/Mess detection #96
Improvement: 🎨 Add syntactic sugar __bool__ for the CharsetMatches results list container, see #91
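
The multi-byte chunking issue listed above is easy to reproduce: slicing encoded bytes at a fixed offset can land in the middle of a character. The sketch below is limited to UTF-8 and uses a hypothetical helper, `split_on_char_boundary`; it illustrates the problem and one possible remedy, not the library's actual fix, which has to cope with many encodings.

```python
from typing import List


def split_on_char_boundary(payload: bytes, size: int) -> List[bytes]:
    """Split UTF-8 bytes into chunks of about `size` bytes without cutting a character."""
    chunks = []
    start = 0
    while start < len(payload):
        end = min(start + size, len(payload))
        # UTF-8 continuation bytes look like 0b10xxxxxx; back up until `end`
        # no longer points into the middle of a character.
        while start < end < len(payload) and (payload[end] & 0xC0) == 0x80:
            end -= 1
        if end == start:
            # `size` is smaller than this character: take the whole character.
            end = start + 1
            while end < len(payload) and (payload[end] & 0xC0) == 0x80:
                end += 1
        chunks.append(payload[start:end])
        start = end
    return chunks


payload = "aé€".encode("utf-8")  # 1-byte, 2-byte and 3-byte characters

try:
    payload[:2].decode("utf-8")  # naive cut lands inside "é"
except UnicodeDecodeError as error:
    print("naive chunk failed:", error.reason)

print([chunk.decode("utf-8") for chunk in split_on_char_boundary(payload, 2)])
```

A chunk that ends mid-character decodes to garbage or fails outright, which is why such cuts could mislead the mess detection.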

This release pushes the detection coverage further, to 97%!
