
Merge pull request #209 from Ousret/3.0
Version 3.0.0rc1
Ousret committed Oct 18, 2022
2 parents 6602dae + 6367d53 commit 544595d
Showing 38 changed files with 933 additions and 509 deletions.
3 changes: 2 additions & 1 deletion .github/workflows/chardet-bc.yml
@@ -25,7 +25,8 @@ jobs:
           pip uninstall -y charset-normalizer
       - name: Install the package
         run: |
-          python setup.py install
+          python -m build
+          pip install ./dist/*.whl
       - name: Clone the complete dataset
         run: |
           git clone https://github.com/Ousret/char-dataset.git
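Each workflow now builds a wheel into `dist/` and installs it through the shell glob `pip install ./dist/*.whl`. If stale wheels linger in `dist/`, that glob matches more than one file; a minimal sketch of selecting only the most recently built wheel (hypothetical helper, not part of the repository):

```python
from pathlib import Path

def newest_wheel(dist="dist"):
    """Return the most recently modified .whl under dist/, or None if there is none."""
    wheels = sorted(Path(dist).glob("*.whl"), key=lambda p: p.stat().st_mtime)
    return wheels[-1] if wheels else None
```

In CI this is rarely an issue because `dist/` starts empty, which is why the plain glob suffices.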
56 changes: 0 additions & 56 deletions .github/workflows/codeql-analysis.yml

This file was deleted.

3 changes: 2 additions & 1 deletion .github/workflows/detector-coverage.yml
@@ -25,7 +25,8 @@ jobs:
           pip uninstall -y charset-normalizer
       - name: Install the package
         run: |
-          python setup.py install
+          python -m build
+          pip install ./dist/*.whl
       - name: Clone the complete dataset
         run: |
           git clone https://github.com/Ousret/char-dataset.git
3 changes: 2 additions & 1 deletion .github/workflows/integration.yml
@@ -28,7 +28,8 @@ jobs:
           pip uninstall -y charset-normalizer
       - name: Install the package
         run: |
-          python setup.py install
+          python -m build
+          pip install ./dist/*.whl
       - name: Clone the complete dataset
         run: |
           git clone https://github.com/Ousret/char-dataset.git
3 changes: 2 additions & 1 deletion .github/workflows/lint.yml
@@ -25,7 +25,8 @@ jobs:
           pip uninstall -y charset-normalizer
       - name: Install the package
         run: |
-          python setup.py install
+          python -m build
+          pip install ./dist/*.whl
       - name: Type checking (Mypy)
         run: |
           mypy --strict charset_normalizer
40 changes: 40 additions & 0 deletions .github/workflows/mypyc-verify.yml
@@ -0,0 +1,40 @@
name: MYPYC Run

on: [push, pull_request]

jobs:
detection_coverage:
runs-on: ${{ matrix.os }}

strategy:
fail-fast: false
matrix:
python-version: [3.6, 3.7, 3.8, 3.9, "3.10"]
os: [ubuntu-latest]

steps:
- uses: actions/checkout@v2
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v2
with:
python-version: ${{ matrix.python-version }}
- name: Install dependencies
run: |
pip install -U pip setuptools
pip install -r dev-requirements.txt
pip uninstall -y charset-normalizer
- name: Install the package
env:
CHARSET_NORMALIZER_USE_MYPYC: '1'
run: |
python -m build --no-isolation
pip install ./dist/*.whl
- name: Clone the complete dataset
run: |
git clone https://github.com/Ousret/char-dataset.git
- name: Coverage WITH preemptive
run: |
python ./bin/coverage.py --coverage 97 --with-preemptive
- name: Coverage WITHOUT preemptive
run: |
python ./bin/coverage.py --coverage 95
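The new workflow gates the mypyc build on `bin/coverage.py` hitting a minimum detection-accuracy percentage (97% with preemptive behaviour, 95% without). The script itself is not shown in this diff; a minimal sketch of how such a gate might work (hypothetical `accuracy`/`gate` helpers, not the repository's actual script):

```python
def accuracy(results):
    """Percentage of files whose detected encoding matches the expected one."""
    hits = sum(1 for expected, detected in results if expected == detected)
    return 100.0 * hits / len(results)

def gate(results, minimum):
    """Return a process exit code: 0 when accuracy meets the bar, 1 otherwise."""
    score = accuracy(results)
    print(f"accuracy: {score:.1f}% (required: {minimum}%)")
    return 0 if score >= minimum else 1

# Toy run: 3 of 4 guesses correct -> 75.0%, below a 95% bar -> exit code 1.
sample = [("utf_8", "utf_8"), ("cp1252", "cp1252"),
          ("big5", "big5"), ("shift_jis", "cp932")]
print(gate(sample, minimum=95))
```

Running this as the last step of a job makes the job fail whenever detection accuracy regresses, which is presumably the point of the `--coverage` flag above.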
3 changes: 2 additions & 1 deletion .github/workflows/performance.yml
@@ -25,7 +25,8 @@ jobs:
           pip uninstall -y charset-normalizer
       - name: Install the package
         run: |
-          python setup.py install
+          python -m build
+          pip install ./dist/*.whl
       - name: Clone the complete dataset
         run: |
           git clone https://github.com/Ousret/char-dataset.git
108 changes: 100 additions & 8 deletions .github/workflows/python-publish.yml
@@ -29,7 +29,8 @@ jobs:
           pip uninstall -y charset-normalizer
       - name: Install the package
         run: |
-          python setup.py install
+          python -m build
+          pip install ./dist/*.whl
       - name: Type checking (Mypy)
         run: |
           mypy charset_normalizer
@@ -51,7 +52,7 @@ jobs:
strategy:
fail-fast: false
matrix:
-        python-version: [ 3.6, 3.7, 3.8, 3.9, "3.10" ]
+        python-version: [ 3.6, 3.7, 3.8, 3.9, "3.10", "3.11-dev" ]
os: [ ubuntu-latest ]

steps:
@@ -67,7 +68,8 @@ jobs:
           pip uninstall -y charset-normalizer
       - name: Install the package
         run: |
-          python setup.py install
+          python -m build
+          pip install ./dist/*.whl
       - name: Run tests
         run: |
           pytest
@@ -96,7 +98,8 @@ jobs:
           pip uninstall -y charset-normalizer
       - name: Install the package
         run: |
-          python setup.py install
+          python -m build
+          pip install ./dist/*.whl
       - name: Clone the complete dataset
         run: |
           git clone https://github.com/Ousret/char-dataset.git
@@ -136,7 +139,8 @@ jobs:
           pip uninstall -y charset-normalizer
       - name: Install the package
         run: |
-          python setup.py install
+          python -m build
+          pip install ./dist/*.whl
       - name: Clone the complete dataset
         run: |
           git clone https://github.com/Ousret/char-dataset.git
@@ -146,11 +150,92 @@ jobs:
- name: Integration Tests with Requests
run: |
python ./bin/integration.py
universal-wheel:
runs-on: ubuntu-latest
needs:
- integration
steps:
- uses: actions/checkout@v2
- name: Set up Python
uses: actions/setup-python@v2
with:
python-version: '3.x'
- name: Update pip, setuptools, wheel and twine
run: |
python -m pip install --upgrade pip
pip install setuptools wheel twine
- name: Build Wheel
env:
CHARSET_NORMALIZER_USE_MYPYC: '0'
run: python -m build
- name: Upload artifacts
uses: actions/upload-artifact@v3
with:
name: dist
path: dist

build-wheels:
name: Build wheels on ${{ matrix.os }} ${{ matrix.qemu }}
runs-on: ${{ matrix.os }}-latest
needs: universal-wheel
strategy:
matrix:
os: [ ubuntu, windows, macos ]
qemu: [ '' ]
include:
# Split ubuntu job for the sake of speed-up
- os: ubuntu
qemu: aarch64
- os: ubuntu
qemu: ppc64le
- os: ubuntu
qemu: s390x
steps:
- name: Checkout
uses: actions/checkout@v3
with:
submodules: true
- name: Set up QEMU
if: ${{ matrix.qemu }}
uses: docker/setup-qemu-action@v2
with:
platforms: all
id: qemu
- name: Prepare emulation
run: |
if [[ -n "${{ matrix.qemu }}" ]]; then
# Build emulated architectures only if QEMU is set,
# use default "auto" otherwise
echo "CIBW_ARCHS_LINUX=${{ matrix.qemu }}" >> $GITHUB_ENV
fi
shell: bash
- name: Setup Python
uses: actions/setup-python@v4
- name: Update pip, wheel, setuptools, build, twine
run: |
python -m pip install -U pip wheel setuptools build twine
- name: Build wheels
uses: pypa/cibuildwheel@2.10.2
env:
CIBW_BUILD_FRONTEND: "build"
CIBW_ARCHS_MACOS: x86_64 arm64 universal2
CIBW_ENVIRONMENT: CHARSET_NORMALIZER_USE_MYPYC='1'
CIBW_CONFIG_SETTINGS: "--no-isolation"
CIBW_BEFORE_BUILD: pip install -r dev-requirements.txt
CIBW_TEST_REQUIRES: pytest codecov pytest-cov
CIBW_TEST_COMMAND: pytest {package}/tests
CIBW_SKIP: pp*
- name: Upload artifacts
uses: actions/upload-artifact@v3
with:
name: dist
path: ./wheelhouse/*.whl

deploy:

runs-on: ubuntu-latest
needs:
- integration
- build-wheels

steps:
- uses: actions/checkout@v2
@@ -162,10 +247,17 @@ jobs:
         run: |
           python -m pip install --upgrade pip
           pip install setuptools wheel twine
-      - name: Build and publish
+      - name: Download distributions
+        uses: actions/download-artifact@v3
+        with:
+          name: dist
+          path: dist
+      - name: Collected dists
+        run: |
+          tree dist
+      - name: Publish
         env:
           TWINE_USERNAME: ${{ secrets.PYPI_USERNAME }}
           TWINE_PASSWORD: ${{ secrets.PYPI_PASSWORD }}
         run: |
-          python setup.py sdist bdist_wheel
           twine upload dist/*
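The publish pipeline now produces two flavors of artifact: a pure-Python universal wheel (flag set to `'0'`) and mypyc-compiled platform wheels from cibuildwheel (flag set to `'1'` via `CIBW_ENVIRONMENT`). A sketch of how a setup script might honor such an environment flag (hypothetical helpers; the project's actual build script may differ):

```python
import os

def use_mypyc(env=None):
    """True when CHARSET_NORMALIZER_USE_MYPYC requests a compiled build."""
    env = os.environ if env is None else env
    return env.get("CHARSET_NORMALIZER_USE_MYPYC", "0") == "1"

def ext_modules(env=None):
    """Extension list for setup(): empty for the pure-Python universal wheel."""
    if not use_mypyc(env):
        return []
    # With the flag set to '1', mypyc compiles the hot module into a C extension.
    from mypyc.build import mypycify
    return mypycify(["charset_normalizer/md.py"])
```

Defaulting to the pure-Python path keeps ordinary `pip install` builds working even when mypy/mypyc is absent; only the CI wheel jobs opt into compilation.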
3 changes: 2 additions & 1 deletion .github/workflows/run-tests.yml
@@ -25,7 +25,8 @@ jobs:
           pip uninstall -y charset-normalizer
       - name: Install the package
         run: |
-          python setup.py install
+          python -m build --no-isolation
+          pip install ./dist/*.whl
       - name: Run tests
         run: |
           pytest
42 changes: 42 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,48 @@
All notable changes to charset-normalizer will be documented in this file. This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).

## [3.0.0rc1](https://github.com/Ousret/charset_normalizer/compare/3.0.0b2...3.0.0rc1) (2022-10-18)

### Added
- Extend the capability of `explain=True`: when `cp_isolation` contains at most two entries (minimum one), the mess-detector results are logged in detail
- Support for alternative language frequency set in charset_normalizer.assets.FREQUENCIES
- Add parameter `language_threshold` in `from_bytes`, `from_path` and `from_fp` to adjust the minimum expected coherence ratio

### Changed
- Build with static metadata using 'build' frontend
- Make the language detection stricter

### Fixed
- CLI option `--normalize` failing when a full file path was supplied
- `TooManyAccentuatedPlugin` inducing false positives in mess detection when too few alpha characters were fed to it

### Removed
- Coherence detector no longer returns 'Simple English'; it returns 'English' instead
- Coherence detector no longer returns 'Classical Chinese'; it returns 'Chinese' instead
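The new `language_threshold` parameter sets the minimum coherence ratio a language guess must reach before it is retained. A pure-Python sketch of that kind of filter (illustrative only, assuming a list of `(language, ratio)` pairs; this is not charset_normalizer's implementation):

```python
def filter_languages(coherence, language_threshold=0.1):
    """Keep (language, ratio) pairs at or above the threshold, best match first."""
    kept = [(lang, ratio) for lang, ratio in coherence if ratio >= language_threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)

guesses = [("Dutch", 0.12), ("English", 0.45), ("Turkish", 0.04)]
print(filter_languages(guesses, language_threshold=0.1))
# Turkish falls below the 0.1 bar and is dropped.
```

Raising the threshold makes language detection stricter (fewer, higher-confidence guesses); lowering it keeps weaker candidates.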

## [3.0.0b2](https://github.com/Ousret/charset_normalizer/compare/3.0.0b1...3.0.0b2) (2022-08-21)

### Added
- `normalizer --version` now specifies whether the installed version provides the extra speedup (i.e. a mypyc-compiled wheel)

### Removed
- Breaking: Method `first()` and `best()` from CharsetMatch
- UTF-7 will no longer appear as "detected" without a recognized SIG/mark (it is unreliable and conflicts with ASCII)

### Fixed
- Sphinx warnings when generating the documentation

## [3.0.0b1](https://github.com/Ousret/charset_normalizer/compare/2.1.0...3.0.0b1) (2022-08-15)

### Changed
- Optional: Module `md.py` can be compiled with mypyc to provide an extra speedup, up to 4x faster than v2.1

### Removed
- Breaking: Class aliases CharsetDetector, CharsetDoctor, CharsetNormalizerMatch and CharsetNormalizerMatches
- Breaking: Top-level function `normalize`
- Breaking: Properties `chaos_secondary_pass`, `coherence_non_latin` and `w_counter` from CharsetMatch
- Support for the backport `unicodedata2`

## [2.1.1](https://github.com/Ousret/charset_normalizer/compare/2.1.0...2.1.1) (2022-08-19)

### Deprecated
