Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bump spacy from 3.4.3 to 3.5.0 #882

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

dependabot[bot]
Copy link

@dependabot dependabot bot commented on behalf of github Jan 23, 2023

Bumps spacy from 3.4.3 to 3.5.0.

Release notes

Sourced from spacy's releases.

v3.5.0: New CLI commands, language updates, bug fixes and much more

✨ New features and improvements

  • NEW: New apply CLI command to annotate new documents with a trained pipeline (#11376).
  • NEW: New benchmark CLI command to benchmark pipelines. The new benchmark speed subcommand measures the speed of a pipeline, the benchmark accuracy subcommand is a new alias for evaluate (#11902).
  • NEW: New find-threshold CLI command to identify an optimal threshold for classification models (#11280).
  • NEW: New FUZZY Matcher operator for fuzzy matches based on Levenshtein edit distance. In addition, the FUZZY and REGEX operators are now supported in combination with IN/NOT_IN. (#11359).
  • Language updates for Ancient Greek, Dutch, Russian, Slovenian and Ukrainian (#11345, #11162, #11426, #11753, #11811, #11997, more details below).
  • Allow up to typer v0.7.x (#11720), mypy 0.990 (#11801) and typing_extensions v4.4.x (#12036).
  • New spacy.ConsoleLogger.v3 with expanded progress tracking (#11972).
  • Improved scoring behavior for textcat with spacy.textcat_scorer.v2 (#11696 and #11971) and spacy.textcat_multilabel_scorer.v2 (#11820).
  • Improved customizability of the knowledge base used for entity linking, with the default implementation being the new InMemoryLookupKB (#11268).
  • Optional before_update callback that is invoked at the start of each training step (#11739).
  • Improve performance of SpanGroup (#11380).
  • Improve UX around displacy.serve when the default port is in use (#11948).
  • Patch a security vulnerability in extracting tar files (#11746).
  • Add equality definition for vectors (#11806).
  • Allow interpolation of variables in directory names in projects (#11235).
  • Update default component configs to use the latest tok2vec version (#11618).

🔴 Bug fixes

  • #11382: Fix lookup behavior for the French and Catalan lemmatizers.
  • #11385: Ensure that downstream components can train properly on a frozen tok2vec or transformer layer.
  • #11762: Support local file system remotes for projects.
  • #11763: Raise an error when unsupported values are used for textcat.
  • #11834: Ensure Vocab.to_disk respects the exclude setting for lookups and vectors.
  • #12009: Fix a few typing issues for SpanGroup and Span objects.
  • #12098: Correctly handle missing annotations in the edit tree lemmatizer.

⚠️ Backwards incompatibilities and model updates

The following changes may require you to update code that is using the relevant functionality:

  • An error is now raised when unsupported values are given as input to train a textcat or textcat_multilabel model - ensure that values are 0.0 or 1.0 as explained in the docs.

The following changes may influence the output of your language pipeline or trained models:

  • Updates to language defaults:
    • Extended support for Slovenian (#11162).
    • Switch Russian and Ukrainian lemmatizers to pymorphy3 (#11345, #11811).
    • Support for editorial punctuation in Ancient Greek (#11426).
    • Update to Russian tokenizer exceptions (#11753).
    • Small fix in the list of Dutch stop words (#11997).
  • Updates to model defaults:
    • Use the latest tok2vec defaults in all components (#11618).
    • Improve the default attributes used for the textcat and textcat_multilabel components (#11698).
    • Update the default scorer for textcat and textcat_multilabel to fix a bug related to threshold for textcat and to make it possible to score multiple textcat/textcat_multilabel components in a single pipeline with custom scorers. If no custom scorers are used, the cat_p/r/f scores will now only reflect the final component's labels and performance (#11696, #11820).
    • Correct the token_acc score to report the intended measure (# correct tokens / # predicted tokens, the same as in spaCy v2). The token_acc scores for v3.5 will be lower for the same performance because they were incorrectly inflated in v3.0-v3.4. The token_p/r/f scores should remain unchanged (#12073).

... (truncated)

Commits

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Bumps [spacy](https://github.com/explosion/spaCy) from 3.4.3 to 3.5.0.
- [Release notes](https://github.com/explosion/spaCy/releases)
- [Commits](explosion/spaCy@v3.4.3...v3.5.0)

---
updated-dependencies:
- dependency-name: spacy
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
@dependabot dependabot bot added dependencies Pull requests that update a dependency file python Pull requests that update Python code labels Jan 23, 2023
@trafico-bot trafico-bot bot added the 🔍 Ready for Review Pull Request is not reviewed yet label Jan 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dependencies Pull requests that update a dependency file python Pull requests that update Python code 🔍 Ready for Review Pull Request is not reviewed yet
Projects
None yet
0 participants