Documentation updates for v2.3.0 #5593

Merged 26 commits on Jun 16, 2020
Commits
7da3ada
Update website models for v2.3.0
adrianeboyd May 19, 2020
310524d
Add docs for Chinese word segmentation
adrianeboyd May 21, 2020
ee136e5
Merge pull request #5457 from adrianeboyd/website/models-v2.3.0
ines May 21, 2020
8f98bd0
Tighten up Chinese docs section
ines May 21, 2020
8dbac77
Merge pull request #5474 from adrianeboyd/website/chinese-support-v2.3.0
ines May 21, 2020
815e687
Merge branch 'master' into docs/v2.3.0 [ci skip]
ines May 21, 2020
9818222
Merge branch 'master' into docs/v2.3.0 [ci skip]
ines May 21, 2020
75565f8
Auto-format and update version
ines May 22, 2020
e97e1b1
Update matcher.md
ines May 22, 2020
1f2a9e2
Update languages and sorting
ines May 22, 2020
4a87505
Typo in landing page
adrianeboyd May 27, 2020
c8aeed9
Merge remote-tracking branch 'upstream/master' into docs/v2.3.0
adrianeboyd May 27, 2020
6ccdfed
Infobox about token_match behavior
adrianeboyd May 27, 2020
d593f6d
Add meta and basic docs for Japanese
adrianeboyd Jun 10, 2020
4de5142
POS -> TAG in models table
ines Jun 15, 2020
925834a
Add info about lookups for normalization
adrianeboyd Jun 15, 2020
fe41abf
Updates to API docs for v2.3
adrianeboyd Jun 15, 2020
56085d2
Update adding norm exceptions for adding languages
adrianeboyd Jun 16, 2020
ef5aa64
Add --omit-extra-lookups to CLI API docs
adrianeboyd Jun 16, 2020
803e01e
Add initial draft of "What's New in v2.3"
adrianeboyd Jun 16, 2020
d018dc9
Add new in v2.3 tags to Chinese and Japanese sections
adrianeboyd Jun 16, 2020
9425f68
Add tokenizer to migration section
adrianeboyd Jun 16, 2020
93cef9c
Add new in v2.3 flags to init-model
adrianeboyd Jun 16, 2020
cdc1984
Typo
adrianeboyd Jun 16, 2020
e560e7d
More what's new in v2.3
adrianeboyd Jun 16, 2020
3be577e
Merge branch 'v2.3.x' into docs/v2.3.0-v2.3.x
adrianeboyd Jun 16, 2020
17 changes: 9 additions & 8 deletions README.md
@@ -6,12 +6,12 @@ spaCy is a library for advanced Natural Language Processing in Python and
Cython. It's built on the very latest research, and was designed from day one to
be used in real products. spaCy comes with
[pretrained statistical models](https://spacy.io/models) and word vectors, and
currently supports tokenization for **50+ languages**. It features
currently supports tokenization for **60+ languages**. It features
state-of-the-art speed, convolutional **neural network models** for tagging,
parsing and **named entity recognition** and easy **deep learning** integration.
It's commercial open-source software, released under the MIT license.

💫 **Version 2.2 out now!**
💫 **Version 2.3 out now!**
[Check out the release notes here.](https://github.com/explosion/spaCy/releases)

[![Azure Pipelines](<https://img.shields.io/azure-devops/build/explosion-ai/public/8/master.svg?logo=azure-pipelines&style=flat-square&label=build+(3.x)>)](https://dev.azure.com/explosion-ai/public/_build?definitionId=8)
@@ -32,15 +32,15 @@ It's commercial open-source software, released under the MIT license.
| --------------- | -------------------------------------------------------------- |
| [spaCy 101] | New to spaCy? Here's everything you need to know! |
| [Usage Guides] | How to use spaCy and its features. |
| [New in v2.2] | New features, backwards incompatibilities and migration guide. |
| [New in v2.3] | New features, backwards incompatibilities and migration guide. |
| [API Reference] | The detailed reference for spaCy's API. |
| [Models] | Download statistical language models for spaCy. |
| [Universe] | Libraries, extensions, demos, books and courses. |
| [Changelog] | Changes and version history. |
| [Contribute] | How to contribute to the spaCy project and code base. |

[spacy 101]: https://spacy.io/usage/spacy-101
[new in v2.2]: https://spacy.io/usage/v2-2
[new in v2.3]: https://spacy.io/usage/v2-3
[usage guides]: https://spacy.io/usage/
[api reference]: https://spacy.io/api/
[models]: https://spacy.io/models
@@ -113,12 +113,13 @@ of `v2.0.13`).
pip install spacy
```

To install additional data tables for lemmatization in **spaCy v2.2+** you can
run `pip install spacy[lookups]` or install
To install additional data tables for lemmatization and normalization in
**spaCy v2.2+** you can run `pip install spacy[lookups]` or install
[`spacy-lookups-data`](https://github.com/explosion/spacy-lookups-data)
separately. The lookups package is needed to create blank models with
lemmatization data, and to lemmatize in languages that don't yet come with
pretrained models and aren't powered by third-party libraries.
lemmatization data for v2.2+ plus normalization data for v2.3+, and to
lemmatize in languages that don't yet come with pretrained models and aren't
powered by third-party libraries.
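
A minimal sketch of what the lookups package enables (assuming `spacy[lookups]` is installed; the exact lemmas depend on the table data):

```python
import spacy

# A blank model has no tagger or pretrained pipeline, so lemmatization
# falls back to the lookup tables shipped in spacy-lookups-data.
nlp = spacy.blank("en")
doc = nlp("The cats were running")
print([token.lemma_ for token in doc])
```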

When using pip it is generally recommended to install packages in a virtual
environment to avoid modifying system state:
21 changes: 11 additions & 10 deletions website/docs/api/cli.md
@@ -541,16 +541,17 @@ $ python -m spacy init-model [lang] [output_dir] [--jsonl-loc] [--vectors-loc]
[--prune-vectors]
```

| Argument | Type | Description |
| ------------------------------------------------------- | ---------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `lang` | positional | Model language [ISO code](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes), e.g. `en`. |
| `output_dir` | positional | Model output directory. Will be created if it doesn't exist. |
| `--jsonl-loc`, `-j` | option | Optional location of JSONL-formatted [vocabulary file](/api/annotation#vocab-jsonl) with lexical attributes. |
| `--vectors-loc`, `-v` | option | Optional location of vectors. Should be a file where the first row contains the dimensions of the vectors, followed by a space-separated Word2Vec table. File can be provided in `.txt` format or as a zipped text file in `.zip` or `.tar.gz` format. |
| `--truncate-vectors`, `-t` <Tag variant="new">2.3</Tag> | option | Number of vectors to truncate to when reading in vectors file. Defaults to `0` for no truncation. |
| `--prune-vectors`, `-V` | option | Number of vectors to prune the vocabulary to. Defaults to `-1` for no pruning. |
| `--vectors-name`, `-vn` | option | Name to assign to the word vectors in the `meta.json`, e.g. `en_core_web_md.vectors`. |
| **CREATES** | model | A spaCy model containing the vocab and vectors. |
| Argument | Type | Description |
| ----------------------------------------------------------- | ---------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `lang` | positional | Model language [ISO code](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes), e.g. `en`. |
| `output_dir` | positional | Model output directory. Will be created if it doesn't exist. |
| `--jsonl-loc`, `-j` | option | Optional location of JSONL-formatted [vocabulary file](/api/annotation#vocab-jsonl) with lexical attributes. |
| `--vectors-loc`, `-v` | option | Optional location of vectors. Should be a file where the first row contains the dimensions of the vectors, followed by a space-separated Word2Vec table. File can be provided in `.txt` format or as a zipped text file in `.zip` or `.tar.gz` format. |
| `--truncate-vectors`, `-t` <Tag variant="new">2.3</Tag> | option | Number of vectors to truncate to when reading in vectors file. Defaults to `0` for no truncation. |
| `--prune-vectors`, `-V` | option | Number of vectors to prune the vocabulary to. Defaults to `-1` for no pruning. |
| `--vectors-name`, `-vn` | option | Name to assign to the word vectors in the `meta.json`, e.g. `en_core_web_md.vectors`. |
| `--omit-extra-lookups`, `-OEL` <Tag variant="new">2.3</Tag> | flag | Do not include any of the extra lookups tables (`cluster`/`prob`/`sentiment`) from `spacy-lookups-data` in the model. |
| **CREATES** | model | A spaCy model containing the vocab and vectors. |
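
As a hedged illustration (the output directory and vectors file are placeholders, not from the original docs), a run that skips the extra lookups tables might look like:

```bash
$ python -m spacy init-model en /tmp/en_model --vectors-loc vectors.txt --omit-extra-lookups
```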

## Evaluate {#evaluate new="2"}

3 changes: 0 additions & 3 deletions website/docs/api/cython-structs.md
@@ -171,9 +171,6 @@ struct.
| `shape` | <Abbr title="uint64_t">`attr_t`</Abbr> | Transform of the lexeme's string, to show orthographic features. |
| `prefix` | <Abbr title="uint64_t">`attr_t`</Abbr> | Length-N substring from the start of the lexeme. Defaults to `N=1`. |
| `suffix` | <Abbr title="uint64_t">`attr_t`</Abbr> | Length-N substring from the end of the lexeme. Defaults to `N=3`. |
| `cluster` | <Abbr title="uint64_t">`attr_t`</Abbr> | Brown cluster ID. |
| `prob` | `float` | Smoothed log probability estimate of the lexeme's word type (context-independent entry in the vocabulary). |
| `sentiment` | `float` | A scalar value indicating positivity or negativity. |
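
These fields were removed from the C struct in v2.3. A short sketch of reading the equivalent values from Python instead (assumes a model and the extra lookups tables are installed; `en_core_web_sm` is an example choice):

```python
import spacy

nlp = spacy.load("en_core_web_sm")
lexeme = nlp.vocab["hello"]
# cluster, prob and sentiment are now plain lexeme attributes backed by
# lookups tables instead of LexemeC struct fields.
print(lexeme.cluster, lexeme.prob, lexeme.sentiment)
```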

### Lexeme.get_struct_attr {#lexeme_get_struct_attr tag="staticmethod, nogil" source="spacy/lexeme.pxd"}

1 change: 1 addition & 0 deletions website/docs/api/goldparse.md
@@ -22,6 +22,7 @@ missing – the gradient for those labels will be zero.
| `entities` | iterable | A sequence of named entity annotations, either as BILUO tag strings, or as `(start_char, end_char, label)` tuples, representing the entity positions. If BILUO tag strings, you can specify missing values by setting the tag to None. |
| `cats` | dict | Labels for text classification. Each key in the dictionary is a string label for the category and each value is `1.0` (positive) or `0.0` (negative). |
| `links` | dict | Labels for entity linking. A dict with `(start_char, end_char)` keys, and the values being dicts with `kb_id:value` entries, representing external KB IDs mapped to either `1.0` (positive) or `0.0` (negative). |
| `make_projective` | bool | Whether to projectivize the dependency tree. Defaults to `False`. |
| **RETURNS** | `GoldParse` | The newly constructed object. |
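
A minimal sketch of the new argument (the heads/deps annotations are toy values for illustration):

```python
import spacy
from spacy.gold import GoldParse

nlp = spacy.blank("en")
doc = nlp("I like London")
# make_projective=True converts a non-projective tree into a projective one
gold = GoldParse(doc, heads=[1, 1, 1], deps=["nsubj", "ROOT", "dobj"],
                 make_projective=True)
print(gold.heads)
```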

## GoldParse.\_\_len\_\_ {#len tag="method"}
2 changes: 1 addition & 1 deletion website/docs/api/lexeme.md
@@ -156,7 +156,7 @@ The L2 norm of the lexeme's vector representation.
| `like_url` | bool | Does the lexeme resemble a URL? |
| `like_num` | bool | Does the lexeme represent a number? e.g. "10.9", "10", "ten", etc. |
| `like_email` | bool | Does the lexeme resemble an email address? |
| `is_oov` | bool | Is the lexeme out-of-vocabulary? |
| `is_oov` | bool | Does the lexeme have a word vector? |
| `is_stop` | bool | Is the lexeme part of a "stop list"? |
| `lang` | int | Language of the parent vocabulary. |
| `lang_` | unicode | Language of the parent vocabulary. |
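
A small sketch of the revised semantics (assuming the `en_core_web_md` vectors model is installed; compare `is_oov` with `has_vector` directly rather than relying on the old definition):

```python
import spacy

nlp = spacy.load("en_core_web_md")
# As of v2.3, is_oov reflects word-vector coverage rather than
# vocabulary membership.
for word in ["cat", "afskfsd"]:
    lexeme = nlp.vocab[word]
    print(word, lexeme.is_oov, lexeme.has_vector)
```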
11 changes: 6 additions & 5 deletions website/docs/api/matcher.md
@@ -40,7 +40,8 @@ string where an integer is expected) or unexpected property names.

## Matcher.\_\_call\_\_ {#call tag="method"}

Find all token sequences matching the supplied patterns on the `Doc`.
Find all token sequences matching the supplied patterns on the `Doc`. As of
spaCy v2.3, the `Matcher` can also be called on `Span` objects.

> #### Example
>
@@ -54,10 +55,10 @@ Find all token sequences matching the supplied patterns on the `Doc`.
> matches = matcher(doc)
> ```
| Name | Type | Description |
| ----------- | ----- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `doc` | `Doc` | The document to match over. |
| **RETURNS** | list | A list of `(match_id, start, end)` tuples, describing the matches. A match tuple describes a span `doc[start:end]`. The `match_id` is the ID of the added match pattern. |
| Name | Type | Description |
| ----------- | ------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `doclike` | `Doc`/`Span` | The `Doc` or `Span` to match over (`Span` support is new in v2.3). |
| **RETURNS** | list | A list of `(match_id, start, end)` tuples, describing the matches. A match tuple describes a span `doc[start:end]`. The `match_id` is the ID of the added match pattern. |
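
A short sketch of matching over a `Span` (sentence segmentation here relies on the parser in `en_core_web_sm`, an assumption of this example):

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab)
matcher.add("HelloWorld", None, [{"LOWER": "hello"}, {"LOWER": "world"}])

doc = nlp("Some intro text. Hello world!")
# New in v2.3: call the matcher on a single sentence span
span = list(doc.sents)[1]
matches = matcher(span)
```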

<Infobox title="Important note" variant="warning">

2 changes: 1 addition & 1 deletion website/docs/api/sentencizer.md
@@ -42,7 +42,7 @@ Initialize the sentencizer.
| Name | Type | Description |
| ------------- | ------------- | ------------------------------------------------------------------------------------------------------ |
| `punct_chars` | list | Optional custom list of punctuation characters that mark sentence ends. Defaults to `[".", "!", "?"]`. |
| `punct_chars` | list | Optional custom list of punctuation characters that mark sentence ends. Defaults to `['!', '.', '?', '։', '؟', '۔', '܀', '܁', '܂', '߹', '।', '॥', '၊', '။', '።', '፧', '፨', '᙮', '᜵', '᜶', '᠃', '᠉', '᥄', '᥅', '᪨', '᪩', '᪪', '᪫', '᭚', '᭛', '᭞', '᭟', '᰻', '᰼', '᱾', '᱿', '‼', '‽', '⁇', '⁈', '⁉', '⸮', '⸼', '꓿', '꘎', '꘏', '꛳', '꛷', '꡶', '꡷', '꣎', '꣏', '꤯', '꧈', '꧉', '꩝', '꩞', '꩟', '꫰', '꫱', '꯫', '﹒', '﹖', '﹗', '!', '.', '?', '𐩖', '𐩗', '𑁇', '𑁈', '𑂾', '𑂿', '𑃀', '𑃁', '𑅁', '𑅂', '𑅃', '𑇅', '𑇆', '𑇍', '𑇞', '𑇟', '𑈸', '𑈹', '𑈻', '𑈼', '𑊩', '𑑋', '𑑌', '𑗂', '𑗃', '𑗉', '𑗊', '𑗋', '𑗌', '𑗍', '𑗎', '𑗏', '𑗐', '𑗑', '𑗒', '𑗓', '𑗔', '𑗕', '𑗖', '𑗗', '𑙁', '𑙂', '𑜼', '𑜽', '𑜾', '𑩂', '𑩃', '𑪛', '𑪜', '𑱁', '𑱂', '𖩮', '𖩯', '𖫵', '𖬷', '𖬸', '𖭄', '𛲟', '𝪈', '。', '。']`. |
| **RETURNS** | `Sentencizer` | The newly constructed object. |
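
A brief sketch with a custom `punct_chars` list (the character set is illustrative, not the default):

```python
import spacy
from spacy.pipeline import Sentencizer

nlp = spacy.blank("en")
# Only treat "!" and "?" as sentence-final punctuation
nlp.add_pipe(Sentencizer(punct_chars=["!", "?"]))
doc = nlp("Are you here? Yes! Good.")
print([sent.text for sent in doc.sents])
```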

## Sentencizer.\_\_call\_\_ {#call tag="method"}