Releases: explosion/spacy-vectors-builder

Dutch vectors for DH2023

04 May 07:45
Pre-release
nl-dh2023-v0.0.1

Initial Dutch (nl) config for the DH2023 demo.

English vectors for spaCy v3.4

14 Jul 11:07
Pre-release

English vectors trained for spaCy v3.4.0 using floret.

The en_vectors_fasttext vectors were trained with floret in fasttext mode and are the same vectors as in en_core_web_lg v3.4.0.

The floret vectors are trained in floret mode on the same data with 50K entries (md) and 200K entries (lg).

Note that the .bin files are only compatible with floret, not fasttext. Load them with the floret command-line tool or the Python module:

import floret
model = floret.load_model("en_vectors_floret_md.bin")
model.get_subwords("covid")
# (['<covid>', '<covi', 'covid', 'ovid>'], array([517646, 541731, 558180, 540981, 527325, 538060, 559280, 538021]))
model.get_nearest_neighbors("covid")
# [(0.70456463098526, 'Covid'), (0.6891582012176514, 'COVID'), (0.6806262135505676, 'covid-19'), (0.607974648475647, 'Covid-19'), (0.5875810384750366, 'COVID-19'), (0.5560713410377502, 'covid19'), (0.5450572371482849, 'coronavirus'), (0.5238808393478394, 'Covid19'), (0.5168178081512451, 'pandemic'), (0.5062406659126282, 'Coronavirus')]
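
The neighbor list above is dominated by case and hyphenation variants of the same term. As a rough sketch of one way to post-process such output, the snippet below keeps only the best-scoring entry per normalized form. The helper `collapse_variants` and its normalization rule (casefold, drop hyphens) are assumptions made for illustration and are not part of floret; the input is the list shown above, pasted in so that the example runs without a model file.

```python
from typing import Dict, List, Tuple

Neighbor = Tuple[float, str]


def collapse_variants(neighbors: List[Neighbor]) -> List[Neighbor]:
    """Keep the highest-scoring neighbor for each normalized word form.

    Hypothetical helper: normalization is casefolding plus hyphen removal.
    """
    best: Dict[str, Neighbor] = {}
    for score, word in neighbors:
        key = word.casefold().replace("-", "")
        if key not in best or score > best[key][0]:
            best[key] = (score, word)
    # Sort by similarity score, highest first.
    return sorted(best.values(), reverse=True)


# Output of model.get_nearest_neighbors("covid") as listed in the release notes.
neighbors = [
    (0.70456463098526, "Covid"),
    (0.6891582012176514, "COVID"),
    (0.6806262135505676, "covid-19"),
    (0.607974648475647, "Covid-19"),
    (0.5875810384750366, "COVID-19"),
    (0.5560713410377502, "covid19"),
    (0.5450572371482849, "coronavirus"),
    (0.5238808393478394, "Covid19"),
    (0.5168178081512451, "pandemic"),
    (0.5062406659126282, "Coronavirus"),
]

collapsed = collapse_variants(neighbors)
for score, word in collapsed:
    print(f"{score:.3f}  {word}")
```

With a loaded model, the same helper could be applied directly to the return value of `model.get_nearest_neighbors(...)`. A different normalization rule may suit other vocabularies better.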