Python-AdaGram is an implementation of AdaGram (adaptive skip-gram) for Python. It borrows a lot of C code from the original AdaGram implementation in Julia (https://github.com/sbos/AdaGram.jl). AdaGram was introduced in a paper by Sergey Bartunov, Dmitry Kondrashkin, Anton Osokin and Dmitry Vetrov at http://arxiv.org/abs/1502.07257.
Note: this is a work in progress: it lacks tests, and training is not working correctly yet. But it can already load AdaGram.jl models, perform disambiguation, search for nearest neighbours, etc. If you have a more mature implementation or want to help, please get in touch.
The package is not on PyPI yet, please install it from source in the meantime:
$ pip install Cython numpy
$ pip install git+https://github.com/lopuhin/python-adagram.git
Train a model from command line:
$ adagram-train tokenized.txt out.pkl
Input corpus must be already tokenized, with tokens (usually words) separated by spaces. There are many options available, see adagram-train --help
.
Load model:
>>> import adagram
>>> vm = adagram.VectorModel.load('out.pkl')
Get sense probabilities for some word:
>>> vm.word_sense_probs('apple')
[0.341832, 0.658164]
Get sense neighbors:
>>> vm.sense_neighbors('apple', 0)
[('almond', 0, 0.70396507),
('cherry', 1, 0.69193166),
('plum', 0, 0.690269),
('apricot', 0, 0.6882005),
('orange', 3, 0.6739181),
('pecan', 0, 0.6662803),
('pomegranate', 0, 0.6580653)
('blueberry', 0, 0.6509351),
('pear', 0, 0.6484747),
('peach', 0, 0.6313036)]
>>> vm.sense_neighbors('apple', 1)
[('macintosh', 0, 0.79053026),
('iifx', 0, 0.71349466),
('iigs', 0, 0.7030192),
('computers', 0, 0.6952761),
('kaypro', 0, 0.6938647),
('ipad', 0, 0.6914306),
('pc', 3, 0.6801078),
('ibm', 0, 0.66797054),
('powerpc-based', 0, 0.66319686),
('ibm-compatible', 0, 0.66120595)]
Get sense vector:
>>> vm.sense_vector('apple', 1)
array([...], dtype=float32)
First, install AdaGram.jl as described here https://github.com/sbos/AdaGram.jl. Install JSON package:
$ julia
julia> Pkg.add("JSON")
Run the script that converts a julia model to JSON:
$ julia adagram/dump_julia.jl julia-model out-directory
This will save two JSON files to out-directory
.
Next, to convert model to python format, run:
$ ./adagram/load_julia.py out-directory model.joblib