tmsimont/wombat

Note that in its current state the code is a bit of a mess. Remnants of some related experiments are still left in the code, and substantial refactoring is needed.

I've started refactoring heavily in the v2 branch, but v2 is not currently building or working. Please post issues on GitHub and I'll try to take a look, but my time for this project is very limited :(


[Figure: word matrix batches]



This code was developed as part of my Master's thesis research.

A paper describing the methods in this package is available from IEEE:
Efficient and accurate Word2Vec implementations in GPU and shared-memory multicore architectures

The work builds upon ideas presented in BIDMach and further refined in Intel's pWord2Vec.

This code supports:

  • Both CPU and GPU matrix-based fast Word2Vec
  • Both SkipGram and Hierarchical Softmax Word2Vec architectures

This code does not support:

  • Distributed computing techniques (see pWord2Vec)
  • CBOW Word2Vec architectures

Installation

The makefile (hackishly) supports g++, CUDA, or ICPC.

Different source files are used for different compilers.

To compile, use make:

For g++:

make

For CUDA:

make cuda

For ICPC with MKL support:

make intel

Once built, you can use the scripts in /scripts to run the test programs:

To test the g++- or icpc-compiled program:

./cpu.sh [num threads]

To test the CUDA program (requires CUDA compute capability 6.0):

./cuda.numCPUT-batchSize-batchesPerT.sh [num cpu threads] [batch size] [batches per thread]

To download the test data used by all of the programs:

./get-data.sh
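
Putting the steps together, an end-to-end CPU run might look like the following sketch. The thread count (8) is only an example value, and the sketch assumes make is run from the repository root while the helper scripts are run from inside /scripts:

make              # build with g++
cd scripts        # the helper scripts live in /scripts
./get-data.sh     # download the test data
./cpu.sh 8        # run the CPU test program with 8 threads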

About

Fast implementation of Word2Vec
