Skip to content

zz-xx/reddit_word_2_vec

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

14 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

reddit_word_2_vec

Applying Word 2 Vec Algorithm on Reddit comments

My take on word 2 vec algorithm implementation from Medium post.

Stumbled upon the post while trying to implement Word 2 Vec algorithm for Reddit comments. Implementation is almost same. Highly advised to read the blog post and then come back to this.

All the credit for original implementation goes to https://github.com/ravishchawla/word_2_vec.

Dataset can be downloaded from Kaggle here. After downloading the dataset, create a new folder called "dataset" in directory of project and extract the downloaded dataset into this folder.

What is different ?

Mainly wrote this code in midst of learning process. Original excercise is completely in IPython notebook while this is to be executed as a script.

Since dataset is very large (~30 GB) and original excercise was done on AWS P4.2xLarge instance, with 60 GB RAM, some changes were made to make this run of normal PC's albeit with variable(preferably lesser) number of comments.

Code refactor and flask based web interface for changing various parameters and observe effect on ouput. Screenshots can be seen below.

Front end templates taken from Colorlib.