Movie Pepper Backend

This repo contains all the backend code for the Movie Pepper open-source recommendation engine.

This includes the REST API and the IMDb crawler.

Setup

Python 3 (including the standard venv module) and pip must be installed. On Debian and Ubuntu, venv is packaged separately as python3-venv.

Create and activate a virtual environment

python3 -m venv venv

source venv/bin/activate
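To confirm that the environment created with `python3 -m venv venv` is actually active before installing anything, a small check like the one below can help. It is a sketch that relies only on the standard library: inside a venv, `sys.prefix` points at the environment directory while `sys.base_prefix` still points at the base Python installation.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when running inside an environment created by venv."""
    # venv sets sys.prefix to the environment directory, while
    # sys.base_prefix keeps pointing at the base Python installation.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
```

Run it with the same `python` you will use for `pip install`; if it reports `False`, re-run `source venv/bin/activate`.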

Install the dependencies, then download the TextBlob corpora and the NLTK stopword list

pip install -r requirements.txt
python -m textblob.download_corpora
python -m nltk.downloader stopwords
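If one of the download commands fails with an import error, the corresponding package is probably missing from the active environment. The sketch below uses `importlib.util.find_spec` to report which of the two packages used above (`textblob` and `nltk`) cannot be imported; `missing_modules` is a hypothetical helper, not part of this repo.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names in `names` that cannot be imported."""
    # find_spec returns None for a top-level module that is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


# textblob and nltk are the packages invoked by the download commands above.
missing = missing_modules(["textblob", "nltk"])
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("textblob and nltk are installed")
```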

A Bash script (movie_scrape/scrap.sh) is provided to simplify running the Spidy crawler. Pass the page to start crawling from in the START_URL environment variable:

cd movie_scrape
START_URL="http://www.imdb.com/search/title?groups=top_1000&sort=user_rating,desc&page=1&ref" ./scrap.sh
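To crawl from a different results page or list, the START_URL value can be built from its query parameters instead of edited by hand. This is a hypothetical helper: the parameter names (`groups`, `sort`, `page`) are copied from the example URL above, and the sketch omits that example's trailing `ref` parameter, so add it back if your crawl depends on it.

```python
from urllib.parse import urlencode

# Search endpoint taken from the START_URL example above.
IMDB_SEARCH_URL = "http://www.imdb.com/search/title"


def imdb_start_url(page=1, groups="top_1000", sort="user_rating,desc"):
    """Build a START_URL value for scrap.sh (hypothetical helper).

    safe="," keeps the comma in "user_rating,desc" unescaped, matching
    the example URL.
    """
    query = urlencode({"groups": groups, "sort": sort, "page": page}, safe=",")
    return f"{IMDB_SEARCH_URL}?{query}"


print(imdb_start_url(page=2))
```

Pass the printed URL as START_URL when running ./scrap.sh.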

After the crawl is complete, compute the TF-IDF values and train the Doc2Vec model:

python tfidf_lsa.py
python doc2vec.py

The server depends on the output of these scripts, so run them before starting the server.
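As a rough illustration of what TF-IDF weighting does (this is not the code in tfidf_lsa.py), the plain-Python sketch below weights each term by its frequency in a document times log(N / df), where N is the number of documents and df the number of documents containing the term, and then compares two documents with cosine similarity. Ranking similar movies by cosine similarity is an assumption made for illustration, and the token lists are toy data. The script name suggests tfidf_lsa.py also applies latent semantic analysis, which this sketch leaves out.

```python
from __future__ import annotations

import math
from collections import Counter


def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Weight each term by tf(t, d) * log(N / df(t)) (plain, unsmoothed TF-IDF)."""
    n_docs = len(docs)
    # df[t]: number of documents that contain term t at least once.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        # Term frequency is normalized by document length.
        vectors.append({t: (c / len(doc)) * math.log(n_docs / df[t])
                        for t, c in counts.items()})
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy, already-tokenized descriptions (illustrative data only).
plots = [
    ["robot", "space", "war"],
    ["robot", "space", "love"],
    ["love", "paris", "wedding"],
]
vectors = tfidf(plots)
print(cosine(vectors[0], vectors[1]), cosine(vectors[0], vectors[2]))
```

A term that appears in every document gets weight 0, which is why TF-IDF downplays common words even before stopwords are removed.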

Start the server

gunicorn --bind 0.0.0.0:5000 server:app

In production, you will probably want to put a reverse proxy such as NGINX in front of Gunicorn and serve the API over HTTPS.
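The `--bind` flag can also be kept in a configuration file. Below is a minimal sketch of a `gunicorn.conf.py`, which recent Gunicorn versions read from the working directory by default. The worker count and log setting are suggestions, not values taken from this project.

```python
# gunicorn.conf.py (sketch): Gunicorn settings are plain Python variables.
import multiprocessing

# Same address as the --bind flag above. When NGINX runs on the same host,
# "127.0.0.1:5000" keeps the app reachable only through the proxy.
bind = "0.0.0.0:5000"

# Rule of thumb from the Gunicorn docs: (2 x CPU cores) + 1 workers.
workers = multiprocessing.cpu_count() * 2 + 1

# Write access logs to stdout ("-").
accesslog = "-"
```

With this file in place, `gunicorn server:app` picks up these settings without extra flags.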

For development, you can start the server directly with:

python server.py
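To check that the server is responding, whether started with Gunicorn or with `python server.py`, a standard-library smoke test like the one below can help. It assumes the server listens on port 5000, as in the Gunicorn command above. The default path "/" is a placeholder because the API routes are not documented here, so any HTTP response, including an error status, counts as "up". `server_is_up` is a hypothetical helper.

```python
import urllib.error
import urllib.request


def server_is_up(url="http://127.0.0.1:5000/", timeout=2.0):
    """Return True if an HTTP request to `url` gets any response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just with an error status (e.g. 404 on "/").
        return True
    except OSError:
        # URLError and socket timeouts are OSError subclasses:
        # connection refused, timeout, DNS failure, ...
        return False


print("server up:", server_is_up())
```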