
Movie Pepper Backend


This repo contains all the backend code for the Movie Pepper open-source recommendation engine.

This includes the REST API and the IMDb crawler.

Setup

Python 3, pip, and virtualenv must be installed.

Create and activate a virtualenv:

python3 -m venv venv

source venv/bin/activate
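To confirm the virtualenv is active before installing anything, you can compare `sys.prefix` with `sys.base_prefix`. They differ inside an environment created with `venv`, while older virtualenv releases set `sys.real_prefix` instead. A minimal sketch (the function name `in_virtualenv` is illustrative, not part of this repo):

```python
import sys


def in_virtualenv() -> bool:
    """Return True when running inside a venv/virtualenv.

    venv points sys.prefix at the environment directory while
    sys.base_prefix keeps pointing at the base Python install.
    Older virtualenv releases set sys.real_prefix instead.
    """
    return hasattr(sys, "real_prefix") or sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtualenv active" if in_virtualenv() else "no virtualenv active")
```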

Install the dependencies and download the TextBlob and NLTK corpora:

pip install -r requirements.txt
python -m textblob.download_corpora
python -m nltk.downloader stopwords
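After installing, a short sanity check can confirm that the packages used by the setup commands are importable. This sketch only checks for the `textblob` and `nltk` modules named in those commands; it does not reproduce the full contents of `requirements.txt` or verify that the corpora were downloaded, and the helper name `missing_modules` is hypothetical.

```python
import importlib.util
from typing import Iterable, List

# Modules referenced by the setup commands; extend with anything else
# listed in requirements.txt that you want to verify.
REQUIRED_MODULES = ["textblob", "nltk"]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```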

Crawler

A Bash script is provided to simplify running the Spidy crawler. Set START_URL to the IMDb search page the crawl should start from:

cd movie_scrape
START_URL="http://www.imdb.com/search/title?groups=top_1000&sort=user_rating,desc&page=1&ref" ./scrap.sh
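The START_URL above is an IMDb search URL whose query string selects the top 1000 titles sorted by user rating, starting at `page=1`. How `scrap.sh` and the spider actually consume it is not shown here. As an illustrative sketch only, assuming the crawl moves through results by changing the `page` parameter, the hypothetical helper `page_url` below rewrites that parameter while keeping the rest of the query intact:

```python
import os
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Default mirrors the START_URL used in the command above.
DEFAULT_START_URL = (
    "http://www.imdb.com/search/title"
    "?groups=top_1000&sort=user_rating,desc&page=1&ref"
)


def page_url(start_url: str, page: int) -> str:
    """Return start_url with its `page` query parameter set to `page`.

    Hypothetical helper: assumes pagination is driven by `page`.
    """
    parts = urlsplit(start_url)
    # keep_blank_values keeps the bare `ref` parameter instead of dropping it.
    query = parse_qsl(parts.query, keep_blank_values=True)
    if any(key == "page" for key, _ in query):
        # Replace the value in place so parameter order is preserved.
        query = [(k, str(page) if k == "page" else v) for k, v in query]
    else:
        query.append(("page", str(page)))
    # safe="," leaves the comma in `user_rating,desc` unescaped.
    return urlunsplit(parts._replace(query=urlencode(query, safe=",")))


if __name__ == "__main__":
    start = os.environ.get("START_URL", DEFAULT_START_URL)
    for n in range(1, 4):
        print(page_url(start, n))
```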

After the crawl completes, calculate the TF-IDF values and train the Doc2Vec models:

python tfidf_lsa.py
python doc2vec.py

This step is required before starting the server.
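To illustrate the TF-IDF part of that step, the sketch below weights each document (for example, a crawled movie's plot text) by term frequency times inverse document frequency and compares documents by cosine similarity. It is a plain-Python illustration with made-up example texts and a tiny stand-in stopword list; it is not the contents of `tfidf_lsa.py` (which, judging by its name, also applies LSA), and the Doc2Vec step is not covered.

```python
import math
import re
from collections import Counter
from typing import Dict, List

# Tiny stand-in stopword list; the real setup downloads NLTK's stopwords.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "to", "his", "her"}


def tokenize(text: str) -> List[str]:
    """Lowercase, split on non-letters, and drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def tfidf(docs: List[str]) -> List[Dict[str, float]]:
    """Return one sparse {term: weight} vector per document.

    tf = count / doc length, idf = log(N / df) (one common variant).
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    plots = [
        "A hacker learns the truth about the simulated world.",
        "A hacker fights machines in a simulated reality.",
        "Two friends take a road trip across the desert.",
    ]
    vecs = tfidf(plots)
    print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

The first two plots share the terms "hacker" and "simulated", so they score higher than the unrelated third plot, which shares no terms with the first.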

Server

Start the server with Gunicorn:

gunicorn --bind 0.0.0.0:5000 server:app

In production, you will probably want to put a reverse proxy such as NGINX in front of Gunicorn and serve traffic over HTTPS.
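Behind a reverse proxy, one common arrangement is to bind Gunicorn to the loopback interface so only the proxy can reach it, and to keep its settings in a Gunicorn configuration file, which is itself a Python module. The sketch below is an assumed `gunicorn.conf.py` that this repo does not ship; the worker count follows the `(2 x CPU cores) + 1` starting point suggested in Gunicorn's documentation.

```python
# gunicorn.conf.py -- hypothetical config for running behind NGINX.
import multiprocessing

# Listen only on localhost; NGINX terminates HTTPS and proxies to this port.
bind = "127.0.0.1:5000"

# Gunicorn's docs suggest (2 x cores) + 1 workers as a starting point.
workers = multiprocessing.cpu_count() * 2 + 1

# Trust X-Forwarded-* headers only from the local proxy.
forwarded_allow_ips = "127.0.0.1"

# Write access logs to stdout and error logs to stderr.
accesslog = "-"
errorlog = "-"
```

With this file in place, the server would be started with `gunicorn -c gunicorn.conf.py server:app`.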

For development, you can run the server directly:

python server.py
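In `gunicorn --bind 0.0.0.0:5000 server:app`, `server` is the Python module (`server.py`) and `app` is the WSGI callable inside it. The repo's server code is not reproduced here. As a framework-free sketch of that contract, the hypothetical module below exposes an `app` callable that Gunicorn could load, plus an `if __name__ == "__main__":` block that serves it with the standard library's development server when run as `python server.py`. The `/health` route is an illustrative assumption.

```python
import json
from wsgiref.simple_server import make_server


def app(environ, start_response):
    """Minimal WSGI application: the server calls this once per request."""
    if environ.get("PATH_INFO") == "/health":
        status, payload = "200 OK", {"status": "ok"}
    else:
        status, payload = "404 Not Found", {"error": "not found"}
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


if __name__ == "__main__":
    # Development only: single-threaded stdlib server on the same port.
    with make_server("127.0.0.1", 5000, app) as httpd:
        print("Serving on http://127.0.0.1:5000")
        httpd.serve_forever()
```

The stdlib server handles one request at a time, which is fine for local testing but is why Gunicorn is used for real deployments.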