h0rv/collextion

collextion

A website to display book recommendations from the Lex Fridman Podcast.

Final project for CS 1699 Practical AI, Spring 2023.

Install dependencies

cd src/
pip install -r requirements.txt
python -m spacy download en_core_web_sm
python -m spacy download en_core_web_lg

Secrets

cp src/example.env src/.env

Fill in the required secrets in the .env file.

Download Pre-transcribed Transcripts

./download_transcripts.sh
# Convert data to text format (removing timing information from *.vtt files)
./convert_all.sh
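
As an illustration of what removing timing information from a *.vtt file involves, here is a minimal Python sketch. It is not the contents of convert_all.sh, which may work differently. The function name vtt_to_text and the sample transcript are made up for this example, and the sketch assumes simple WebVTT files with numeric cue identifiers and no NOTE or STYLE blocks.

```python
import re

# Matches WebVTT cue timing lines such as "00:00:01.000 --> 00:00:04.000"
# (the hours field is optional in WebVTT).
TIMING = re.compile(
    r"^(\d+:)?\d{2}:\d{2}\.\d{3}\s+-->\s+(\d+:)?\d{2}:\d{2}\.\d{3}"
)


def vtt_to_text(vtt: str) -> str:
    """Return only the spoken text of a WebVTT document (hypothetical helper)."""
    kept = []
    for line in vtt.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank separator between cues
        if stripped.startswith("WEBVTT"):
            continue  # file header
        if TIMING.match(stripped):
            continue  # cue timing line
        if stripped.isdigit():
            continue  # numeric cue identifier (assumption: never spoken text)
        kept.append(stripped)
    return "\n".join(kept)


if __name__ == "__main__":
    sample = (
        "WEBVTT\n\n"
        "1\n00:00:01.000 --> 00:00:04.000\nHello there.\n\n"
        "2\n00:00:04.000 --> 00:00:06.500\nWelcome to the show.\n"
    )
    print(vtt_to_text(sample))
```

Note that the isdigit check would also drop a spoken line consisting only of digits, so a real converter may need a stricter rule for cue identifiers.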

Running the Backend

cd src/
./main.py

Running the site

npm install --prefix site/
npm start   --prefix site/

Running with Docker

docker run --name site --rm -it $(docker build -q .)

TODO

  • OpenAI Whisper integration
    • Add logic to be alerted of a new podcast post (likely from RSS feed)
  • Host on Google Cloud
    • Create Dockerfile
    • Automatic triggers and builds on pushes to main
    • Run container on GC
    • Run cron job to check for new podcast
  • Remove duplicate posts
  • Increase model accuracy
    • Look into a case-insensitive model that does not rely on capitalization (this is bottlenecked by Whisper)
    • "Capitalization normalization" did not work
  • Categorizing recommendations
    • Add genre information to each book
    • Create running lists of recommended books. Each entry will include a "reccomended_in" field listing every podcast episode in which the book was mentioned
  • Set up env for API keys
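
The "Remove duplicate posts" and running-list items above could be sketched together as follows. This is only one possible approach, not existing project code. The function name build_running_list, the (title, episode) input shape, and the whitespace and case normalization used to detect duplicates are all assumptions. The "reccomended_in" key is spelled as in the TODO item.

```python
def build_running_list(mentions):
    """Merge (book_title, episode) pairs into one entry per book.

    Hypothetical helper: titles are treated as duplicates when they match
    after lowercasing and collapsing whitespace (an assumption).
    """
    books = {}  # normalized title -> entry; dicts keep insertion order
    for title, episode in mentions:
        key = " ".join(title.lower().split())
        entry = books.setdefault(
            key, {"title": title.strip(), "reccomended_in": []}
        )
        # Record each episode only once per book.
        if episode not in entry["reccomended_in"]:
            entry["reccomended_in"].append(episode)
    return list(books.values())


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    mentions = [
        ("Book A", "episode-1"),
        ("book a ", "episode-2"),
        ("Book A", "episode-1"),
        ("Book B", "episode-2"),
    ]
    for book in build_running_list(mentions):
        print(book)
```

The first spelling seen for a title is kept as the display title. Matching on normalized strings will not catch misspellings or subtitle variants, so fuzzier matching may be needed in practice.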