Topic-Modeling-Book-Descriptions

This is an LDA topic model trained on a dataset of book descriptions. The frontend lets you enter a book description and predicts its topics.

We used the popular topic modeling technique Latent Dirichlet Allocation (LDA). If you want to learn more about it, visit https://en.wikipedia.org/wiki/Latent_Dirichlet_allocation.

The following graph shows the log-likelihood score for models with different parameters. We chose 15 topics with a learning decay of 0.6 to maintain a healthy number of topics without sacrificing topic coherence.
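The parameter comparison described above can be sketched as a small grid search. This is a minimal sketch, assuming scikit-learn's LatentDirichletAllocation (whose score method returns an approximate log-likelihood); this README does not state which library trainModel.py uses. The tiny corpus and the candidate values are made up for illustration; only the final choice of 15 topics and a learning decay of 0.6 comes from the project.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV

# Tiny stand-in corpus (hypothetical); the project trains on the Kaggle dataset.
descriptions = [
    "a detective investigates a murder in a small town",
    "two friends build video games and fall in love",
    "a wizard school faces a dark and ancient enemy",
    "a history of the roman empire and its emperors",
    "a cookbook with simple recipes for busy families",
    "a space crew explores a distant alien planet",
    "a memoir about growing up in rural america",
    "a guide to investing money and building wealth",
]

# Turn the descriptions into a bag-of-words count matrix.
vectorizer = CountVectorizer(stop_words="english")
doc_term_matrix = vectorizer.fit_transform(descriptions)

# Illustrative candidates only; a real search would include values such as 15 topics.
param_grid = {"n_components": [2, 3], "learning_decay": [0.6, 0.8]}

# learning_decay only affects the online learning method, so it is enabled here.
lda = LatentDirichletAllocation(learning_method="online", max_iter=5, random_state=0)

# With no explicit scoring, GridSearchCV uses the estimator's score(),
# which for LDA is an approximate log-likelihood (higher is better).
search = GridSearchCV(lda, param_grid=param_grid, cv=2)
search.fit(doc_term_matrix)

print("Best parameters:", search.best_params_)
for params, score in zip(search.cv_results_["params"], search.cv_results_["mean_test_score"]):
    print(params, round(score, 2))
```

Log-likelihood alone tends to favor some settings regardless of interpretability, so the final choice can still weigh topic coherence, as described above.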

The dataset used in this project can be acquired from https://www.kaggle.com/datasets/dylanjcastillo/7k-books-with-metadata

Installation (may vary based on OS)

Clone this repository
Create a Python virtual environment and activate it

python3 -m venv newvenv
// Windows:
newvenv\Scripts\activate
// Linux or Unix:
source newvenv/bin/activate

Change to the backend/services/model directory
Install Python dependencies: pip3 install -r requirements.txt
Train the model by running trainModel.py: python trainModel.py
Set BENTOML_CONFIG environment variable

// Windows:
set BENTOML_CONFIG=./config.yaml
// Linux or Unix:
export BENTOML_CONFIG=./config.yaml

Start the BentoML service to test POST requests: bentoml serve service.py
Open http://localhost:3001 to see the BentoML Swagger UI.
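Besides the Swagger UI, a POST request can be sent from a short script. The following is a minimal sketch using only the Python standard library. The endpoint name predict and the {"description": ...} JSON payload are assumptions made for illustration; the Swagger UI lists the actual endpoint and input format defined in service.py.

```python
import json
import urllib.request


def build_request(description, base_url="http://localhost:3001", endpoint="predict"):
    """Build a POST request carrying a book description as JSON.

    The endpoint name and payload shape are hypothetical; adjust them to
    match what the Swagger UI shows for the running service.
    """
    body = json.dumps({"description": description}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{endpoint}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict_topics(description):
    """Send the description to the service and return the raw response text."""
    request = build_request(description)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")


# Usage (requires the service started with `bentoml serve service.py`):
# print(predict_topics("Two friends build video games together."))
```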

Frontend

In a separate terminal, change into the frontend folder and install Node dependencies: npm install
Start the frontend: npm start
Open http://localhost:3000 to see the simple frontend web app.

Demo

We need to provide the model with a book description. Let's choose the description of a recently published book. For this demo, we will use Tomorrow, and Tomorrow, and Tomorrow by Gabrielle Zevin. The description is copied from here: https://www.goodreads.com/book/show/58784475-tomorrow-and-tomorrow-and-tomorrow
Clicking Get Topics results in the following output:

The book used in the demo is labeled with these genres: Fiction, Contemporary, Romance, Audiobook, Literary Fiction, Historical Fiction, Adult. Based on the output, topics 2, 10, 13 and 4 have the highest frequency. Looking at the words for these high-frequency topics, we can infer that the model accurately predicts the topic of the book from its description.
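Picking out the highest-frequency topics can be sketched in a few lines. This is a minimal sketch that assumes the model output is available as a list of per-topic weights indexed by topic number; the function name top_topics and the example weights are hypothetical and are not real output from this model.

```python
def top_topics(topic_weights, k=4):
    """Return the k (topic_index, weight) pairs with the largest weights."""
    ranked = sorted(enumerate(topic_weights), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Made-up weights for five topics, for illustration only.
example_weights = [0.05, 0.40, 0.10, 0.30, 0.15]

for index, weight in top_topics(example_weights, k=2):
    print(f"topic {index}: {weight:.2f}")
# prints "topic 1: 0.40" then "topic 3: 0.30"
```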
