sp1thas/criticker-dataset

criticker-dataset

Yet another dataset about Movies, TV Shows and Games.

This is the implementation of the Criticker Dataset. This repository contains the spiders needed to create the dataset, along with some basic tests.

The great_expectations tool is used for data quality checks; see the datadocs for the validation results.

Poetry is used for virtual environment and dependency management.

Install dependencies

poetry install

Create dataset from scratch

poetry run scrapy crawl games_spider -o data/raw/games.csv # retrieve games
# export the login username and password before crawling movies
export C_USERNAME='<USERNAME>'
export C_PASSWORD='<PASSWORD>'
poetry run scrapy crawl movies_spider -o data/raw/movies.csv # retrieve movies
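
Because the movies crawl is preceded by exporting C_USERNAME and C_PASSWORD, a quick pre-flight check can catch a missing variable before a long crawl starts. The following is a minimal sketch and not part of the repository: the helper name `missing_credentials` is hypothetical, and it assumes the spider reads these two variables from the environment, which is inferred from the export step above.

```python
import os
from typing import List, Mapping, Optional

# Variable names taken from the export commands above.
REQUIRED_VARS = ("C_USERNAME", "C_PASSWORD")


def missing_credentials(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required variable names that are unset or empty.

    Hypothetical helper for illustration; defaults to the process environment.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Report the state of the current environment without aborting.
missing = missing_credentials()
if missing:
    print("Set these variables before crawling movies:", ", ".join(missing))
else:
    print("Credentials found; ready to run movies_spider.")
```

Passing a plain dictionary instead of the real environment makes the check easy to exercise in a test.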

Development

Run tests

poetry run pytest

Next steps

  • Add games
  • TCI-related data
  • Add reviews