First assignment for the course of Big Data Platform held at Aalto University (Helsinki, Finland), fall 2019.

MMirelli/bdp_assignment_1

BDP - Assignment 1

Overview

In this assignment we simulate ingestion of the 2018 Yellow Taxi Trip Data dataset into a Cassandra cluster (mysimbdp-coredms). The application serves several users at a time (we have tested it with up to 500 concurrent users) with no data loss, thanks to the RabbitMQ cluster employed.
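The idea of buffering concurrent clients behind a queue can be illustrated with a minimal, self-contained sketch. It is not the project's code: a standard-library `queue.Queue` stands in for the RabbitMQ cluster, a plain list stands in for the Cassandra table, and the name `simulate_ingestion` and its parameters are hypothetical.

```python
import queue
import threading


def simulate_ingestion(n_clients, rows_per_client):
    """Push rows from n_clients threads through a buffer; return the rows stored."""
    buffer = queue.Queue()  # stand-in for the RabbitMQ cluster (assumption)
    stored = []             # stand-in for the Cassandra table (assumption)

    def client(client_id):
        # Each client publishes its rows to the shared buffer.
        for row in range(rows_per_client):
            buffer.put((client_id, row))

    def consumer():
        # A single consumer drains the buffer and writes to the store.
        while True:
            item = buffer.get()
            if item is None:  # sentinel: every client has finished
                break
            stored.append(item)

    worker = threading.Thread(target=consumer)
    worker.start()

    clients = [threading.Thread(target=client, args=(i,)) for i in range(n_clients)]
    for t in clients:
        t.start()
    for t in clients:
        t.join()

    buffer.put(None)  # tell the consumer to stop once all rows are queued
    worker.join()
    return stored


if __name__ == "__main__":
    rows = simulate_ingestion(n_clients=500, rows_per_client=10)
    print(len(rows))
```

Because every row passes through the queue before it is written, the consumer can work at its own pace while clients keep publishing; in this sketch the number of stored rows equals the number of rows sent.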

Here is the platform design:


Project Structure

code
├── client.py
├── db
│   ├── docker-compose.yml
│   └── setuppers
│       ├── Dockerfile
│       └── setup_db.py
├── install_dependencies.sh
├── queue
│   ├── consumer.py
│   ├── docker-compose.yml
│   └── Dockerfile
├── README.md
├── run.sh
├── start_db.sh
├── start_queue.sh
└── utils
    ├── Dockerfile
    └── split_data.py

The deployment uses docker-compose to start two cassandra-db and two bitnami/rabbitmq containerised nodes. Two other components also run in containers: both are pre-built on top of a python3 Docker image with cassandra-python as a dependency (in code/utils/Dockerfile), which I pushed to my Docker repository so that it is downloaded automatically when composing.

The main build is started by run.sh, which brings up the two docker-compose files (code/db/docker-compose.yml and code/queue/docker-compose.yml) and then starts a user-specified number of clients (client.py).
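The start-up order described above can be sketched as a list of commands. This is only an illustration under assumptions: the helper name `build_commands` is hypothetical, the clients are assumed to be launched as `python3 code/client.py` with no arguments, and the real script may do things differently. The sketch only prints the commands and runs nothing.

```python
# Compose files named in this README, in the order they are brought up.
COMPOSE_FILES = [
    "code/db/docker-compose.yml",
    "code/queue/docker-compose.yml",
]


def build_commands(n_clients, client_script="code/client.py"):
    """Return the argv lists for the compose steps followed by n_clients clients."""
    if n_clients < 1:
        raise ValueError("n_clients must be at least 1")
    # Bring up each compose file in detached mode.
    commands = [["docker-compose", "-f", path, "up", "-d"] for path in COMPOSE_FILES]
    # Then one command per client (assumed invocation, see lead-in).
    commands += [["python3", client_script] for _ in range(n_clients)]
    return commands


if __name__ == "__main__":
    for argv in build_commands(3):
        print(" ".join(argv))
```

To actually launch the processes, each argv list could be passed to `subprocess.Popen`, waiting on the client processes at the end.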

Logs can be found in logs, with each file named <log_type>_<client_number>.log, where log_type is one of db, queue, client and client_number is the number of concurrent clients run for the ingestion.
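As a small illustration of the naming scheme, the following sketch builds and parses such file names. The helper names `log_filename` and `parse_log_filename` are hypothetical and not part of the repository.

```python
import re

# Log types listed in this README.
LOG_TYPES = ("db", "queue", "client")

# Matches <log_type>_<client_number>.log
LOG_NAME = re.compile(r"^(db|queue|client)_(\d+)\.log$")


def log_filename(log_type, client_number):
    """Build a log file name such as 'client_500.log'."""
    if log_type not in LOG_TYPES:
        raise ValueError(f"unknown log type: {log_type!r}")
    return f"{log_type}_{client_number}.log"


def parse_log_filename(name):
    """Return (log_type, client_number), or None if the name does not match."""
    match = LOG_NAME.match(name)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


if __name__ == "__main__":
    print(log_filename("client", 500))
    print(parse_log_filename("queue_50.log"))
```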

