Technologies used: GCS, Compute Engine, Mage, BigQuery, Looker, Python


GCP Data Engineering project

The purpose of this project is to analyze the TLC Trip Record Data. Trip records include fields capturing pick-up and drop-off dates/times, pick-up and drop-off locations, trip distances, itemized fares, rate types, payment types, and driver-reported passenger counts. To understand the data and its fields, please refer to the data dictionary provided by the TLC (data_dictionary_trip_records_yellow.pdf).

The analysis of the TLC Trip Record Data will be conducted using the following technologies and tools:

  • Cloud Storage 🪣 to store and manage the trip record data

  • Compute Engine 💽 to host and run Mage

  • Mage 🧙‍♂️ for Extract, Transform, Load (ETL) processes

  • BigQuery 🔍 as our data warehouse for storing and querying the transformed trip record data

  • Looker 📈 as our business intelligence and data visualization platform

💡 Additionally, a key objective of this project is to gain familiarity with the Mage tool. Mage offers an alternative approach to Airflow for managing ETL workflows and will be the primary tool used to orchestrate the data pipeline.

GCS 🪣

Create a new bucket and upload the CSV file. Change the bucket's access control (Permissions -> Edit access control -> Fine-grained), then edit the access of the CSV file to make it publicly readable.
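If you prefer to script this step, here is a minimal sketch using the google-cloud-storage Python client; the bucket and file names are placeholders:

```python
from google.cloud import storage

# Assumes you are already authenticated, e.g. via
# `gcloud auth application-default login`
client = storage.Client()
bucket = client.bucket("uber-data-bucket")  # placeholder bucket name

# Upload the CSV and make it publicly readable
# (requires the bucket to use fine-grained access control)
blob = bucket.blob("uber_data.csv")
blob.upload_from_filename("uber_data.csv")
blob.make_public()
print(blob.public_url)
```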

Compute Engine 💽

Create a new VM instance, e.g. machine type e2-standard-4

SSH into your VM and run the following commands to install Python tooling, pip, wget, pandas, and the Google Cloud and BigQuery client libraries:

sudo apt-get update

sudo apt-get install python3-distutils

sudo apt-get install python3-apt

sudo apt-get install wget

wget https://bootstrap.pypa.io/get-pip.py

sudo python3 get-pip.py

sudo pip3 install pandas

sudo pip3 install google-cloud

sudo pip3 install google-cloud-bigquery
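As an optional sanity check, you can confirm the libraries installed correctly by importing them from Python:

```python
# Verify pandas and the BigQuery client library import cleanly
import pandas as pd
from google.cloud import bigquery

print(pd.__version__)
print(bigquery.__version__)
```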

Run sudo pip3 install mage-ai to install Mage on your VM.

To start a new Mage project, run mage start de-uber-project. You will see Checking port 6789...; if you restart Mage, the port may change (e.g. Checking port 6790...).

Create a new firewall rule that allows ingress on TCP port 6789

To open the Mage UI, browse to http://&lt;External-IP-address&gt;:6789

Mage 🧙‍♂️

load_uber_data

Open the Mage UI and select Data loader -> Python -> API

Copy the URL of your CSV file and paste it into the url variable inside the @data_loader function, as in the sketch below
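For reference, the filled-in block might look roughly like this; the URL is a placeholder for your public CSV link:

```python
import io

import pandas as pd
import requests

if 'data_loader' not in globals():
    from mage_ai.data_preparation.decorators import data_loader


@data_loader
def load_data_from_api(*args, **kwargs):
    """Load the Uber trip CSV from its public GCS URL."""
    url = 'https://storage.googleapis.com/your-bucket/uber_data.csv'  # placeholder
    response = requests.get(url)
    return pd.read_csv(io.StringIO(response.text))
```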

Run the block to load the data from GCS

uber_transformation

Transform the data: select Transformer -> Python -> Generic (no template)
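As an illustration, a minimal transformer block could parse the datetime columns and deduplicate trips; the column names below follow the TLC yellow taxi data dictionary, and the full project builds fact and dimension tables on top of this:

```python
import pandas as pd

if 'transformer' not in globals():
    from mage_ai.data_preparation.decorators import transformer


@transformer
def transform(df: pd.DataFrame, *args, **kwargs):
    # Parse pick-up and drop-off timestamps
    df['tpep_pickup_datetime'] = pd.to_datetime(df['tpep_pickup_datetime'])
    df['tpep_dropoff_datetime'] = pd.to_datetime(df['tpep_dropoff_datetime'])

    # Deduplicate and add a surrogate trip_id for downstream joins
    df = df.drop_duplicates().reset_index(drop=True)
    df['trip_id'] = df.index
    return df
```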

Run the uber_transformation block

uber_bigquery_load

Select Data exporter -> Python -> Google BigQuery

Go to APIs & Services on GCP and create a new service account from the Credentials section. Assign the BigQuery Admin role to the service account, create a new key for it in JSON format, and copy the information from your JSON key into the io_config.yaml file.
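The exporter template Mage generates reads its credentials from io_config.yaml; roughly, it looks like this (the project, dataset, and table names are placeholders, and depending on your Mage version get_repo_path may live in mage_ai.data_preparation.repo_manager instead):

```python
from os import path

from pandas import DataFrame

from mage_ai.settings.repo import get_repo_path
from mage_ai.io.bigquery import BigQuery
from mage_ai.io.config import ConfigFileLoader

if 'data_exporter' not in globals():
    from mage_ai.data_preparation.decorators import data_exporter


@data_exporter
def export_data_to_big_query(df: DataFrame, **kwargs) -> None:
    # Placeholder table id in project.dataset.table form
    table_id = 'your-project.uber_dataset.uber_trips'
    config_path = path.join(get_repo_path(), 'io_config.yaml')

    BigQuery.with_config(ConfigFileLoader(config_path, 'default')).export(
        df,
        table_id,
        if_exists='replace',  # overwrite the table on re-runs
    )
```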

BigQuery 🔍

Create a dataset in BigQuery, then run the uber_bigquery_load block to load all tables into BigQuery.
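Once loading finishes, you can sanity-check what landed in BigQuery with the client library; the dataset name is a placeholder:

```python
from google.cloud import bigquery

client = bigquery.Client()
# List the tables loaded into the dataset
for table in client.list_tables('uber_dataset'):  # placeholder dataset name
    print(table.table_id)
```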

Create a new table by running the SQL code from sql_query.sql

Looker 📈

Open lookerstudio.google.com and connect Looker Studio to your BigQuery dataset.

Create a dashboard. Here is an example of my Looker Dashboard.
