data_warehouse_with_aws_redshift

The idea: how can we prepare data to be used by Business Intelligence applications like Tableau, or even a Jupyter notebook? 👍 The goal is to help the business see an overview of the data in a diagram of which features of the product their customers are using and, mainly, to improve the performance of these OLAP and OLTP transactions.

Scenario

A music streaming startup, Sparkify, has grown its user base and song database and wants to move its processes and data onto the cloud. Its data resides in S3: a directory of JSON logs of user activity on the app, and a directory of JSON metadata on the songs in the app.

Assignment

Build an ETL pipeline that extracts the JSON log and song data from S3 and stages it in a Redshift cluster for any BI tool to use. I use my local machine as the driver: it provisions the infrastructure, creates the staging and analytics tables, loads the data from S3 into the staging tables, and then inserts the aggregated data into the final tables on the cluster, where queries can be run against it. An illustrative sketch of the staging step follows below.
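The core of the staging step is Redshift's COPY command, which loads the JSON files from S3 directly into a staging table. Here is a minimal sketch of that step; the table name, S3 path, and dwh.cfg section/key names are assumptions for illustration, not necessarily what this repo uses:

```python
import configparser

import psycopg2

# Illustrative sketch only: the table name, S3 path, and config keys below
# are assumptions for this example, not the repository's actual values.
COPY_STAGING_EVENTS = """
    COPY staging_events
    FROM 's3://<your-log-bucket>/log_data'
    IAM_ROLE '{role_arn}'
    FORMAT AS JSON 'auto'
    REGION 'us-west-2';
"""

def load_staging_tables(conn, role_arn):
    """Copy the raw JSON logs from S3 into a Redshift staging table."""
    with conn.cursor() as cur:
        cur.execute(COPY_STAGING_EVENTS.format(role_arn=role_arn))
    conn.commit()

if __name__ == "__main__":
    config = configparser.ConfigParser()
    config.read("dwh.cfg")

    # Connection details are assumed to live in dwh.cfg under [CLUSTER].
    conn = psycopg2.connect(
        host=config.get("CLUSTER", "HOST"),
        dbname=config.get("CLUSTER", "DB_NAME"),
        user=config.get("CLUSTER", "DB_USER"),
        password=config.get("CLUSTER", "DB_PASSWORD"),
        port=config.get("CLUSTER", "DB_PORT"),
    )
    load_staging_tables(conn, config.get("IAM_ROLE", "ARN"))
    conn.close()
```

Depending on the structure of the log files, `FORMAT AS JSON` may need a JSONPaths file instead of `'auto'` to map JSON fields to table columns.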

Setup

In the dwh.cfg file you will need to provide credentials for your AWS IAM user and grant that user the access this application needs. This allows the boto3 Python package to set up the initial infrastructure in your AWS account.
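As a rough sketch of what that bootstrap looks like (the region, cluster parameters, and dwh.cfg section/key names here are illustrative assumptions):

```python
import configparser

import boto3

# Read credentials and cluster parameters from dwh.cfg. The section and
# key names here are assumptions for illustration; match them to your file.
config = configparser.ConfigParser()
config.read("dwh.cfg")

redshift = boto3.client(
    "redshift",
    region_name="us-west-2",  # assumed region
    aws_access_key_id=config.get("AWS", "KEY"),
    aws_secret_access_key=config.get("AWS", "SECRET"),
)

# Provision a small cluster for the warehouse; parameters are illustrative.
redshift.create_cluster(
    ClusterType="multi-node",
    NodeType="dc2.large",
    NumberOfNodes=4,
    DBName=config.get("CLUSTER", "DB_NAME"),
    ClusterIdentifier="dwhCluster",
    MasterUsername=config.get("CLUSTER", "DB_USER"),
    MasterUserPassword=config.get("CLUSTER", "DB_PASSWORD"),
    IamRoles=[config.get("IAM_ROLE", "ARN")],
)
```

Note that create_cluster returns immediately while the cluster takes a few minutes to become available; you can poll its status with the client's describe_clusters call before running the ETL.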
