This is a project that I created to demonstrate dbt and BigQuery.
There are two separate data marts here: one about COVID-19 and the other about liquor sales in Iowa (which is a little more fun).
The COVID-19 models create a table in BigQuery that joins together data from BigQuery's publicly available COVID-19 and Census datasets.
By combining these two sources, we can derive information that neither provides on its own. In this case, I calculated confirmed cases per 10,000 people in each US county.
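The cases-per-10,000 figure is plain arithmetic once the two sources are joined per county: multiply confirmed cases by 10,000 and divide by population. Below is a minimal Python sketch of that step, assuming hypothetical, already-joined county records. The field names and sample numbers are illustrative and not taken from the BigQuery public datasets; the project itself computes this inside its dbt models.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class County:
    # Hypothetical fields: in the real mart these would come from joining the
    # COVID-19 case data with Census population data on a county identifier.
    fips_code: str
    name: str
    confirmed_cases: int
    total_pop: int


def cases_per_10k(confirmed_cases: int, population: int) -> float | None:
    """Confirmed cases per 10,000 residents; None when population is zero or missing."""
    if population <= 0:
        return None
    # Multiply before dividing to keep the result exact for round numbers.
    return confirmed_cases * 10_000 / population


def add_rates(counties: list[County]) -> dict[str, float | None]:
    """Map each county's (hypothetical) FIPS code to its cases-per-10,000 rate."""
    return {c.fips_code: cases_per_10k(c.confirmed_cases, c.total_pop) for c in counties}


if __name__ == "__main__":
    # Illustrative numbers only, not real county data.
    sample = [County("00001", "Example County", 250, 50_000)]
    print(add_rates(sample))  # {'00001': 50.0}
```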
The Iowa liquor sales mart joins Iowa liquor sales data with historical climate data.
The result is a model that shows the correlation between average temperature and triple sec sales. This supports the unsurprising hypothesis that Iowans drink more margaritas in warm months.
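A correlation like this needs two steps: line up sales and temperature on a shared key, then compute a coefficient. The Python sketch below assumes hypothetical inputs that are already aggregated by month (month to triple sec sales, month to average temperature), inner-joins them on the month key, and computes Pearson's correlation coefficient. It illustrates the technique only; it is not the project's actual model, and the sample values in the usage lines are made up.

```python
from __future__ import annotations

import math


def join_on_key(sales: dict[str, float], temps: dict[str, float]) -> list[tuple[float, float]]:
    """Inner-join two month-keyed mappings into (temperature, sales) pairs.

    Months present in only one mapping are dropped, like a SQL INNER JOIN.
    """
    return [(temps[month], sales[month]) for month in sorted(sales.keys() & temps.keys())]


def pearson(pairs: list[tuple[float, float]]) -> float:
    """Pearson correlation between the first and second elements of each pair."""
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two observations")
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    monthly_sales = {"2019-01": 100.0, "2019-04": 140.0, "2019-07": 210.0, "2019-10": 160.0}
    monthly_temps = {"2019-01": 20.0, "2019-04": 50.0, "2019-07": 75.0, "2019-10": 52.0, "2019-12": 25.0}
    print(pearson(join_on_key(monthly_sales, monthly_temps)))
```

A value near +1 would mean warmer months coincide with higher sales; correlation alone does not establish causation. In BigQuery itself, the CORR aggregate function computes the same Pearson coefficient, so a dbt model can do this step directly in SQL.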
You'll need Python and pip installed.
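- install dbt:
pip install dbt
(dbt 1.0 and later are installed as dbt-core plus an adapter instead, e.g. pip install dbt-bigquery)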
- follow the steps on this page to create a GCP project, enable BigQuery, and download a credentials file.
- follow the steps here to add the GCP credentials file to your ~/.dbt/ directory and to create a ~/.dbt/profiles.yml file
- update profile_name in dbt_project.yml to match the profile name you set in ~/.dbt/profiles.yml
- test the configuration
dbt debug
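The setup steps above hinge on one detail: the profile named in dbt_project.yml must match a top-level entry in ~/.dbt/profiles.yml. This Python sketch renders a typical BigQuery service-account profile and performs that name check with deliberately naive, line-based parsing (no YAML library). The helper names, profile name, GCP project ID, dataset, and keyfile path are hypothetical placeholders; dbt debug remains the authoritative check.

```python
from __future__ import annotations

import re


def render_bigquery_profile(profile_name: str, project: str, dataset: str,
                            keyfile: str, threads: int = 4) -> str:
    """Return profiles.yml text for a BigQuery service-account connection.

    All argument values are placeholders you would replace with your own.
    """
    return (
        f"{profile_name}:\n"
        f"  target: dev\n"
        f"  outputs:\n"
        f"    dev:\n"
        f"      type: bigquery\n"
        f"      method: service-account\n"
        f"      project: {project}\n"
        f"      dataset: {dataset}\n"
        f"      keyfile: {keyfile}\n"
        f"      threads: {threads}\n"
    )


def project_profile(dbt_project_yml: str) -> str | None:
    """Naively read the `profile:` value from dbt_project.yml text (quotes optional)."""
    match = re.search(r"^profile:\s*['\"]?([\w-]+)['\"]?\s*$", dbt_project_yml, re.MULTILINE)
    return match.group(1) if match else None


def top_level_profiles(profiles_yml: str) -> set[str]:
    """Naively collect unindented keys, which are the profile names in profiles.yml."""
    return set(re.findall(r"^([\w-]+):", profiles_yml, re.MULTILINE))


def profile_matches(dbt_project_yml: str, profiles_yml: str) -> bool:
    """True when the profile named in dbt_project.yml is defined in profiles.yml."""
    name = project_profile(dbt_project_yml)
    return name is not None and name in top_level_profiles(profiles_yml)


if __name__ == "__main__":
    profiles = render_bigquery_profile(
        "my_profile", "my-gcp-project", "my_dataset", "/path/to/keyfile.json"
    )
    print(profile_matches("name: 'demo'\nprofile: 'my_profile'\n", profiles))  # True
```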
dbt run
This will create tables and/or views in BigQuery corresponding to the defined models.
dbt test
# you'll see a few permission errors on bigquery-public-data. That's ok.
dbt docs generate
dbt docs serve
Your browser will automatically open a site documenting the data models.
- Visualize this geographic data on a map
- Add more attributes to the data model and use those as features in a machine learning model
- Apply dimensional modelling techniques
- Add more tests