yosh0555/airflow_with_mysql_and_snowflake

This is a small project where I used an Airflow DAG to create a table in a MySQL database and insert values into it. I also built a Spark pipeline that fetches the data from MySQL, processes it, and stores the processed data in a Snowflake data warehouse. Apache Airflow automates the entire process.
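
For orientation, here is a minimal sketch of what such a DAG could look like. This is not the repository's actual code; the DAG id, connection ids, SQL, file path, and package versions are all illustrative assumptions.

```python
# Hypothetical sketch of the pipeline DAG described above.
# Connection ids, table names, paths, and versions are assumptions.
from datetime import datetime

from airflow import DAG
from airflow.providers.mysql.operators.mysql import MySqlOperator
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

with DAG(
    dag_id="mysql_to_snowflake",     # assumed DAG id
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,          # trigger manually from the web UI
    catchup=False,
) as dag:
    # Create the source table in MySQL
    create_table = MySqlOperator(
        task_id="create_table",
        mysql_conn_id="mysql_default",
        sql="CREATE TABLE IF NOT EXISTS employees (id INT, name VARCHAR(50))",
    )

    # Insert sample rows into the table
    insert_values = MySqlOperator(
        task_id="insert_values",
        mysql_conn_id="mysql_default",
        sql="INSERT INTO employees VALUES (1, 'alice'), (2, 'bob')",
    )

    # Submit the Spark job that reads from MySQL, transforms the data,
    # and writes the result to Snowflake
    run_pipeline = SparkSubmitOperator(
        task_id="run_pipeline",
        application="/opt/airflow/dags/mysql_to_snowflake_job.py",  # assumed path
        packages=(
            "mysql:mysql-connector-java:8.0.30,"
            "net.snowflake:spark-snowflake_2.12:2.11.0-spark_3.3"
        ),
    )

    create_table >> insert_values >> run_pipeline
```

For these operators to import, the Airflow MySQL and Spark provider packages (apache-airflow-providers-mysql and apache-airflow-providers-apache-spark) would need to be installed.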

Installation

  1. Run the installer.sh file:
sh installer.sh
  2. Extract the Spark tar file:
tar -xvf spark-3.3.0-bin-hadoop3.tgz
  3. Move the Spark directory to /usr/bin/:
sudo mv spark-3.3.0-bin-hadoop3 /usr/bin/
  4. Set up the environment variables for Spark in the .bashrc file:
vi ~/.bashrc
  5. Add the following lines to the .bashrc file:
export SPARK_HOME=/usr/bin/spark-3.3.0-bin-hadoop3
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
  6. Apply the changes to the .bashrc file:
source ~/.bashrc
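
A quick way to verify the steps above (assuming the pyspark Python package matches the installed Spark 3.3.0) is to check the version from a PySpark session:

```python
# Sanity check: start a session and confirm the Spark version.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("install_check").getOrCreate()
print(spark.version)  # expected: 3.3.0
spark.stop()
```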

Setting the PySpark Python version to 3.8

  1. Use the following command to check which Python version is being used by PySpark:
pyspark
  2. If the version is not 3.8, open the .bashrc file with an editor:
vi ~/.bashrc
  3. Add the following lines to set the environment variables for the Python interpreter used by PySpark:
export PYSPARK_PYTHON=/usr/bin/python3.8
export PYSPARK_DRIVER_PYTHON=/usr/bin/python3.8
  4. Apply the changes to the .bashrc file:
source ~/.bashrc
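
To confirm the change took effect, a small check from a PySpark session (a sketch; the path /usr/bin/python3.8 is taken from the step above and may differ on your system):

```python
# Verify which interpreter PySpark is configured to use.
import os
import sys

print(sys.version)                       # driver interpreter: should be 3.8.x
print(os.environ.get("PYSPARK_PYTHON"))  # should be /usr/bin/python3.8
```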

Starting the Airflow server and logging into the web UI

  1. Use the following command to start the Airflow server:
airflow standalone
  2. Copy the password that is generated in the terminal.

  3. Open the following URL in a browser:
YourAirflowHostName:8080
  4. Log in to Airflow by entering the username and password that were generated in the terminal.
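
Once logged in, the DAG can be triggered from the web UI. For reference, the Spark application that the DAG submits might look roughly like the sketch below; the JDBC URL, credentials, table names, and Snowflake options are placeholders, not the repository's actual code.

```python
# Hypothetical sketch of the Spark job: read from MySQL, transform,
# write to Snowflake. All connection details below are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("mysql_to_snowflake_job").getOrCreate()

# Read the source table from MySQL over JDBC
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:mysql://localhost:3306/testdb")  # assumed database
    .option("dbtable", "employees")
    .option("user", "root")
    .option("password", "<mysql-password>")
    .load()
)

# Example transformation: upper-case the name column
processed = df.withColumn("name", F.upper(F.col("name")))

# Write the processed data to Snowflake via the Spark-Snowflake connector
sf_options = {
    "sfURL": "<account>.snowflakecomputing.com",
    "sfUser": "<user>",
    "sfPassword": "<password>",
    "sfDatabase": "DEMO_DB",
    "sfSchema": "PUBLIC",
    "sfWarehouse": "COMPUTE_WH",
}
(
    processed.write.format("snowflake")
    .options(**sf_options)
    .option("dbtable", "EMPLOYEES_PROCESSED")
    .mode("overwrite")
    .save()
)

spark.stop()
```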
