nikolasrieble/airflow_on_heroku

airflow_on_heroku

This repository contains code to deploy Apache Airflow on Heroku. Multiple Airflow jobs (DAGs) scrape newspapers with the Python package Newspaper3k and insert the scraped articles into a MongoDB database.

Initial setup

Step 1 is only required if you wish to run the scraping DAGs. If you prefer to run your own DAGs, start from step 2.

  1. OPTIONAL: Create a MongoDB account

    1. Create a user with Read/Write permissions

    2. Generate the connection string for this user

    3. Add the connection string to heroku_setup.sh, in this line (no space after the = sign):

      heroku config:set MONGO_DB="HERE ADD YOUR MONGO DB CONNECTION STRING"

  2. Register an account on https://www.heroku.com/

  3. Log in to Heroku from the terminal: heroku login

  4. Configure and deploy Airflow: bash heroku_setup.sh

  5. Open the app: heroku open

  6. Change the user password
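
Heroku exposes config vars to the running app as environment variables, so a DAG can pick up the connection string from step 1 at runtime. The following is a minimal sketch, assuming the variable keeps the name MONGO_DB used in heroku_setup.sh; the helper name get_mongo_uri is hypothetical and not part of this repository.

```python
import os


def get_mongo_uri(env=os.environ):
    """Return the MongoDB connection string stored in the MONGO_DB config var.

    get_mongo_uri is a hypothetical helper, not part of this repository.
    """
    uri = env.get("MONGO_DB", "").strip()
    if not uri:
        raise RuntimeError("MONGO_DB is not set; run heroku config:set MONGO_DB=... first")
    # MongoDB connection strings start with one of these two schemes.
    if not uri.startswith(("mongodb://", "mongodb+srv://")):
        raise ValueError("MONGO_DB does not look like a MongoDB connection string")
    return uri
```

Inside a task, the returned string could then be passed to a client such as pymongo's MongoClient(uri), assuming pymongo is installed. Failing early on a missing or malformed value keeps the error close to its cause instead of surfacing later as a connection timeout.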

Updating the instance with new dags

  1. Implement your DAGs in the dags folder
  2. Push your changes to master
  3. Deploy to Heroku: git push heroku master, or git push heroku subbranchname:master to deploy from another branch
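
As a rough illustration of step 1, a file in the dags folder could look like the sketch below. It assumes Airflow 2.x import paths (older 1.10 releases use airflow.operators.python_operator instead). The file name, DAG id, and task are hypothetical and not taken from this repository. The Airflow import is guarded only so the callable can be tried locally without Airflow installed.

```python
# dags/hello_dag.py (hypothetical file name)
from datetime import datetime


def build_greeting(name="heroku"):
    """Task callable: build, print, and return a greeting."""
    message = f"Hello from Airflow on {name}"
    print(message)
    return message


try:
    # Assumption: Airflow 2.x module layout.
    from airflow import DAG
    from airflow.operators.python import PythonOperator
except ImportError:
    DAG = None  # Airflow is not installed; only the callable above is usable

if DAG is not None:
    with DAG(
        dag_id="hello_heroku",            # hypothetical DAG id
        start_date=datetime(2024, 1, 1),  # arbitrary start date for the sketch
        catchup=False,                    # do not backfill runs since start_date
    ) as dag:
        PythonOperator(task_id="greet", python_callable=build_greeting)
```

Keeping the task logic in a plain function makes it possible to check it without a scheduler before pushing to Heroku.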

As always, we did not reinvent the wheel, but benefited from multiple sources, of which we can remember the following:
