This quick-start guide shows how to run Airflow with the CeleryExecutor in Docker. This is the fastest way to get Airflow up and running.
Follow these steps to install the necessary tools.
- Install Docker Community Edition (CE) on your workstation. Depending on your OS, you may need to configure Docker to use at least 4.00 GB of memory for all containers to run properly. Please refer to the Resources section of the Docker for Windows or Docker for Mac documentation for more information.
- Install Docker Compose v1.29.1 or newer on your workstation. Older versions of docker-compose do not support all the features required by the docker-compose.yaml file, so double check that your version meets the minimum requirement.
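If you are not sure which version you have installed, you can check it before proceeding:

docker-compose version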
Warning
The default amount of memory available for Docker on macOS is often not enough to get Airflow up and running. If you do not have enough memory available, it might lead to the airflow webserver continuously restarting. You should allocate at least 4GB of memory for the Docker Engine (ideally 8GB). You can check and change the amount of memory in Resources.
You can also check if you have enough memory by running this command:
docker run --rm "debian:buster-slim" bash -c 'numfmt --to iec $(echo $(($(getconf _PHYS_PAGES) * $(getconf PAGE_SIZE))))'
To deploy Airflow on Docker Compose, you should fetch docker-compose.yaml.
curl -LfO '{{ doc_root_url }}docker-compose.yaml'
This file contains several service definitions:
- airflow-scheduler - The scheduler monitors all tasks and DAGs, then triggers the task instances once their dependencies are complete.
- airflow-webserver - The webserver, available at http://localhost:8080.
- airflow-worker - The worker that executes the tasks given by the scheduler.
- airflow-init - The initialization service.
- flower - The Flower app for monitoring the environment, available at http://localhost:5555.
- postgres - The database.
- redis - The Redis broker that forwards messages from the scheduler to the workers.
All these services allow you to run Airflow with the CeleryExecutor. For more information, see the architecture overview.
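If you want to confirm that the file you fetched parses correctly, you can ask Docker Compose to list the services it defines, using the standard config subcommand:

docker-compose config --services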
Some directories in the container are mounted, which means that their contents are synchronized between your computer and the container.
- ./dags - you can put your DAG files here.
- ./logs - contains logs from task execution and the scheduler.
- ./plugins - you can put your custom plugins here.
This file uses the latest Airflow image (apache/airflow). If you need to install a new Python library or system library, you can build your own image.
Before starting Airflow for the first time, you need to prepare your environment, i.e. create the necessary files and directories and initialize the database.
On Linux, the mounted volumes in container use the native Linux filesystem user/group permissions, so you have to make sure the container and host computer have matching file permissions.
mkdir ./dags ./logs ./plugins
echo -e "AIRFLOW_UID=$(id -u)\nAIRFLOW_GID=0" > .env
See the Docker Compose environment variables section below for more information.
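If you cannot use id -u (for example on Windows), you can create the .env file by hand instead; a minimal sketch, assuming the image's default UID of 50000:

AIRFLOW_UID=50000
AIRFLOW_GID=0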
On all operating systems, you need to run database migrations and create the first user account. To do this, run:
docker-compose up airflow-init
After initialization is complete, you should see a message like the one below.

airflow-init_1       | Upgrades done
airflow-init_1       | Admin user airflow created
airflow-init_1       | start_airflow-init_1 exited with code 0
The account created has the login airflow and the password airflow.
Now you can start all services:
docker-compose up
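If you prefer to get your terminal back, you can start the same services in the background with the standard detached flag:

docker-compose up -d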
In a second terminal, you can check the condition of the containers and make sure that no containers are in an unhealthy state:
$ docker ps
CONTAINER ID   IMAGE                      COMMAND                  CREATED          STATUS                    PORTS                              NAMES
247ebe6cf87a   apache/airflow:|version|   "/usr/bin/dumb-init …"   3 minutes ago    Up 3 minutes (healthy)    8080/tcp                           compose_airflow-worker_1
ed9b09fc84b1   apache/airflow:|version|   "/usr/bin/dumb-init …"   3 minutes ago    Up 3 minutes (healthy)    8080/tcp                           compose_airflow-scheduler_1
65ac1da2c219   apache/airflow:|version|   "/usr/bin/dumb-init …"   3 minutes ago    Up 3 minutes (healthy)    0.0.0.0:5555->5555/tcp, 8080/tcp   compose_flower_1
7cb1fb603a98   apache/airflow:|version|   "/usr/bin/dumb-init …"   3 minutes ago    Up 3 minutes (healthy)    0.0.0.0:8080->8080/tcp             compose_airflow-webserver_1
74f3bbe506eb   postgres:13                "docker-entrypoint.s…"   18 minutes ago   Up 17 minutes (healthy)   5432/tcp                           compose_postgres_1
0bd6576d23cb   redis:latest               "docker-entrypoint.s…"   10 hours ago     Up 17 minutes (healthy)   0.0.0.0:6379->6379/tcp             compose_redis_1
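If a container shows up as unhealthy, you can inspect the output of its health check; docker inspect with a Go template is one standard way to do this (the container name below is taken from the listing above):

docker inspect --format '{{json .State.Health}}' compose_airflow-webserver_1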
After starting Airflow, you can interact with it in 3 ways:

- by running CLI commands,
- via a browser using the web interface,
- using the REST API.
You can also run CLI commands, but you have to do it in one of the defined airflow-* services. For example, to run airflow info, run the following command:
docker-compose run airflow-worker airflow info
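If the services are already up, you can also execute the command inside a running container instead of starting a new one; docker-compose exec is the standard way to do this:

docker-compose exec airflow-worker airflow info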
If you are on Linux or macOS, you can make your work easier by downloading an optional wrapper script that allows you to run commands with a simpler syntax.
curl -LfO '{{ doc_root_url }}airflow.sh'
chmod +x airflow.sh
Now you can run commands more easily:
./airflow.sh info
You can also use bash as a parameter to enter an interactive bash shell in the container, or python to enter an interactive Python shell:
./airflow.sh bash
./airflow.sh python
Once the cluster has started up, you can log in to the web interface and try to run some tasks. The webserver is available at http://localhost:8080. The default account has the login airflow and the password airflow.
Basic username/password authentication (see RFC 7617: https://tools.ietf.org/html/rfc7617) is currently supported for the REST API, which means you can use common tools to send requests to the API.
Here is a sample curl
command, which sends a request to retrieve a pool list:
ENDPOINT_URL="http://localhost:8080"
curl -X GET \
--user "airflow:airflow" \
"${ENDPOINT_URL}/api/v1/pools"
To stop and delete containers, delete volumes with database data, and remove downloaded images, run:
docker-compose down --volumes --rmi all
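If you only want to pause the environment without removing anything, the standard Compose lifecycle commands apply; stop halts the containers but keeps them, their volumes, and the downloaded images, and start brings them back:

docker-compose stop
docker-compose start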
The Docker Compose file uses the latest Airflow image (apache/airflow). If you need to install a new Python library or system library, you can customize and extend it.
From this point, you can head to the tutorial section for further examples, or the how-to guides if you're ready to get your hands dirty.
Do not confuse the variable names here with the build arguments set when the image is built. The AIRFLOW_UID and AIRFLOW_GID build args default to 50000 when the image is built, so they are "baked" into the image. On the other hand, the environment variables below can be set when the container is running, using, for example, the result of the id -u command, which allows using the dynamic host runtime user id that is unknown at the time of building the image.
| Variable | Description | Default |
|---|---|---|
| AIRFLOW_IMAGE_NAME | Airflow image to use. | apache/airflow:\|version\| |
| AIRFLOW_UID | UID of the user to run Airflow containers as. Override it if you want to use a non-default Airflow UID (for example, when you map folders from the host, it should be set to the result of the id -u call). If you change it from the default 50000, you must set AIRFLOW_GID to 0. When it is changed, a second user with the specified UID is dynamically created with a default name inside the container, and the home of the user is set to /airflow/home/ in order to share Python libraries installed there. This is in order to achieve OpenShift compatibility. See more in Arbitrary Docker User. | 50000 |
| AIRFLOW_GID | Group ID in Airflow containers. It overrides the GID of the user. It is 50000 by default, but if you want to use a UID different from the default, it must be set to 0. | 50000 |
Those additional variables are useful in case you are trying out or testing the Airflow installation via Docker Compose. They are not intended to be used in production, but they make the environment faster to bootstrap for first-time users with the most common customizations.
| Variable | Description | Default |
|---|---|---|
| _AIRFLOW_WWW_USER_USERNAME | Username for the administrator UI account. If this value is specified, an admin UI user is created automatically. This is only useful when you want to run Airflow for a test drive and want to start a container with an embedded development database. | airflow |
| _AIRFLOW_WWW_USER_PASSWORD | Password for the administrator UI account. Only used when _AIRFLOW_WWW_USER_USERNAME is set. | airflow |
| _PIP_ADDITIONAL_REQUIREMENTS | If not empty, Airflow containers will attempt to install the requirements specified in the variable. Example: lxml==4.6.3 charset-normalizer==1.4.1. Available in Airflow image 2.1.1 and above. | |
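Putting this together, a minimal .env file for a first test drive might look like the sketch below (illustrative values only; the pinned requirements are taken from the example in the table above):

AIRFLOW_UID=50000
_AIRFLOW_WWW_USER_USERNAME=airflow
_AIRFLOW_WWW_USER_PASSWORD=airflow
_PIP_ADDITIONAL_REQUIREMENTS=lxml==4.6.3 charset-normalizer==1.4.1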