Standardize approach to dependencies (#21356)
Our approach to dependencies (especially with regard to upper bounds) has been fairly ad hoc so far.
This PR attempts to describe the rules discussed on the devlist, and reviews and updates all
dependencies to match those policies.
potiuk committed Feb 14, 2022
1 parent f08c2d5 commit 7864693
Showing 3 changed files with 170 additions and 70 deletions.
49 changes: 49 additions & 0 deletions README.md
@@ -53,6 +53,7 @@ Use Airflow to author workflows as directed acyclic graphs (DAGs) of tasks. The
- [Semantic versioning](#semantic-versioning)
- [Version Life Cycle](#version-life-cycle)
- [Support for Python and Kubernetes versions](#support-for-python-and-kubernetes-versions)
- [Approach to dependencies of Airflow](#approach-to-dependencies-of-airflow)
- [Contributing](#contributing)
- [Who uses Apache Airflow?](#who-uses-apache-airflow)
- [Who Maintains Apache Airflow?](#who-maintains-apache-airflow)
@@ -308,6 +309,54 @@ They are based on the official release schedule of Python and Kubernetes, nicely
* Previous versions [require](https://github.com/apache/airflow/issues/8162) at least Python 3.5.3
when using Python 3.

## Approach to dependencies of Airflow

Airflow has a lot of dependencies, both direct and transitive, and Airflow is both a library and an application.
Therefore our dependency policies have to cover both the stability of installing the application and the
ability of users who develop DAGs to install newer versions of dependencies. We developed an approach where
`constraints` are used to make sure Airflow can be installed in a repeatable way, while we do not prevent our
users from upgrading most of the dependencies. As a result, we decided not to upper-bound versions of Airflow
dependencies by default, unless we have good reason to believe upper-bounding is needed because of the
importance of the dependency and the risk involved in upgrading it. We also upper-bound the dependencies that
we know cause problems.
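To illustrate how constraints make an installation repeatable without upper bounds in `install_requires`, here is a minimal sketch that assembles a `pip install` command using a constraints file. It assumes the constraints files are published at `https://raw.githubusercontent.com/apache/airflow/constraints-<airflow version>/constraints-<python version>.txt`, the layout used in Airflow's installation documentation; the helper `constrained_install_command` is hypothetical.

```python
import sys
from typing import List, Optional

# Assumed location of the published constraints files.
CONSTRAINTS_BASE = "https://raw.githubusercontent.com/apache/airflow"


def constrained_install_command(airflow_version: str,
                                python_version: Optional[str] = None) -> List[str]:
    """Build a pip command that installs Airflow pinned by a constraints file.

    Hypothetical helper: install_requires stays open-ended, while the
    constraints file fixes every (including transitive) dependency version.
    """
    if python_version is None:
        # Default to the running interpreter's MAJOR.MINOR, e.g. "3.8".
        python_version = f"{sys.version_info.major}.{sys.version_info.minor}"
    constraint_url = (
        f"{CONSTRAINTS_BASE}/constraints-{airflow_version}/"
        f"constraints-{python_version}.txt"
    )
    return [
        sys.executable, "-m", "pip", "install",
        f"apache-airflow=={airflow_version}",
        "--constraint", constraint_url,
    ]


if __name__ == "__main__":
    print(" ".join(constrained_install_command("2.2.3", "3.7")))
```

Because the pins live in the constraints file rather than in `install_requires`, a DAG author can still upgrade an individual library afterwards with a plain `pip install --upgrade <package>`.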

Our constraint mechanism takes care of finding and upgrading all non-upper-bound dependencies automatically
(provided that all the tests pass). Failures of our `main` build indicate that there are versions of
dependencies that break our tests, meaning that we should either upper-bound them or fix our code/tests
to account for the upstream changes in those dependencies.

Whenever we upper-bound such a dependency, we should always add a comment explaining why, i.e. we should have
a good reason for the upper bound. We should also state the condition under which the bound can be removed.
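The rule that every upper bound needs a justifying comment can be checked mechanically. The sketch below is a hypothetical lint (not part of Airflow's tooling) that scans `install_requires`-style lines and reports upper-bounded requirements that are not immediately preceded by a comment. It treats `<`, `<=`, `~=` and `==` as upper bounds and ignores environment markers after `;`, so that `python_version<"3.9"` is not mistaken for a version limit.

```python
import re
from typing import List

# Specifiers that cap a version: "<", "<=", "~=" (compatible release) or an exact "==" pin.
UPPER_BOUND = re.compile(r"<=?|~=|==")


def unjustified_upper_bounds(install_requires: str) -> List[str]:
    """Return requirement lines that are upper-bound but not preceded by a comment.

    Hypothetical lint for the "always comment why" rule. A comment justifies
    only the requirement line directly below it.
    """
    offenders = []
    previous_was_comment = False
    for raw in install_requires.splitlines():
        line = raw.strip()
        if not line:
            previous_was_comment = False
            continue
        if line.startswith("#"):
            previous_was_comment = True
            continue
        # Drop environment markers such as ;python_version<"3.9".
        specifier = line.split(";", 1)[0]
        if UPPER_BOUND.search(specifier) and not previous_was_comment:
            offenders.append(line)
        previous_was_comment = False
    return offenders


if __name__ == "__main__":
    sample = """
    # Bug in 1.4.10, see the upstream issue linked here
    sqlalchemy>=1.3.18,<1.4.10
    tabulate>=0.7.5, <0.9
    importlib_metadata>=1.7;python_version<"3.9"
    pendulum~=2.0
    """
    print(unjustified_upper_bounds(sample))
```

Each reported line then needs either a justifying comment or a looser specifier.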

### Approach for dependencies for Airflow Core

The core dependencies are maintained in `setup.cfg`.

There are a few dependencies that we decided are important enough to upper-bound by default, as they are
known to follow a predictable versioning scheme, and we know that their new versions are very likely to
bring breaking changes. We commit to regularly reviewing and attempting to upgrade to newer versions of
these dependencies as they are released, but this is a manual process.

The important dependencies are:

* `SQLAlchemy`: upper-bound to a specific MINOR version (SQLAlchemy is known to remove deprecations and
introduce breaking changes, especially since support for different databases varies and changes at
different speeds; for example, SQLAlchemy 1.4 broke the MSSQL integration for Airflow)
* `Alembic`: it is important to handle our migrations in a predictable and performant way. It is developed
together with SQLAlchemy. Our experience with Alembic is that it is very stable within a MINOR version
* `Flask`: we are using Flask as the backbone of our web UI and API. We know that major versions of Flask
are very likely to introduce breaking changes, so limiting it to the MAJOR version makes sense
* `werkzeug`: the library is known to cause problems in new versions. It is tightly coupled with the Flask
libraries, and we should update them together

### Approach for dependencies in Airflow Providers and extras

The `extras` and `providers` dependencies are maintained in `setup.py`.

By default, we should not upper-bound dependencies for providers; however, each provider's maintainer
might decide to add additional limits (and justify them with a comment).

## Contributing

Want to help build Apache Airflow? Check out our [contributing documentation](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst).
56 changes: 37 additions & 19 deletions setup.cfg
@@ -79,20 +79,30 @@ setup_requires =
# DEPENDENCIES_EPOCH_NUMBER in the Dockerfile.ci
#####################################################################################################
install_requires =
# Alembic is important to handle our migrations in a predictable and performant way. It is developed
# together with SQLAlchemy. Our experience with Alembic is that it is very stable in minor versions
alembic>=1.5.1, <2.0
argcomplete>=1.10, <3.0
attrs>=20.0, <21.0
argcomplete>=1.10
# We limit the version of attrs to work with the old version of cattrs
attrs>=20.0,<21.0
blinker
cached_property~=1.5;python_version<="3.7"
cached_property>=1.5.0;python_version<="3.7"
# Cattrs upgrades were known to break lineage https://github.com/apache/airflow/issues/16172
# TODO: Cattrs is now at 3.8 version so we should attempt to upgrade cattrs soon.
cattrs~=1.1, !=1.7.*
colorlog>=4.0.2, <7.0
colorlog>=4.0.2
connexion[swagger-ui,flask]>=2.10.0
cron-descriptor>=1.2.24
croniter>=0.3.17
cryptography>=0.9.3
deprecated>=1.2.13
dill>=0.2.2, <0.4
dill>=0.2.2
# Flask and all related libraries are limited to below 2.0.0 because we expect it to introduce
# serious breaking changes. Flask 2.0 was released in May 2021 and version 2.0.2 is available
# now (Feb 2022). TODO: we should attempt to migrate to Flask 2 and all the flask libraries below soon.
flask>=1.1.0, <2.0
# FlaskAppBuilder is very tightly integrated with the UI, but we are likely to remove it as a dependency soon
# TODO: Remove it when we are ready
flask-appbuilder~=3.4, <4.0.0
flask-caching>=1.5.0, <2.0.0
flask-login>=0.3, <0.5
@@ -101,44 +111,52 @@ install_requires =
gunicorn>=20.1.0
httpx
importlib_metadata>=1.7;python_version<"3.9"
importlib_resources~=5.2;python_version<"3.9"
# Logging is broken with itsdangerous > 2
importlib_resources>=5.2;python_version<"3.9"
# Logging is broken with itsdangerous > 2 - likely due to changed serializing support
# https://itsdangerous.palletsprojects.com/en/2.0.x/changes/#version-2-0-0
# itsdangerous 2 was released in May 2021
# TODO: we should attempt to upgrade to the 2.x line of itsdangerous
itsdangerous>=1.1.0, <2.0
# Jinja2 3.1 will remove the 'autoescape' and 'with' extensions, which would
# break Flask 1.x, so we limit this for future compatibility. Remove this
# when bumping Flask to >=2.
jinja2>=2.10.1,<3.1
# We are using JSONSchema 3 for an unknown reason
# TODO: we should attempt to remove the upper bound of JSONSchema
jsonschema~=3.0
lazy-object-proxy
lockfile>=0.12.2
markdown~=3.0
markdown>=3.0
markupsafe>=1.1.1
marshmallow-oneofschema>=2.0.1
packaging>=14.0
pendulum~=2.0
pep562~=1.0;python_version<"3.7"
pluggy~=1.0
psutil>=4.2.0, <6.0.0
pygments>=2.0.1, <3.0
pendulum>=2.0
pep562>=1.0;python_version<"3.7"
pluggy>=1.0
psutil>=4.2.0
pygments>=2.0.1
# python-daemon crashes with 'socket operation on non-socket' for Python 3.8+ in versions < 2.2.4
# https://pagure.io/python-daemon/issue/34
python-daemon>=2.2.4
python-dateutil>=2.3, <3
python-nvd3~=0.15.0
python-slugify~=5.0
python-dateutil>=2.3
python-nvd3>=0.15.0
python-slugify>=5.0
rich>=9.2.0
setproctitle>=1.1.8, <2
setproctitle>=1.1.8
# SQLAlchemy 1.4.10 introduces a bug where, for the PyODBC driver, UTCDateTime fields get wrongly converted
# to strings and fail to be converted back to datetime. It was supposed to be fixed in
# https://github.com/sqlalchemy/sqlalchemy/issues/6366 (released in 1.4.12) but apparently our case
# is different. Opened https://github.com/sqlalchemy/sqlalchemy/issues/7660 to track it
sqlalchemy>=1.3.18,<1.4.10
sqlalchemy_jsonfield~=1.0
tabulate>=0.7.5, <0.9
sqlalchemy_jsonfield>=1.0
tabulate>=0.7.5
tenacity>=6.2.0
termcolor>=1.1.0
typing-extensions>=3.7.4;python_version<"3.8"
unicodecsv>=0.14.1
# Werkzeug is known to cause breaking changes and it is very closely tied with FlaskAppBuilder and other
# Flask dependencies and the limit to 1.* line should be reviewed when we upgrade Flask and remove
# FlaskAppBuilder.
werkzeug~=1.0, >=1.0.1

[options.packages.find]
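Many of the `setup.cfg` changes above replace `~=` with `>=` (for example `pendulum~=2.0` becomes `pendulum>=2.0`). Under PEP 440, `~= V.N` is equivalent to `>= V.N, == V.*`, so a compatible-release specifier silently upper-bounds the dependency; switching to `>=` removes that hidden bound. The sketch below expands a compatible-release clause into explicit bounds to make this visible; `expand_compatible_release` is a hypothetical helper that only handles plain numeric versions.

```python
from typing import Tuple


def expand_compatible_release(version: str) -> Tuple[str, str]:
    """Expand a PEP 440 '~=version' clause into explicit (lower, upper) specifiers.

    '~=X.Y' allows >=X.Y, <X+1 and '~=X.Y.Z' allows >=X.Y.Z, <X.(Y+1).
    """
    parts = [int(p) for p in version.split(".")]
    if len(parts) < 2:
        raise ValueError("'~=' needs at least two release components, e.g. '~=1.0'")
    prefix = parts[:-1]   # drop the last release component...
    prefix[-1] += 1       # ...and bump the new last one
    return f">={version}", "<" + ".".join(str(p) for p in prefix)


if __name__ == "__main__":
    for spec in ["2.0", "0.15.0", "5.2"]:
        lower, upper = expand_compatible_release(spec)
        print(f"~={spec} -> {lower}, {upper}")
```

Seen this way, the old `python-nvd3~=0.15.0` quietly meant `<0.16` and `importlib_resources~=5.2` meant `<6`, which is exactly the kind of undocumented upper bound the new policy removes.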