README_RELEASE_PROVIDER_PACKAGES.md

Provider packages

The prerequisites to release Apache Airflow are described in README.md.

You can read more about the command line tools used to generate the packages in the Provider packages.

Decide when to release

You can release provider packages separately from the main Airflow on an ad-hoc basis, whenever a given provider needs to be released because of new features or bug fixes. Each provider package can be released separately, but because of the voting and release overhead we try to group releases of provider packages together.

Provider packages versioning

We use the SEMVER versioning scheme for the provider packages, to give users confidence that backwards compatibility is maintained in new releases of those packages.

Details about maintaining the SEMVER version are going to be discussed and implemented in the related issue

Prepare Regular Provider packages (RC)

Generate release notes

Prepare release notes for all the packages you plan to release. When the provider package version has not been updated since the latest release, the release notes are not generated. Release notes are only generated when the latest version of the package does not yet have a corresponding tag. The tags for providers are of the form providers-<PROVIDER_ID>/<VERSION>, for example providers-amazon/1.0.0. During the release, the RC1/RC2 tags are created (for example providers-amazon/1.0.0rc1).
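
To make the tag convention concrete, here is a minimal sketch. The helpers provider_tag and tag_exists are hypothetical names used only for illustration; they are not part of Breeze, and the actual check done by the tooling may differ.

```shell
# Hypothetical helper: print the tag name for a provider id and version.
provider_tag() {
    local provider_id="$1" version="$2"
    echo "providers-${provider_id}/${version}"
}

# Hypothetical helper: succeed if the tag exists in the local git repository.
tag_exists() {
    git rev-parse --quiet --verify "refs/tags/$1" > /dev/null 2>&1
}

tag="$(provider_tag amazon 1.0.0)"
if tag_exists "${tag}"; then
    echo "${tag} exists: no new release notes expected"
else
    echo "${tag} not found: release notes would be generated"
fi
```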

./breeze prepare-provider-documentation [packages]

This command will not only prepare documentation but will also help the release manager to review changes implemented in all providers, and determine which of the providers should be released. For each provider, it prints the changes implemented since the last release, including links to the individual commits. This should help to determine which version of the provider should be released:

  • increased patch-level for bugfix-only change
  • increased minor version if new features are added
  • increased major version if breaking changes are added
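
As a rough illustration of the rules above, here is a minimal sketch. The bump_version function and its change-type names (bugfix, feature, breaking) are hypothetical and not part of Breeze; the release manager still decides the version by reviewing the printed changes.

```shell
# Hypothetical helper: bump_version <MAJOR.MINOR.PATCH> <bugfix|feature|breaking>
bump_version() {
    local version="$1" change="$2"
    local major minor patch rest
    # Split MAJOR.MINOR.PATCH using plain parameter expansion.
    major="${version%%.*}"
    rest="${version#*.}"
    minor="${rest%%.*}"
    patch="${rest#*.}"
    case "${change}" in
        bugfix)   patch=$((patch + 1)) ;;
        feature)  minor=$((minor + 1)); patch=0 ;;
        breaking) major=$((major + 1)); minor=0; patch=0 ;;
        *) echo "Unknown change type: ${change}" >&2; return 1 ;;
    esac
    echo "${major}.${minor}.${patch}"
}

bump_version 1.2.3 feature   # prints 1.3.0
```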

It also helps the release manager to update CHANGELOG.rst, where a high-level overview of the changes should be documented for the providers released.

You can iterate and regenerate the same content after any change as many times as you want. The generated files should be added and committed to the repository.

When you want to regenerate the changes before the release and make sure all changelogs are updated, run it in non-interactive mode:

./breeze --non-interactive prepare-provider-documentation [packages]

When you run the command and documentation generation is successful, you will get a command that you can run to create the GitHub issue in which you will track the status of tests for the providers you release.

You can also trigger automated generation of the issue by running:

./breeze --non-interactive --generate-providers-issue prepare-provider-documentation [packages]

Once you release packages, you should create the issue with the content specified and link to it in the email sent to the devlist.

Build provider packages for SVN apache upload

Those packages might get promoted to "final" packages by just renaming the files, so internally they should keep the final version number without the rc suffix, even if they are rc1/rc2/... candidates.

They also need to be signed and have checksum files. You can generate the checksum/signature files by running the "dev/sign.sh" script (assuming you have the right PGP key set-up for signing). The script generates corresponding .asc and .sha512 files for each file to sign.

Build and sign the source and convenience packages

  • Cleanup dist folder:
export AIRFLOW_REPO_ROOT=$(pwd)
rm -rf ${AIRFLOW_REPO_ROOT}/dist/*
  • Release candidate packages:
./breeze prepare-provider-packages --package-format both

If you only build a few packages, run:

./breeze prepare-provider-packages --package-format both PACKAGE PACKAGE ....
  • Sign all your packages
pushd dist
../dev/sign.sh *
popd

Commit the source packages to Apache SVN repo

  • Push the artifacts to ASF dev dist repo
# First clone the repo if you do not have it
svn checkout https://dist.apache.org/repos/dist/dev/airflow airflow-dev

# update the repo in case you have it already
cd airflow-dev
svn update

# Go to the providers folder
cd providers

# Remove previously released providers
rm -rf *

# Move the artifacts to svn folder
mv ${AIRFLOW_REPO_ROOT}/dist/* .

# Add and commit
svn add *
svn commit -m "Add artifacts for Airflow Providers $(date "+%Y-%m-%d%n")"

cd ${AIRFLOW_REPO_ROOT}

Verify that the files are available at providers

Publish the Regular convenience package to PyPI

In case of pre-release versions you build the same packages for both PyPI and SVN so you can simply use packages generated in the previous step, and you can skip the "prepare" step below.

In order to publish release candidate to PyPI you just need to build and release packages. The packages should however contain the rcN suffix in the version file name but not internally in the package, so you need to use --version-suffix-for-pypi switch to prepare those packages. Note that these are different packages than the ones used for SVN upload though they should be generated from the same sources.

  • Generate the packages with the right RC version (specify the version suffix with PyPI switch). Note that this will clean up dist folder before generating the packages, so you will only have the right packages there.
rm -rf ${AIRFLOW_REPO_ROOT}/dist/*

./breeze prepare-provider-packages --version-suffix-for-pypi rc1 --package-format both

If you only build a few packages, run:

./breeze prepare-provider-packages --version-suffix-for-pypi rc1 --package-format both \
    PACKAGE PACKAGE ....
  • Verify the artifacts that would be uploaded:
twine check ${AIRFLOW_REPO_ROOT}/dist/*
  • Upload the package to PyPI's test environment:
twine upload -r pypitest ${AIRFLOW_REPO_ROOT}/dist/*
  • Verify that the test packages look good by downloading and installing them into a virtual environment. Twine prints the package links as output - separately for each package.

  • Upload the package to PyPI's production environment:

twine upload -r pypi ${AIRFLOW_REPO_ROOT}/dist/*
  • Again, confirm that the packages are available under the links printed.

Add tags in git

Assuming that your remote for the Apache repository is called apache, you should now set tags for the providers in the repo.

./dev/provider_packages/tag_providers.sh

Prepare documentation

Documentation is an essential part of the product and should be made available to users. In our case, documentation for the released versions is published in a separate repository - apache/airflow-site, but the documentation source code and build tools are available in the apache/airflow repository, so you have to coordinate between the two repositories to be able to build the documentation.

Documentation for providers can be found in the /docs/apache-airflow-providers directory and the /docs/apache-airflow-providers-*/ directory. The first directory contains the package contents lists and should be updated every time a new version of provider packages is released.

  • First, clone the airflow-site repository and set the environment variable AIRFLOW_SITE_DIRECTORY.
git clone https://github.com/apache/airflow-site.git airflow-site
cd airflow-site
export AIRFLOW_SITE_DIRECTORY="$(pwd)"
  • Then you can go to the directory and build the necessary documentation packages
cd "${AIRFLOW_REPO_ROOT}"
./breeze build-docs -- \
  --for-production \
  --package-filter apache-airflow-providers \
  --package-filter 'apache-airflow-providers-*'

Usually when we release packages we also build documentation for the "documentation-only" packages. This means that unless we release just a few selected packages, or need to deliberately skip some packages, we should release documentation for all provider packages, and the above command is the one to use.

If you want to release documentation for only some providers, you can build it this way:

cd "${AIRFLOW_REPO_ROOT}"
./breeze build-docs -- \
  --for-production \
  --package-filter apache-airflow-providers \
  --package-filter 'apache-airflow-providers-PACKAGE1' \
  --package-filter 'apache-airflow-providers-PACKAGE2' \
  ...

If you have the providers as a list of provider ids because you just released them, you can build them with

./dev/provider_packages/build_provider_documentation.sh amazon apache.beam google ....
  • Now you can preview the documentation.
./docs/start_doc_server.sh
  • Copy the documentation to the airflow-site repository

NOTE: In order to publish the documentation, you need to activate the virtualenv where you installed apache-airflow with the doc extra:

  • pip install apache-airflow[doc]

All providers (including overriding documentation for doc-only changes):

./docs/publish_docs.py \
    --package-filter apache-airflow-providers \
    --package-filter 'apache-airflow-providers-*' \
    --override-versioned

cd "${AIRFLOW_SITE_DIRECTORY}"

If you have the providers as a list of provider ids because you just released them, you can publish them with

./dev/provider_packages/publish_provider_documentation.sh amazon apache.beam google ....
  • If you publish a new package, you must add it to the docs index:

  • Create the commit and push changes.

branch="add-documentation-$(date "+%Y-%m-%d%n")"
git checkout -b "${branch}"
git add .
git commit -m "Add documentation for packages - $(date "+%Y-%m-%d%n")"
git push --set-upstream origin "${branch}"

Prepare issue in GitHub to keep status of testing

Create a GitHub issue with the content generated via prepare-provider-documentation or by manual execution of the script above. You will use the link to that issue in the next step.

Prepare voting email for Providers release candidate

Make sure the packages are in https://dist.apache.org/repos/dist/dev/airflow/providers/

Send out a vote to the dev@airflow.apache.org mailing list. Here you can prepare text of the email.

subject:

cat <<EOF
[VOTE] Airflow Providers prepared on $(date "+%B %d, %Y")
EOF

body:

cat <<EOF
Hey all,

I have just cut the new wave Airflow Providers packages. This email is calling a vote on the release,
which will last for 72 hours - which means that it will end on $(date -d '+3 days').

Consider this my (binding) +1.

<ADD ANY HIGH-LEVEL DESCRIPTION OF THE CHANGES HERE!>

Airflow Providers are available at:
https://dist.apache.org/repos/dist/dev/airflow/providers/

*apache-airflow-providers-<PROVIDER>-*-bin.tar.gz* are the binary
 Python "sdist" release - they are also official "sources" for the provider packages.

*apache_airflow_providers_<PROVIDER>-*.whl* are the binary
 Python "wheel" release.

The test procedure for PMC members who would like to test the release candidates is described in
https://github.com/apache/airflow/blob/main/dev/README_RELEASE_PROVIDER_PACKAGES.md#verify-the-release-by-pmc-members

and for Contributors:

https://github.com/apache/airflow/blob/main/dev/README_RELEASE_PROVIDER_PACKAGES.md#verify-by-contributors


Public keys are available at:
https://dist.apache.org/repos/dist/release/airflow/KEYS

Please vote accordingly:

[ ] +1 approve
[ ] +0 no opinion
[ ] -1 disapprove with the reason


Only votes from PMC members are binding, but members of the community are
encouraged to test the release and vote with "(non-binding)".

Please note that the version number excludes the 'rcX' string.
This will allow us to rename the artifact without modifying
the artifact checksums when we actually release.

The status of testing the providers by the community is kept here:
<TODO COPY LINK TO THE ISSUE CREATED>

You can find packages as well as detailed changelog following the below links:

<PASTE TWINE UPLOAD LINKS HERE. SORT THEM BEFORE!>

Cheers,
<TODO: Your Name>

EOF

Due to the nature of packages, not all packages have to be released as convenience packages in the final release. During the voting process the voting PMC members might decide to exclude certain packages from the release if critical problems have been found in them.

Please modify the message above accordingly to clearly exclude those packages.
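
The email template above asks for the twine upload links to be sorted before pasting. A minimal sketch of one way to do that follows, assuming you saved the twine output to a file and that the package links are the lines starting with https://. The function name sorted_links and the file name twine_output.txt are hypothetical.

```shell
# Hypothetical helper: read twine output on stdin and print the
# package links sorted and de-duplicated.
sorted_links() {
    grep -E '^https://' | LC_ALL=C sort -u
}

# Example (not executed here): sorted_links < twine_output.txt
```

If some packages were excluded during the vote, remove their links from the result by hand before pasting it into the email.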

Verify the release by PMC members

SVN check

The files should be present in Airflow dist

The following files should be present (9 files):

  • -source.tar.gz + .asc + .sha512 (one set of files)
  • -bin.tar.gz + .asc + .sha512 (one set of files per provider)
  • -.whl + .asc + .sha512 (one set of files per provider)
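
For a quick manual check of the list above, a small sketch like the following can report artifacts that lack a signature or checksum file. The function missing_companions is a hypothetical helper, not a script shipped with Airflow, and it only checks that the files exist, not that they are valid.

```shell
# Hypothetical helper: print every .asc/.sha512 file that is missing
# next to a .tar.gz or .whl artifact in the given directory.
missing_companions() {
    local dir="$1" artifact suffix
    for artifact in "${dir}"/*.tar.gz "${dir}"/*.whl; do
        [ -e "${artifact}" ] || continue   # skip glob patterns that matched nothing
        for suffix in asc sha512; do
            [ -e "${artifact}.${suffix}" ] || echo "${artifact}.${suffix}"
        done
    done
}

# Example (not executed here): missing_companions airflow/providers
```

Empty output means every artifact has both companion files.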

As a PMC member you should be able to clone the SVN repository:

svn co https://dist.apache.org/repos/dist/dev/airflow/

Or update it if you already checked it out:

svn update .

Optionally you can use the check_files.py script to verify that all expected files are present in SVN. This script may also help with verifying installation of the packages.

python check_files.py -v {VERSION} -t providers -p {PATH_TO_SVN}

Licences check

This can be done with the Apache RAT tool.

  • Download the latest jar from https://creadur.apache.org/rat/download_rat.cgi (unpack the binary, the jar is inside)
  • Unpack the binary (-bin.tar.gz) to a folder
  • Enter the folder and run the check (point to the place where you extracted the .jar)
java -jar ../../apache-rat-0.13/apache-rat-0.13.jar -E .rat-excludes -d .

where .rat-excludes is the file in the root of Airflow source code.

Signature check

Make sure you have imported the key of the person who signed the release into your GPG keyring. You can find the valid keys in KEYS.

You can import the whole KEYS file:

gpg --import KEYS

You can also import the keys individually from a keyserver. The below one uses Kaxil's key and retrieves it from the default GPG keyserver OpenPGP.org:

gpg --receive-keys 12717556040EEF2EEAF1B9C275FCCD0A25FA0E4B

You should choose to import the key when asked.

Note that, being the default, the OpenPGP server tends to be overloaded and might respond with errors or timeouts. Many of the release managers have also uploaded their keys to the GNUPG.net keyserver, and you can retrieve them from there.

gpg --keyserver keys.gnupg.net --receive-keys 12717556040EEF2EEAF1B9C275FCCD0A25FA0E4B

Once you have the keys, the signatures can be verified by running this:

for i in *.asc
do
   echo "Checking $i"; gpg --verify "$i"
done

This should produce results similar to the below. The "Good signature from ..." line indicates that the signatures are correct. Do not worry about the "not certified with a trusted signature" warning. Most of the certificates used by release managers are self-signed, which is why you get this warning. By importing the key from the keyserver in the previous step, using its ID from the KEYS page, you already know that this is a valid key.

Checking apache-airflow-2.0.2rc4-bin.tar.gz.asc
gpg: assuming signed data in 'apache-airflow-2.0.2rc4-bin.tar.gz'
gpg: Signature made sob, 22 sie 2020, 20:28:28 CEST
gpg:                using RSA key 12717556040EEF2EEAF1B9C275FCCD0A25FA0E4B
gpg: Good signature from "Kaxil Naik <kaxilnaik@gmail.com>" [unknown]
gpg: WARNING: This key is not certified with a trusted signature!
gpg:          There is no indication that the signature belongs to the owner.
Primary key fingerprint: 1271 7556 040E EF2E EAF1  B9C2 75FC CD0A 25FA 0E4B
Checking apache_airflow-2.0.2rc4-py2.py3-none-any.whl.asc
gpg: assuming signed data in 'apache_airflow-2.0.2rc4-py2.py3-none-any.whl'
gpg: Signature made sob, 22 sie 2020, 20:28:31 CEST
gpg:                using RSA key 12717556040EEF2EEAF1B9C275FCCD0A25FA0E4B
gpg: Good signature from "Kaxil Naik <kaxilnaik@gmail.com>" [unknown]
gpg: WARNING: This key is not certified with a trusted signature!
gpg:          There is no indication that the signature belongs to the owner.
Primary key fingerprint: 1271 7556 040E EF2E EAF1  B9C2 75FC CD0A 25FA 0E4B
Checking apache-airflow-2.0.2rc4-source.tar.gz.asc
gpg: assuming signed data in 'apache-airflow-2.0.2rc4-source.tar.gz'
gpg: Signature made sob, 22 sie 2020, 20:28:25 CEST
gpg:                using RSA key 12717556040EEF2EEAF1B9C275FCCD0A25FA0E4B
gpg: Good signature from "Kaxil Naik <kaxilnaik@gmail.com>" [unknown]
gpg: WARNING: This key is not certified with a trusted signature!
gpg:          There is no indication that the signature belongs to the owner.
Primary key fingerprint: 1271 7556 040E EF2E EAF1  B9C2 75FC CD0A 25FA 0E4B

SHA512 check

Run this:

for i in *.sha512
do
    echo "Checking $i"; shasum -a 512 "$(basename "$i" .sha512)" | diff - "$i"
done

You should get output similar to:

Checking apache-airflow-providers-google-1.0.0rc1-bin.tar.gz.sha512
Checking apache_airflow-providers-google-1.0.0rc1-py3-none-any.whl.sha512

Verify by Contributors

This can be done by any of the Contributors, and we encourage them to do so. In fact, it's best if the actual users of Apache Airflow test it in their own staging/test installations. Each release candidate is available on PyPI in addition to the SVN packages, so everyone should be able to install the release candidate version.

You can use any of the installation methods you prefer (you can even install it via the binary wheels downloaded from the SVN).

Installing in your local virtualenv

Make sure you have Airflow 2.* installed in your pip virtualenv (in the version you want to install the providers with).

pip install apache-airflow-providers-<provider>==<VERSION>rc<X>
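
As a small illustration of how the placeholders above are filled in, the sketch below builds the requirement string from a provider id. The helper rc_requirement is hypothetical. It relies on the fact that dots in a provider id become dashes in the package name (apache.beam is published as apache-airflow-providers-apache-beam).

```shell
# Hypothetical helper: rc_requirement <provider-id> <version> <rc-number>
rc_requirement() {
    local provider_id="$1" version="$2" rc="$3" package_suffix
    # Provider ids use dots, package names use dashes.
    package_suffix="$(printf '%s' "${provider_id}" | tr '.' '-')"
    echo "apache-airflow-providers-${package_suffix}==${version}rc${rc}"
}

# Example (not executed here): pip install "$(rc_requirement google 2.0.0 1)"
```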

Installing with Breeze

There is also an easy way of installation with Breeze if you have the latest sources of Apache Airflow. Here is a typical scenario.

First copy all the provider packages .whl files to the dist folder.

./breeze start-airflow --use-airflow-version <VERSION>rc<X> \
    --python 3.7 --backend postgres --use-packages-from-dist

Building your own docker image

If you prefer to build your own image, you can also use the official image and PyPI packages to test provider packages. This is especially helpful when you want to test integrations, but you need to install additional tools. Below is an example Dockerfile, which installs providers for Google.

FROM apache/airflow:2.0.0

RUN pip install --upgrade --user apache-airflow-providers-google==2.0.0.rc1

USER ${AIRFLOW_UID}

To build the image and run a shell in it, run:

docker build . --tag my-image:0.0.1
docker run  -ti \
    --rm \
    -v "$PWD/data:/opt/airflow/" \
    -v "$PWD/keys/:/keys/" \
    -p 8080:8080 \
    -e AIRFLOW__CORE__LOAD_EXAMPLES=True \
    my-image:0.0.1 bash

Additional Verification

Once you install and run Airflow, you can perform any verification you see as necessary to check that Airflow works as you expect.

Publish release

Summarize the voting for the Apache Airflow release

Once the vote has passed, you will need to send the vote result to dev@airflow.apache.org:

Subject:

[RESULT][VOTE] Airflow Providers - release of DATE OF RELEASE

Message:

Hello,

Apache Airflow Providers (based on RC1) have been accepted.

3 “+1” binding votes received:
- Jarek Potiuk  (binding)
- Kaxil Naik (binding)
- Tomasz Urbaszek (binding)


Vote thread:
https://lists.apache.org/thread.html/736404ca3d2b2143b296d0910630b9bd0f8b56a0c54e3a05f4c8b5fe@%3Cdev.airflow.apache.org%3E

I'll continue with the release process, and the release announcement will follow shortly.

Cheers,
<your name>

Publish release to SVN

The best way of doing this is to svn cp between the two repos (this avoids having to upload the binaries again, and gives a clearer history in the svn commit logs).

We also need to archive older releases before copying the new ones (see the Release policy).

# Go to the root of your Airflow git repo
cd "<ROOT_OF_YOUR_AIRFLOW_REPO>"
# Set AIRFLOW_REPO_ROOT to the path of your git repo
export AIRFLOW_REPO_ROOT=$(pwd)

# Go to the directory where you have checked out the dev svn release
# and enter the sub-folder with RC candidates
cd "<ROOT_OF_YOUR_DEV_REPO>/providers/"
export SOURCE_DIR=$(pwd)

# If some packages have been excluded, remove them now
# Check the packages
ls *<provider>*
# Remove them
svn rm *<provider>*

# Go to the folder where you have checked out the release repo
# Clone it if it's not done yet
svn checkout https://dist.apache.org/repos/dist/release/airflow airflow-release

# Update to latest version
svn update

# Create providers folder if it does not exist
# All latest releases are kept in this one folder without version sub-folder
mkdir -pv providers
cd providers

# Copy your providers with the target name to dist directory and to SVN
rm ${AIRFLOW_REPO_ROOT}/dist/*

for file in "${SOURCE_DIR}"/*
do
 base_file=$(basename "${file}")
 cp -v "${file}" "${AIRFLOW_REPO_ROOT}/dist/${base_file//rc[0-9]/}"
 svn mv "${file}" "${base_file//rc[0-9]/}"
done

# Check which old packages will be removed (you need python 3.6+)
python ${AIRFLOW_REPO_ROOT}/dev/provider_packages/remove_old_releases.py \
    --directory .

# Remove those packages
python ${AIRFLOW_REPO_ROOT}/dev/provider_packages/remove_old_releases.py \
    --directory . --execute


# Commit to SVN
svn commit -m "Release Airflow Providers on $(date)"

Verify that the packages appear in providers
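
The copy loop above relies on the bash expansion ${base_file//rc[0-9]/} to turn release candidate file names into final ones. The following sketch shows that expansion in isolation; the function name final_name is hypothetical and the file names are examples only.

```shell
# Same bash expansion as in the copy loop: remove every "rc" that is
# followed by a single digit from the file name.
final_name() {
    local base_file="$1"
    echo "${base_file//rc[0-9]/}"
}

final_name "apache-airflow-providers-google-1.0.0rc1-bin.tar.gz"
final_name "apache_airflow_providers_google-1.0.0rc1-py3-none-any.whl"
```

Note that the pattern matches exactly one digit, so a suffix such as rc10 would leave a stray 0 behind and would need to be handled by hand.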

Publish the Regular convenience package to PyPI

By that time the packages with the proper name (renamed from rc* to the final version) should be in your dist folder.

cd ${AIRFLOW_REPO_ROOT}
  • Verify the artifacts that would be uploaded:
twine check ${AIRFLOW_REPO_ROOT}/dist/*.whl ${AIRFLOW_REPO_ROOT}/dist/*.tar.gz
  • Upload the package to PyPI's test environment:
twine upload -r pypitest ${AIRFLOW_REPO_ROOT}/dist/*.whl ${AIRFLOW_REPO_ROOT}/dist/*.tar.gz
  • Verify that the test packages look good by downloading and installing them into a virtual environment. Twine prints the package links as output - separately for each package.

  • Upload the package to PyPI's production environment:

twine upload -r pypi ${AIRFLOW_REPO_ROOT}/dist/*.whl ${AIRFLOW_REPO_ROOT}/dist/*.tar.gz
  • Again, confirm that the packages are available under the links printed.

Publish documentation prepared before

Merge the PR that you prepared before with the documentation.

Add tags in git

Assuming that your remote for the Apache repository is called apache, you should now set tags for the providers in the repo.

./dev/provider_packages/tag_providers.sh

Notify developers of release

Subject:

cat <<EOF
Airflow Providers released on $(date) are ready
EOF

Body:

cat <<EOF
Dear Airflow community,

I'm happy to announce that new versions of Airflow Providers packages were just released.

The source release, as well as the binary releases, are available here:

https://dist.apache.org/repos/dist/release/airflow/providers/

We also made those versions available on PyPi for convenience ('pip install apache-airflow-providers-*'):

https://pypi.org/search/?q=apache-airflow-providers

The documentation is available at https://airflow.apache.org/docs/ and linked from the PyPI packages:

<PASTE TWINE UPLOAD LINKS HERE. SORT THEM BEFORE!>

Cheers,
<your name>
EOF