
Official helm chart released by the Airflow Community #211

Closed
potiuk opened this issue May 21, 2021 · 57 comments
Labels
kind/question kind - user questions

Comments

@potiuk

potiuk commented May 21, 2021

Hello @thesuperzapper

The Airflow community has finally released the "official" helm chart. Documentation is here: http://airflow.apache.org/docs/helm-chart/stable/index.html

Release message/thread here: https://lists.apache.org/thread.html/r131d839158b8a7a92a7813183cae30d248be9e330ea2faaf9e654970%40%3Cdev.airflow.apache.org%3E

We discussed earlier the possibility of the community supporting this helm chart, and the community decided to continue with the helm chart donated by Astronomer. That chart is now fully available and released, and it will be maintained and supported by the community.

I wonder whether you would like to keep maintaining the separate chart, or whether you would choose to deprecate it and help users transition to the "community managed" official chart. Having two charts in the ecosystem is bound to cause some confusion.

If you choose that route, we are open to helping with the transition, possibly even including developing transition documentation, how to switch, etc.

Let us know what you think.

@potiuk potiuk added the kind/enhancement kind - new features or changes label May 21, 2021
@andormarkus

Hi @potiuk

For me "official" helm chart lacks several features which makes transition not feasible for me. I really like this chart because it offers turn key solution does not require addition configuration steps like the official one.

Currently, I miss the following features in the official chart:

  • subPath is not supported for dag / log persistence
  • extraPipPackages
  • post-install/post-update helm hook jobs:
    - create_connections.py
    - create_pools.py
    - create_users.py
    - create_variables.py

Adding subPath to the official chart should not be an issue. However, I don't know whether the Airflow community would approve adding the post-install/post-update helm hook jobs or extraPipPackages to the official chart.

extraPipPackages is important for us because we are using the official Airflow Docker image. If we want to add an extra pip package, then we need to run a custom Docker image, which we want to avoid.

@kaxil

kaxil commented May 21, 2021

Thanks, @andormarkus, for your comment. We definitely won't support extraPipPackages or the jobs that create Connections, Pools, Users or Variables.

I understand why you need it but this is not the right approach. Changing resources like that is not correct in my opinion.

For installing pip packages -- you should do it when building the Docker image itself -- not on the fly when installing via the chart. Once built, the Docker image should be immutable. This is very important for production systems running critical workloads.
Connections and Variables should be handled either via Environment Variables or another Secrets Backend --- again, for a production system you do not want to expose secrets in plain text in your values.yaml or the yaml file you use for overriding values.
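For illustration, Airflow reads Connections and Variables from environment variables named `AIRFLOW_CONN_<CONN_ID>` and `AIRFLOW_VAR_<NAME>`. A minimal sketch (the connection id and values are hypothetical):

```bash
# Exposes a connection with conn_id "my_postgres" to Airflow
export AIRFLOW_CONN_MY_POSTGRES='postgresql://user:pass@pg.example.com:5432/mydb'
# Exposes a variable readable via Variable.get("environment")
export AIRFLOW_VAR_ENVIRONMENT='staging'
```

In Kubernetes, the same values would typically come from a Secret referenced in the pod spec rather than from a plain-text values.yaml.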

For Pools (and even Connections & Vars): You should use the API: https://airflow.apache.org/docs/apache-airflow/stable/stable-rest-api-ref.html#operation/post_pool
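As a concrete sketch of that API route for pools (the host, credentials and pool name are placeholders; this assumes the stable REST API and basic auth are enabled):

```bash
# Create (or pre-size) a pool via the stable REST API
curl -X POST "https://airflow.example.com/api/v1/pools" \
  --user "admin:admin" \
  -H "Content-Type: application/json" \
  -d '{"name": "my_pool", "slots": 128}'
```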

And lastly, for user management, I would generally recommend using LDAP, OAuth or other auth backends.

@potiuk
Author

potiuk commented May 21, 2021

Yeah. My opinion is very close to @kaxil's - the "extraPipPackages" and post-install hooks do not really follow the way a productionized installation should be done. In my opinion (and that's an opinion only) they introduce more problems than they solve, because you add unnecessary variability and delays where the same is easily done by just building a custom image, which is easy, straightforward and recommended. The official image by Airflow is really a "reference image", and there are several routes you can take to have your own custom image - either by customizing the build process, or by extending the image the usual way all images are extended (with a FROM instruction).
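To illustrate the extension route, a custom image can be as small as this (the tag and package name are placeholders, not a recommendation):

```Dockerfile
# Extend the unmodified reference image with extra dependencies
FROM apache/airflow:2.1.0
RUN pip install --no-cache-dir my-extra-package==1.2.3
```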

As @kaxil mentioned - for Airflow 2.0+ we now have APIs, and even an official Python client, that can be used more easily (and if any more features are needed, we are happy to discuss adding them).

Again - this is my opinion, not a source of truth. I think it's up to @thesuperzapper and the other maintainers to decide what to do - if they want to continue maintaining the helm chart and support all the potential issues, or incompatibilities coming from future releases of Airflow, that is fine. The Apache 2.0 license perfectly allows that, as long as it is clear that it is not a community-managed project (in the past people were confused about whether this chart is a community-managed project or not). And if not, then we could help with providing transition paths. Happy to do so.

Just as a comment - one thing we might have to take a look at is the use of the Airflow logo and Apache name in the project (this is a PMC responsibility), so that it is not easily confused with an Apache-managed project (https://www.apache.org/foundation/marks/#guidelines). I have no idea whether it is a problem for the ASF to have similarly named charts, with the Airflow logo and Apache name in both, in a way that they can be easily confused (I made that mistake myself, actually).

I think we might ask the ASF for legal advice on whether this is any problem for the foundation at all (I believe the ASF is very cautious about its brand, and it's important that the brand is used properly) - but before we go and ask, I'd love to hear from the maintainers what the long-term plan for the helm chart is, so that we know whether there is anything to ask about.

@andormarkus

Hi @kaxil @potiuk

We are deploying everything with Terraform (AWS, Kubernetes, Helm); every helm chart is deployed with helm_release.

API:
There is no Airflow Terraform provider... Using the official Python client / a bash script would require us to use null_resource in Terraform, which we want to avoid as much as possible. I don't think the community ever built a provider for Airflow.

Pools:
I am only interested in resizing the default_pool to 10,000. I couldn't find an environment variable for it.
Do you see an option to expose the size of default_pool as an environment variable?

Connections:
We are using SystemsManagerParameterStoreBackend, so create_connections.py is not relevant for us.

Users:
We are using OAuth, so create_users.py is not relevant for us.

Variables:
We are using environment variables, so create_variables.py is not relevant for us.

@kaxil

kaxil commented May 21, 2021

@andormarkus Aah... yes, you can override the default pool slots via an environment variable; the documentation is missing on that page -- I will add it right now.

Use AIRFLOW__CORE__NON_POOLED_TASK_SLOT_COUNT=10000 -- this should work

Or update the config via [core] non_pooled_task_slot_count = 10000

PR: apache/airflow#15997


Good point regarding "REST API and Terraform resources" -- from a quick search, it looks like the REST API Terraform provider from Mastercard is relatively popular. Example usage: https://github.com/Mastercard/terraform-provider-restapi/blob/master/examples/dummy_users_with_fakeserver.tf

Or this one: https://registry.terraform.io/providers/fmontezuma/restapi/latest/docs

Will that work for your use case?

@potiuk
Author

potiuk commented May 22, 2021

> Hi @kaxil @potiuk
>
> We are deploying everything with Terraform (AWS, Kubernetes, Helm); every helm chart is deployed with helm_release.
>
> API:
> There is no Airflow Terraform provider... Using the official Python client / a bash script would require us to use null_resource in Terraform, which we want to avoid as much as possible. I don't think the community ever built a provider for Airflow.

Actually, there is a Terraform provider that one of the PMC members of Airflow wrote, using the new REST API: https://github.com/houqp/terraform-provider-airflow

We might consider making it part of the "community managed" resources as well. Maybe you can check it out?

@potiuk
Author

potiuk commented May 28, 2021

Hey @thesuperzapper - WDYT? Do you plan to encourage some kind of transition (and can we help), or do you want to keep supporting your helm chart in the future? We are preparing a talk for the Airflow Summit about the "state of Airflow with K8S", so I think it would be great to know what we can tell Airflow users.

For me, both scenarios are OK - keeping both the community-managed helm chart and this one, or helping people transition to the community one - but I think it would be great to agree on the direction.

@thesuperzapper
Member

@potiuk (also cc @gsemet)

I am happy to maintain this chart for as long as it is still used widely, which will probably be for a while yet as:

  1. many companies are using the chart (or forks) in longstanding production deployments
  2. there are some features which are not present in the "official" chart yet:
    1. a highly-structured values.yaml file
    2. a (somewhat) comprehensive set of docs
    3. lots of values validation (to detect common configuration issues we have seen over the years)
    4. support for most past versions of airflow (including 1.10.X)
    5. post-install jobs, for things like connections, variables, pools, users, etc
    6. a feature like extraPipPackages (which is great for testing, and honestly used by many in production)
    7. support for git-sync with HTTP auth (I may be wrong on this one)
  3. I expect the "official" chart will go through a few significant changes over the next few releases to make it cleaner and easier to use, for example:
    1. re-structuring its values.yaml (and setting up a process to define which values should go where)
    2. improved docs
    3. culling out some of the more environment-specific features (possibly things like Elasticsearch integration)

Personally, I would love there to be only 1 chart in the future (if this is possible given the different opinions), but we are probably a wee way from that yet (see the above comments). However, once the charts become equal (in features and usability), I will happily start pushing users across, and I expect that would be a natural process anyway.

Note, I'm open to contributing to the "official" chart to help move this process along, I'm just not sure how best to start.

@potiuk
Author

potiuk commented May 28, 2021

> Note, I'm open to contributing to the "official" chart to help move this process along, I'm just not sure how best to start.

Perfect. That's cool. I think we can close the ticket then and move to 'collaboration' mode. We are more than happy to welcome PRs to the chart. For small fixes, it's just a PR to open; if something is substantial, write a message to the devlist, open a discussion (if it might require brainstorming), or create an issue in the Airflow repository. That is the best way to start.

From a quick look at the differences you pointed out:

  • restructuring of 'values.yaml' probably requires some discussion - whether to do it similarly or differently, what direction to take, how to make backwards compatibility work etc.

  • documentation updates are usually just PRs (over the last few days more and more PRs for helm-chart documentation have been merged, and multiple people are adding more comprehensive docs, so I guess this will improve rather quickly). Any help there is more than welcome.

  • values validation - I think this is super-helpful, and direct PRs there are more than welcome.

  • support for past versions of Airflow is not something we want. With Airflow 1.10 reaching end-of-life in June, we encourage everyone to move to 2+. Keeping your chart up and running for people who decide to stick with 1.10 a while longer might be a reason for your helm chart to stay around.

  • post-install jobs are up for discussion. I believe this could be done in various ways, so a discussion - either in GitHub Discussions or on the devlist - is probably the best way to start.

  • extraPipPackages is controversial. My goal is to make it super easy to build/extend your own images from the official Airflow image, and possibly even to semi-automate it in the near future. I am definitely going to start a discussion about that soon, so I'd be happy if you can join it.

  • git-sync is being improved a lot in some of the recent PRs. I think it's going to be super-comprehensive very soon.

All of this has to go through the usual process of the Airflow community - propose, discuss, reach consensus, and, if consensus is not reachable and the change is significant, vote. The absolutely normal process for an Apache project. It would be great if you actively participated in it - knowing that we, as a community, might reach different conclusions and directions after deliberate discussion.

I will also reach out to ASF trademarks to check whether there are any issues with the naming of the charts - I do not think so, but I want to double-check whether there is any guidance. I will post the JIRA ticket about it here when I open it.

@gsemet
Collaborator

gsemet commented May 28, 2021

Thanks for keeping me in the loop. I pretty much agree with everything that has been said (I don't know if I still have a say, as I no longer actually maintain the project).

The end goal is to transition everybody to the official chart once it has all the required features. Sure, it is not optimal to maintain two charts; ideally, the official chart would get all the missing features from this one and the choice would become natural for everybody.

The differences between the two might be quite tricky to document; maybe we can start a new markdown document (MIGRATION.md?) in here that evolves over time with the status of this chart and the status of the official chart. It would:

  • compare the features of the two charts (in a table)
  • and maybe explain how to transition from airflow-helm/charts to helm-chart/stable for existing installations (a rough sketch follows)
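A hypothetical skeleton for such a MIGRATION.md, using feature names from this thread (the entries are illustrative, not a verified comparison):

```markdown
# Migrating to the community-managed Apache Airflow chart

| Feature                          | airflow-helm/charts | apache/airflow chart  |
|----------------------------------|---------------------|-----------------------|
| extraPipPackages                 | yes                 | no (build your image) |
| post-install jobs (pools, users) | yes                 | no (API / env vars)   |
| subPath for dag/log persistence  | yes                 | under discussion      |
```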

I also think we can add a little emphasis in our README to document the position of this project a bit more clearly:

  • this is a legacy chart with some history and has been used in production by many people
  • if we can show some usage statistics, it would help newcomers understand why we cannot drop support as of now
  • volunteers are still maintaining it (thanks a lot @thesuperzapper for taking over the job!)
  • the natural path will be to migrate to the official chart, and thus we can point to the comparison/migration document.
  • newcomers should use the official chart; if they really want some key features absent from the official chart, they can use this one and be prepared to migrate once the desired feature is ready in the official chart

But we may need help from the airflow community for this documentation effort.

@potiuk
Author

potiuk commented May 28, 2021

> The differences between the two might be quite tricky to document; maybe we can start a new markdown document (MIGRATION.md?) in here that evolves over time with the status of this chart and the status of the official chart.

Happy to help with that.

@potiuk
Author

potiuk commented May 28, 2021

Hey @thesuperzapper. I looked at the description of the ASF policies, and I think (without reaching out to the trademarks team) that you should change the logo/description to use "Powered By Apache Airflow".

It is nicely described here: https://www.apache.org/foundation/marks/contact#products

> USES IN SOFTWARE PRODUCTS
>
> Since the primary goods of any Apache project are software products provided free of charge to the general public, we generally do not allow third parties to use Apache brands in their software product branding.
>
> You may be eligible to use our Powered By branding style in your software product naming, which does not require permissions as long as you follow all aspects of the policy, in particular always using the "Apache Projectname" form.
>
> Separately, we offer a set of Powered By Apache logos that you may use to show support for or use of various Apache software projects.

And https://www.apache.org/foundation/press/kit/ contains information and tools on how to create a "Powered By" logo using the original project's logo. This avoids confusion about which project is the "Apache" one and which is just "Powered by Apache".

Are you OK with changing this accordingly? Or should we reach out to trademarks@a.o to clarify? In the latter case I will need your e-mail (you can contact me via the email on my GitHub page).

@thesuperzapper
Member

@potiuk I have replaced the Airflow logo with the "powered by Airflow" logo, and changed the description to the following (which should remove any possibility of confusion):

> the community-maintained descendant of the stable/airflow helm chart

@potiuk
Author

potiuk commented May 28, 2021

Thanks a lot, @thesuperzapper! Would it also be possible to change the logo here: https://artifacthub.io/packages/helm/airflow-helm/airflow

@potiuk
Author

potiuk commented May 28, 2021

BTW, I think the "Powered by" logo would look nicer with just the Airflow logo :)

For example:

[image: powered_by_airflow logo]

@kaxil

kaxil commented May 28, 2021

Thanks, @thesuperzapper and @gsemet, for your comments. Firstly, I appreciate all the effort that went into maintaining this chart for several years and helping the community -- so I just want to say thanks on behalf of the community and the Airflow PMC members.

Secondly, just want to add some notes to the comments regarding feature-parity:

> there are some features which are not present in the "official" chart yet:
>
> a (somewhat) comprehensive set of docs

The docs for the stable version of the official Helm chart are at https://airflow.apache.org/docs/helm-chart/, with a separate reference page for values.yaml, including examples.

It will keep improving, and we would love any and all feedback on what we can include there.

> lots of values validation (to detect common configuration issues we have seen over the years)

ACK, any help there would be 100% appreciated. You and the team have done a great job at that.

> support for most past versions of airflow (including 1.10.X)

The official Helm chart does support the older 1.10.x versions too -- please do let me know if you find any issues while running any 1.10.x version.

> post-install jobs, for things like connections, variables, pools, users, etc.

Pools -- maybe; that needs discussion. But for all the others we don't need post-install jobs, since they can be handled via a Secrets Backend, including Environment Variables.

> a feature like extraPipPackages (which is great for testing, and honestly used by many in production)

This is definitely a no-go for the official chart. All dependency installation (system + Python) should be done in the Docker image itself, to make an immutable image for the Helm chart to consume. It does not make sense to install system dependencies in Docker and Python dependencies via the Helm chart. For migration, users can just install those deps in the Dockerfile. So we have a clear migration path. Maybe we can add docs for it.

> support for git-sync with HTTP auth (I may be wrong on this one)

We do support that; in fact, we allow users to set any git-sync parameters via Environment Variables. Please correct me if I am missing something.
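For reference, a rough values.yaml sketch of git-sync over HTTPS in the official chart (the key names follow the chart docs of the time; the repo URL and secret name are placeholders):

```yaml
dags:
  gitSync:
    enabled: true
    repo: https://git.example.com/org/dags.git
    branch: main
    # Kubernetes secret containing GIT_SYNC_USERNAME / GIT_SYNC_PASSWORD
    credentialsSecret: git-credentials
```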

> Note, I'm open to contributing to the "official" chart to help move this process along, I'm just not sure how best to start.

Would love to have a migration guide in both places -- https://airflow.apache.org/docs/helm-chart/stable/ as well as this repo itself. PRs are very welcome.

@andormarkus

@kaxil Does Airflow support updating the Secrets Backend from a DAG? Based on my limited experience, I cannot update the value like I can with a "local" variable.

@kaxil

kaxil commented May 28, 2021

> @kaxil Does Airflow support updating the Secrets Backend from a DAG? Based on my limited experience, I cannot update the value like I can with a "local" variable.

Yes, if using the "built-in" backends like the metadata DB. No, if using an external Secrets Backend like HashiCorp Vault. We added the concept of a "Secrets Backend" in Airflow to allow users to manage "secrets" in an external system dedicated to just secrets, like Vault, which allows rotation of secrets and a read-only mode so sysadmins can control modification.

If you just want a "variable" to modify in the DAG, why are you setting it as an Airflow Variable? You can just use an environment variable inside your DAG to override the value.

BTW, while the secrets are rightfully read-only -- if you have permissions to edit those external systems -- you could use VaultHook (same as other hooks) to modify the secrets: https://airflow.apache.org/docs/apache-airflow-providers-hashicorp/stable/_api/airflow/providers/hashicorp/hooks/vault/index.html#airflow.providers.hashicorp.hooks.vault.VaultHook.create_or_update_secret

@MarkusTeufelberger

> This is definitely a no-go for the official chart. All dependency installation (system + Python) should be done in the Docker image itself, to make an immutable image for the Helm chart to consume.

That's not encouraging to hear, since this means that the official chart is not useful for me at all. I do NOT want to maintain a build infrastructure + Docker registry just to do a pip install foo_dependency on top of an upstream container. There are other ways to solve this issue (init containers for example), but if "all dependencies need to be baked into the image from the get-go" is a hard requirement for the official chart, then the official chart is dead in the water for me.

Maybe you misunderstood how extra package installation works in this chart here? There's really not much difference between git sync and pip installation for example.

@potiuk
Author

potiuk commented May 30, 2021

> That's not encouraging to hear, since this means that the official chart is not useful for me at all. I do NOT want to maintain a build infrastructure + Docker registry just to do a pip install foo_dependency on top of an upstream container. There are other ways to solve this issue (init containers for example), but if "all dependencies need to be baked into the image from the get-go" is a hard requirement for the official chart, then the official chart is dead in the water for me.

I think how to answer some of these needs is indeed up for discussion. I don't think the way it's done in this helm chart is good - it has its own challenges - but maybe there are other ways similar behavior can be achieved.

I agree with @kaxil that the image should be immutable, and this is specifically why I designed the official Docker image to be both extendable and customizable (and I plan to add one or two more features, for example automated installation from requirements.txt if it is placed in docker-context-files - similar to the ONBUILD feature of Astronomer's image). This should definitely be the default way people use the images in a K8S environment, and it's the only one that is sustainable at scale. I cannot imagine building and reinstalling images on the fly when you have a distributed cluster, where each cluster would effectively rebuild the image, potentially leading to different images on different clusters. This is a no-go for any "serious" usage of Airflow.

However, I see that for hobbyist usage, or when you just want to try things out and test the waters, an easier way of installing packages can be provided.

And you do not even need helm-chart-specific support for that. I see it more as answering needs similar to those the Airflow quick-start addresses: https://airflow.apache.org/docs/apache-airflow/stable/start/docker.html.

We can easily build in a custom init script that performs the installation of a requirements.txt file, if one is present in a specific location, whenever the image is started; we could also allow mounting a cache of those installed dependencies to speed up restarts.
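A minimal sketch of the kind of init script described here (the file location and pip flags are assumptions, not an actual implementation):

```bash
#!/usr/bin/env bash
# Install user-supplied requirements at container start, if present.
REQS="/opt/airflow/requirements.txt"
if [[ -f "${REQS}" ]]; then
  # --user keeps the base image layers untouched; a mounted cache
  # volume could speed up subsequent restarts.
  pip install --user -r "${REQS}"
fi
exec "$@"
```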

Recently we added a way to include custom init containers in the helm chart, and we could also easily add such an init script to the image and describe how to enable it for your helm chart if you want to. We could even have it as an option (clearly marked as development/test) in the chart itself, using such init containers, with a clear path on how to migrate to a "serious" solution (i.e. how to easily build your own image from the very same requirements.txt using the mechanism I planned to add to the Docker image). But I think just having an entrypoint and the possibility to mount it in the helm chart with a generic mount is good enough.

@MarkusTeufelberger -> would that work for you and @kaxil - WDYT?

@kaxil

kaxil commented May 30, 2021

@MarkusTeufelberger git-sync runs as a sidecar and won't affect running Airflow, whereas with an initContainer and pip install we are talking about changing/updating dependencies. That being said, it is possible for you to achieve the same with the official chart.

The official chart allows adding extraInitContainers (and extraContainers) for the Webserver, Worker and Scheduler.

https://github.com/apache/airflow/blob/4bdf46d2284c06d82a2574831028cbddc520ca8f/chart/values.yaml#L408-L411

So you can run your pip install command over there, and use a ConfigMap for requirements.txt if you'd like. You can go further and use a pip constraints file, or even a Pipfile.lock, as @gsemet mentions below. A rough sketch of that approach follows.
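A hedged sketch of what that could look like in the chart values (the container name, image tag and volume wiring are illustrative; the install target needs a volume that the main container also mounts):

```yaml
workers:
  extraInitContainers:
    - name: install-pip-packages
      image: apache/airflow:2.1.0
      command: ["bash", "-c", "pip install --target=/opt/extra-packages -r /config/requirements.txt"]
      volumeMounts:
        - name: extra-packages        # emptyDir shared with the main container
          mountPath: /opt/extra-packages
        - name: requirements-config   # ConfigMap holding requirements.txt
          mountPath: /config
```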

What we do want, though, is to not recommend doing that for production usage. I think the official chart is now highly extensible, so if you know what you are doing, by all means you can add many more things in those extraInitContainers as you like. But we want to protect users who don't know the best practices.

Hopefully that will take care of your case. We can even add docs around that for "hobbyist" usage and make it clear not to use it for production -- that should hopefully take care of the case @potiuk mentioned.

@gsemet
Collaborator

gsemet commented May 30, 2021

To ensure full reproducibility of the DAG, a good option would be to use a lock file (using pip-tools, for instance). Because I agree, burning the DAG image is costly for users, and relying on a pip install during the initialization of the container does the job if all dependencies are locked one way or another.

@potiuk
Author

potiuk commented May 30, 2021

> To ensure full reproducibility of the DAG, a good option would be to use a lock file (using pip-tools, for instance). Because I agree, burning the DAG image is costly for users, and relying on a pip install during the initialization of the container does the job if all dependencies are locked one way or another.

That's exactly what requirements.txt is supposed to do. pip-tools is not needed for that. Actually, with recent changes to how the resolver works and some nice features of pip, I think the need for pip-tools or Poetry-style solutions is gone. They used to be helpful, but in the Airflow ecosystem we stuck with pip, and with pip 21 maturing we strongly recommend just using pip for all things around Python dependencies.

Example requirements.txt:

library==0.5.0

You can even use all kinds of ~= or > specifiers if you want.

With `pip freeze` you can even get the `requirements.txt` for all your transitive dependencies as well.

@gsemet
Collaborator

gsemet commented May 30, 2021

Usually you put loose dependencies in requirements.txt because updating it is quite tedious. pip-tools does the pip freeze for you.

The best would be to use Poetry/PEP 517.

I do not use pip directly, even the latest one; I don't see how this would change anything if you do not have another "lock" file.

I do not speak about things I don't know; I just advise about things I do every day in dev environments that have heavy reproducibility requirements.
You do not want your install to fail when a new, buggy package you depend on is published on PyPI.
Poetry works. Pipenv also works, but it is substandard compared to Poetry. pip-tools, to freeze a requirements.txt.in into a requirements.txt, also works. I don't know whether pip can handle a lock file natively by now.

@potiuk
Author

potiuk commented May 30, 2021

> I don't know whether pip can handle a lock file natively by now.

Yes, it can. We use it all the time in Airflow.

Simply run pip freeze > constraints.txt - it will produce a requirements/constraints file with == pins for all dependencies (including transitive ones). The poetry.lock file is equivalent to that; however, Poetry's lock is more akin to "install_requires" in setup.py than to constraints. And to this day, neither Poetry nor pip-tools handles the Airflow way of managing dependencies well.

Let me explain where I am coming from, so that you can understand the context better - I am not just dismissing pip-tools or Poetry; for some usages they might have a nicer interface, but they are not a good fit in the case of Airflow.

Poetry (and pip-tools) are - unfortunately - very opinionated in the way they work, and they do not handle the way Airflow works well - because Airflow is both a library and an application. Both Poetry and pip-tools made a conscious decision to handle those two cases separately - either "open" dependencies for a library or "fixed" dependencies for an application. I used to believe this is correct (and it is, in most cases) until I entered the Airflow world ~3 years ago. Back then I also wanted to use Poetry, but it turned out it is not possible - and it still isn't today.

Airflow (as an application) has to have a "golden" set of constraints (that allows it to be installed even when new dependency versions are released), but it should also (as a library) allow upgrading those dependencies when new versions are released, without upgrading Airflow itself (so that people can write DAGs using newer versions of the dependencies). Airflow, when installed with all providers, has ~500 dependencies (including transitive ones), which makes it one of the most complex projects on PyPI when it comes to dependencies. That's why we generate and use constraint files to install Airflow - and of the three tools (pip / pip-tools / Poetry), only pip can handle constraint files.
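For reference, the documented constraints-based install looks like this (the version numbers are just examples):

```bash
pip install "apache-airflow==2.1.0" \
  --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.1.0/constraints-3.8.txt"
```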

Another project using constraints in a similar way to ours is OpenStack: https://github.com/openstack/requirements/blob/d0b389d840c75216ab2cc10df849cb98990b1503/tox.ini#L10

There are two issues currently open in Poetry (partially inspired by the way Airflow and OpenStack need constraints).

The original request mentioning Airflow (closed a long time before I commented on it) is python-poetry/poetry#3225, and I explained there why constraints are needed.

Until constraints are supported by other tools, I can only recommend pip, unfortunately.

@potiuk
Author

potiuk commented May 30, 2021

BTW, we have work in progress to move to PEP 517: apache/airflow#16144 (while keeping pip/constraints).

@gsemet
Collaborator

gsemet commented May 30, 2021

OK, if your answer is pip freeze > requirements.txt, you don't know what Poetry and Pipenv do, and more generally what lock files are for. They exist precisely because pip freeze is not enough.

There is nothing special about Airflow dependencies. Basically every serious Python application needs to freeze its requirements somehow. I manage a dozen apps at work with real reproducibility requirements, and even PEP 517 is actually not enough.

pip freeze is just the first level of doing it, and I can bet you (or your Airflow users) will do it wrong at some point.

pip-tools does it slightly better from a user-workflow perspective (i.e., requirements.txt.in with loose requirements and requirements.txt with the result of pip freeze); then you can use a PEP 517 process like Airflow does. My proposal to integrate a lock-file mechanism into Airflow was rejected back when PEP 517 wasn't released yet, and I am kind of sad to see this is still an issue in 2021, when we have official and efficient ways of defining, updating and freezing dependencies like Poetry and, more generally, PEP 517.

In short, if the Python dependencies can be managed by a lock file in the DAG (a frozen requirements.txt, Pipfile.lock or poetry.lock), this would be equivalent to burning the deps into the Docker image from the reproducibility point of view. So yes, installing dependencies from a requirements.txt is a good thing to have, but if you want your users not to make big mistakes, use pip-tools or Poetry.

@potiuk
Author

potiuk commented May 30, 2021

Can you please explain how you would want the user to install Airflow from pip with a fixed set of requirements using Poetry, while allowing the same user to upgrade the dependencies later?

I would really love to see that.

Please.

@kaxil

kaxil commented May 30, 2021

Yeah, that sounds good (but it will also install packages every time the pod is run). @MarkusTeufelberger, just to reiterate: installing pip packages via an initContainer is a BAD solution; it means your packages are needlessly installed every time the pod is restarted, and that is BAD. Convenience does not mean a GOOD solution. Example issue: #104

This is why installing them once in your Dockerfile is a good solution. Anyone who has debugged tons of production issues will tell you that. Please trust us on this if you can.

What I proposed in the last comment is there because you know what you are doing and want to continue doing it. We don't want users to do that unless they know what they're doing.

@potiuk
Author

potiuk commented May 30, 2021

Yeah, I fully agree, @kaxil, and just to reiterate, @MarkusTeufelberger - I cannot imagine this being used in production. It has a ton of problems when running at scale - the big drawback is that as it grows, it is much slower to bootstrap. One thing you might face soon, for example, is an automated, uncontrollable upgrade of Airflow that you might not notice. New providers (likely to be released this week) will be 2.1+ only - which means that if you install a new provider this way, it will... upgrade Airflow together with it if you happen to use Airflow 2.0. Those are the kinds of problems you DEFINITELY do not want to discover when your container has already started and is operating on a live database. That's why building your own image is a much better solution.

This change is similar to what was used in the helm chart of @gsemet and @thesuperzapper, and it is only useful for quick testing (in the docs update in the PR I made it very clear that the production-ready solution is to build your own image).

But it is also a bit "better" than the helm solution:

  1. no need for init-containers
  2. no problems similar to those experienced in "install-pip-packages container fails to start (airflow 2.0.1)" #178 (solved by b2eb0d9)
  3. faster bootstrap than the solution above - the solution for #178 required copying the installed dependencies and mounting the copied folder from the init container into the main one
  4. it works the same way for our quick-start Docker Compose.

@MarkusTeufelberger

To be honest, all I need is a MySQL client library and a helper API for the REST API of a proprietary product. I'll keep your warnings in mind if I actually need to install anything with Airflow in its dependency graph.

> This is why installing them once in your Dockerfile is a good solution.

Unfortunately, Dockerfiles are usually not built deterministically, and unlike docker-compose, Kubernetes does not have a native path from Dockerfile --> Docker image --> running pod, so it is up to every user to come up with their own build process for this scenario.

Forcing (or even recommending) every user to continuously maintain a fork of an official image is a BAD solution in my books. #104 can be solved by installing dependencies into a volume and attaching that to pods instead of using init containers, but that solution is already suggested there. What I want is an unmodified upstream container, with the additional packages my plugins need installed and available, that I do not have to modify on each release.

For the record, installing dependencies at pod startup time is also far from what I would see as an ideal solution to this problem ("I need libraries foo and bar, which are not part of the official image, available on my DAG runs"), but it at least doesn't place a heavy burden on the operator of the infrastructure. Building your own images is definitely a solution to this problem (and comes with its own issues). The cost of doing that (maintenance, infrastructure) is nontrivial, though.

@potiuk
Author

potiuk commented May 31, 2021

> Forcing (or even recommending) every user to continuously maintain a fork of an official image is a BAD solution in my books. #104 can be solved by installing dependencies into a volume and attaching that to pods instead of using init containers, but that solution is already suggested there. What I want is an unmodified upstream container, with the additional packages my plugins need installed and available, that I do not have to modify on each release.

Just FYI: no fork is ever needed. The official Dockerfile is super versatile precisely so that you do not have to fork it. It's specifically designed to handle two scenarios:

  1. easily extending the image via a FROM directive (this one uses the unmodified upstream Airflow image as a base).
  2. more complex and flexible: customizing the image by providing build args, requirements files and custom packages

See https://airflow.apache.org/docs/docker-stack/build.html

The problem with Airflow is that you cannot have a single "binary" upstream image to handle all cases - because an Airflow deployment has so many variants (more than 60 providers you can choose from), plus many people have their own custom requirements.

Case 1) is super simple: you just have your own simple Dockerfile with a FROM apache/airflow clause, and you can add your own dependencies there. True, you need a process to maintain your "custom" additions and store the image, but if you are getting into Kubernetes, this is what you already committed to from day one.

The basic premise of a K8S deployment is to use an internal registry and images. You won't escape it no matter how hard you try. Each K8S installation already has an internal image registry. It's part of the K8S architecture. And if you have a locally built image, you do NOT need an additional registry. You simply build the image locally, push it to the K8S registry, and that's it. Same as when you install Helm chart updates - it is just another command to run. You load your locally built image into the K8S internal registry and it is immediately available to all pods. No additional registry is needed.

Maybe that's where your "non-trivial" impression comes from - you think you need something else. No, you don't. You already have everything you need the moment you have K8S installed.

For example, when you use 'kind' as your K8S cluster, it is literally two commands:

docker build -f YOUR_DOCKERFILE -t <IMAGE_NAME> <whatever args you want to add> .
kind load docker-image <IMAGE_NAME>

That's it. You are done.

Every K8S cluster deployment has similar capabilities. I am tempted to call it "trivial", actually.

Case 2) is more complex but also more powerful. The official image with "customization" is WAY more versatile. And the whole premise is that you can build your very custom image without forking it at all. It supports, out of the box:

  • optimizing the image heavily for size using a multi-stage approach (development dependencies are not present in the final image)
  • installing using a custom PyPI configuration (via .pypirc) -> with your own in-house repos, proxies, authentication, whatnot
  • adding extra dependencies - both apt and PyPI - at various stages (you can, for example, install optimized versions of packages that require a compilation step without unnecessarily increasing the size of the image)
  • choosing the set of airflow extras/providers that will be installed
  • choosing custom constraint files where you can manually pin the versions of dependencies to install and add your own versions
  • the capability of building the image in an air-gapped environment (where you are not connected to the internet and only use custom apt sources via proxies and locally prepared, security-vetted Python packages) - this is a requirement for some high-security environments like banking or military, to mitigate supply-chain attacks

All this is available in the image without needing to keep a fork. You just have to place your own custom files in "docker-context-files" and pass the right build arguments.
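For instance, a customized build might look roughly like this (the build-arg names follow the docker-stack docs; the extras and package list are placeholders):

```bash
docker build . \
  --build-arg AIRFLOW_VERSION="2.1.0" \
  --build-arg ADDITIONAL_AIRFLOW_EXTRAS="amazon" \
  --build-arg ADDITIONAL_PYTHON_DEPS="my-extra-package==1.2.3" \
  -t my-company/airflow:2.1.0-custom
```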

Does it look like you can do better than that? I seriously doubt it. If you want to use K8S, I think you should simply use it as intended. Dynamically updating the image while running is simply an abomination - something unnecessary and harmful.

@MarkusTeufelberger

> Each K8S installation already has an internal image registry. It's part of the K8S architecture. [...] It is just another command to run. You load your locally built image into the K8S internal registry and it is immediately available to all pods. No additional registry is needed.

It seems you're mixing up a local image cache of some CRI implementations and a container registry? Otherwise, please point to the documentation of the internal registry component on https://kubernetes.io/docs/concepts/overview/components/

> Every K8S cluster deployment has similar capabilities. I am tempted to call it "trivial", actually.

Mine don't even use Docker as the CRI, and support for dockershim has already been removed from Kubernetes. Yes, I can build containers in there using other means (my personal preference would be kaniko), but as I said, that's a personal choice, not something that comes with Kubernetes.

> you can build your very custom image without forking it at all

By "forking" I meant "taking on the responsibility of testing changes and deviating from upstream builds + release artifacts". "Custom image" kinda implies that. Option 1 just takes your version as the base, Option 2 means I need a completely custom image build process that might or might not work at all with no way of getting any support from upstream. It means that I either add a layer on top of the official image with my libraries or I completely rewrite the way the image is built using the way the official one is built as a guideline. Both will result in an image that's 100% unique to my process and that will need a rebuild whenever upstream releases either a new image or software version. There's no documentation on how to do that in place (because it concerns something outside of both helm and airflow) and everyone who wants to do this correctly will have to invent their own process to get there.

https://github.com/apache/airflow/blob/f47e10c3885a028e7c45c10c317a7dbbff9e3ab9/Dockerfile#L326 alone already makes your released image hard to reproduce, and if someone goes for your "case 2" approach, they can run into real trouble.

> Every K8S cluster deployment has similar capabilities.

I really wish they did. It would make a lot of things easier. Now try your example with a more production-like cluster deployment such as https://www.talos-systems.com/platform/talos-os-for-kubernetes/ - good luck.

> Does it look like you can do better than that? I seriously doubt it. If you want to use K8S, I think you should simply use it as intended. Dynamically updating the image while running is simply an abomination - something unnecessary and harmful.

Passive(?)-aggressiveness aside, another alternative would be to vendor libraries alongside custom plugins (check whole libraries into git, together with the plugins that use them, in a state where they can be imported from that folder), so they are "installed" using the git-sync mechanism rather than at pod startup. I am not too fond of that as a solution, but it would also mean there's a versioned and stable instance of a dependency available that can be managed outside Airflow itself. I'm just not sure whether that's a better (or intended) way to add dependencies while sticking with unmodified upstream images.

@potiuk
Author

potiuk commented May 31, 2021

> Mine don't even use Docker as the CRI, and support for dockershim has already been removed from Kubernetes. Yes, I can build containers in there using other means (my personal preference would be kaniko), but as I said, that's a personal choice, not something that comes with Kubernetes.

Of course. In my "Production Docker Image" talk from last year's summit - https://www.youtube.com/watch?v=wDr3Y7q2XoI - I specifically talked about "containers" vs "Docker" and tried to pass that message on. But I failed, and went back to using Docker as the name. As unfortunate as it is, it seems that (as in other cases) Docker, and especially "Dockerfile" (which has no other name - "Containerfile"???), remained the de facto name for anyone using containers.

I have already stopped correcting people who say Docker when they could say containers, and reverted our documentation, because people were confused. Docker is so deeply "embedded" in the whole process that people will not understand if you say "container". I advise you to do the same (even if it's not technically correct), as it makes the message easier to pass on.

But yeah. You can use kaniko, podman or others to build the image. That was just an example. No more, no less.

> It seems you're mixing up a local image cache of some CRI implementations and a container registry?

Correct. Thanks for pointing this out. This is the internal cache I referred to; indeed, it's not a fully-fledged registry. That does not change the fact that you do not need an external registry and can simply load the image - especially when you have K8S installed for "test" purposes. (Again - I cannot imagine anyone trying to dynamically update their image in production in a "fully-fledged" cluster.) You can contest that and dislike it, but this is simply how it works. If you chose K8S for deployment, I think you should live with the consequences of that choice (or maybe try to change it by submitting proposals to K8S, or by changing K8S to something else).

Both kind and minikube support the "load" command to make your image available to the cluster (in the cache, not a registry, as you rightfully corrected me).

> I really wish they did. It would make a lot of things easier. Now try your example with a more production-like cluster deployment such as https://www.talos-systems.com/platform/talos-os-for-kubernetes/ - good luck.

Here is how you do it in Talos: https://www.talos.dev/docs/v0.7/guides/configuring-pull-through-cache/#using-caching-registries-with-docker-local-cluster - it is actually even easier: rather than loading the images into the cluster, you configure Talos to load the images from the local Docker cache. So all you need to do is build the image. It can be used on air-gapped systems and will never reach out to any registry. Very handy, actually. No registry needed - which was my point.

> Passive(?)-aggressiveness aside

Nope. I am just being straightforward. K8S and Docker were never meant to have extra "software" layers added after images are built. Trying to do that "live" is an abomination, and I am happy to repeat that. Syncing/installing packages using git-sync is a terrible idea as well. Container "layers" are the way to add new software on top of existing layers. This is how containers work; this is what they were created for. Installing software inside Kubernetes, rather than adding it outside (by building containers), is a terrible idea as well. Using Git to synchronize packages is a really bad idea. If you have DAGs (which are source code), yeah - using git-sync is a great idea. But using it to synchronize binary packages to install is - again - an abomination. And I am not afraid of that word.

> Both result in an image that's 100% unique to my process and that needs a rebuild whenever upstream releases either a new image or a new software version. There's no documentation for how to do that (because it concerns something outside of both helm and airflow), and everyone who wants to do this correctly has to invent their own process to get there.

Documentation for the process is here: https://airflow.apache.org/docs/docker-stack/build.html#customizing-the-image

It's rather comprehensive but imperfect (and we are continuously improving it and making the process simpler and more stable).
This is precisely why I have waited to give the image "official" status. AIP-26 is still not closed because of that. There are two tasks left (because I wanted to iterate with the users of the image to work out simple yet powerful ways of building the image that will also be stable in the future). See https://github.com/apache/airflow/projects/3.

The two steps left:

  1. Have a separate (automatically generated) repository where users only get the Dockerfile + the needed files (not the whole of Airflow), in order to be able to customize the image

  2. Submit the image to "DockerHub" to gain the "official" image status.

And I am going to add full backwards compatibility checks when this happens.

@potiuk
Author

potiuk commented May 31, 2021

And a comment here:

> https://github.com/apache/airflow/blob/f47e10c3885a028e7c45c10c317a7dbbff9e3ab9/Dockerfile#L326 alone already makes your released image hard to reproduce, and if someone goes for your "case 2" approach, they can run into real trouble.

Yes. Full reproducibility is a non-goal for me (and for the image). The much more important feature for me is that whenever we build the image from scratch, we get the latest security fixes with it - not a "reproducible binary copy" of what was there before. Plus, our build process has automated tests for the built images - a sequence of automated tests in CI that runs with every single pull request whenever we build the image and checks a number of things. And yes, this is a deliberate decision.

@talnicolas

Just a bit curious: since I see that the official helm chart is going full Astronomer, and we ran a POC a few months ago, I would like to check whether one really problematic thing in the deployment process has been fixed since then.

Since you build an immutable image for all the components, whenever we made any change - whether to the scheduler, webserver or worker - all the components were redeployed/restarted. Losing the UI was not that much of an issue, just a bad user experience in my opinion (at least the rocket looked cool), but having the scheduler restart and not schedule any tasks each time you make a change that is not even scheduler-related (just to deploy a new DAG!!) was a real aberration for me, and I don't understand how it makes sense in a production environment with time-critical workflows.

Is this still the behavior of the official helm chart today, or has a different strategy been put in place, @potiuk @kaxil?

@kaxil

kaxil commented Jun 10, 2021

re: Official Helm Chart

@talnicolas The webserver does not restart with the official Helm chart; we have a RollingUpdate for Airflow >= 2.0.0, since we made DAG serialization a requirement from 2.0.0.

https://github.com/apache/airflow/blob/9351e2a2d43b6ec3c019df19afb445a3dcf587ac/chart/templates/webserver/webserver-deployment.yaml#L39-L55

You can also use git-sync or a persistent volume for DAG deployment -- so DAG deployment is separate.

re: Astronomer Platform & restart

The webserver does not restart there anymore either, for Airflow >= 2.0.0 :) It also uses a RollingUpdate; we have more support for DAG deployment now, and Astronomer supports volume-based deployments, so scheduler or worker restarts are no longer required :) Do reach out if you want to try it again.

@potiuk
Author

potiuk commented Jun 11, 2021

> but having the scheduler restart and not schedule any tasks each time you make a change that is not even scheduler-related (just to deploy a new DAG!!)

Also, to add to what @kaxil wrote and relate it to your comment: deploying new DAGs in production by running a "helm upgrade" command sounds super weird, to be honest.

The Helm chart should be used to deploy the "application", not to deploy another version of the DAGs. DAGs are not "part" of the Airflow application. There are other methods - specifically the git-sync that @kaxil mentioned, which is also part of the official helm chart - where deploying new DAGs happens in the background without restarting pods.

@thesuperzapper thesuperzapper added kind/question kind - user questions and removed kind/enhancement kind - new features or changes labels Jun 23, 2021
@cocampbe

cocampbe commented Jun 29, 2021

All I can say is "ugh". After all this time, an 'official chart' is created. Those of us who wanted to deploy to K8s sooner rather than later found this chart and have been using it in production for quite some time. I agree with many of the arguments provided. I currently use the pip install of extra packages for various providers. It works for me. It does take a while for the pods to start - maybe 23 seconds - but after that, it's running.

The argument that an image is immutable is not a strong one, IMO. Images are immutable, but a running pod/container is not. Whether dependencies should be baked into the image or not is debatable, and I think both sides can provide some really good arguments. Personally, I think users should have some flexibility, given that we are discussing OSS. The fact is, @thesuperzapper provided a solution that was pretty much abandoned, and now there is a new release. I feel like it would have been better to approach @thesuperzapper many moons ago to see if he wanted to collaborate on an official release. Now there is a schism, and it will likely be difficult for some of us to migrate to the official release. Color me super excited about trying to figure out how to migrate if we choose to.

@MarkusTeufelberger

apache/airflow#16170 is the PR that adds the pip-install-at-pod-start feature to the "official" chart, so that should already be addressed in the most recent release or an upcoming one (a bit hard to say, as https://github.com/apache/airflow/releases is a mess and it is hard to find anything there).

@potiuk
Author

potiuk commented Jun 29, 2021

> I feel like it would have been better to approach @thesuperzapper many moons ago to see if he wanted to collaborate on an official release. Now there is a schism, and it will likely be difficult for some of us to migrate to the official release. Color me super excited about trying to figure out how to migrate if we choose to.

Diversity of solutions is a value on its own. There is NOTHING wrong with having two parallel solutions. We discussed it many moons ago, @cocampbe - you can search the Airflow devlist - and we decided to go with the Astronomer-originated solution officially supported by the Airflow community (please search the archives). And it will continue to be so, with three people from Astronomer having worked hard over the last few months to get it super-robust, handling many cases, and most of all fully end-to-end tested (including full integration and unit test support - which is a HARD requirement for everything we do as the Airflow community). Just go and see how "robust" and "future-proof" it is.

And we are not at "odds" with @thesuperzapper. We actually collaborate to some extent - including @thesuperzapper hosting the Melbourne chapter of the Airflow Summit. Heck - if you read the thread from the very beginning, we even offered to help people figure out ways to migrate to the community chart, and to see whether we can find some "other" ways of handling their needs where we don't agree with the way it's done in this chart.

The PR #16170 that @MarkusTeufelberger mentioned is proof of that. While we think using "pip install" at pod start is plainly wrong for a production-grade image, we recognise its "development" and "test" value, so we added it as a development-targeted option, with a clear warning printed to users that it should not be used in production (which you can of course ignore, but then you will not get community support for any issues that result from ignoring the warning).

In fact, it's even more than that: we enabled this development mode not only for Helm users but also for Docker-Compose users, because we implemented it in the official Airflow image rather than in the Helm chart or in Docker-Compose.
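
With Docker-Compose this looks roughly like the following (a sketch only - the environment variable comes from that PR, while the service name and package are placeholders; and again: development/test only, never production):

```yaml
# docker-compose.override.yml - development/test use only
services:
  airflow-scheduler:
    environment:
      # packages listed here are pip-installed at container start,
      # which slows startup and is not reproducible - hence the warning
      _PIP_ADDITIONAL_REQUIREMENTS: "apache-airflow-providers-ssh"
```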

And BTW, it is totally fine if you pick @thesuperzapper's chart. Feel free - no-one is forcing you to use the official, Apache-Airflow-community-managed one. You just have to be prepared that the Apache-Way-driven Airflow community will not support the problems you run into with it; the community @thesuperzapper has built around the other chart will.

It also means that we will make different choices and decisions (this is one of the reasons we decided to go with our own approach - we did not agree with some design choices in @thesuperzapper's helm chart). And there is nothing wrong with that. It's OK to have different opinions and make different decisions, as long as you discuss them and weigh the other options.

Just let me repeat it again - we are not Helm-centric, nor even K8S-centric. We are Airflow-centric. Helm, for us, is only one of the ways Airflow is deployed, and it would be a very narrow view to limit the choices we make to only that part of our user base. That is a big part of our "product" thinking: our approach has to be much broader than choosing one deployment option over another. I understand it "looks" like the pip install solution is better from your perspective, but that is a very, very narrow (I could even say somewhat 'selfish') view. We have to consider the opinions, needs and expectations of many of our users and build a consistent product around them - sometimes making choices that leave some people a bit less happy than they could otherwise be. And we run and analyse a survey among Airflow users to be better informed about that.

If you wish to learn more about why we are doing this, how we took into account the results of the Airflow Survey that we run every year, and what the philosophy behind it is, I heartily invite everyone following this discussion to watch the Airflow Loves Kubernetes talk that @kaxil and I are giving at the Airflow Summit in two weeks. Some of the decisions we've made will be easier to understand then.

@potiuk
Copy link
Author

potiuk commented Jun 29, 2021

as https://github.com/apache/airflow/releases is a mess and it is hard to find anything there).

@MarkusTeufelberger - I think you have that impression because you are looking in the wrong place. "Releases" is simply a snapshot of whatever we have already released (including providers). That page is automatically prepared by GitHub, and with the number of releases we have (including providers) it does look overwhelming. That's why, in the official Airflow documentation, we publish the changelog, which is much cleaner: https://airflow.apache.org/docs/apache-airflow/stable/changelog.html - and each provider has its own separate changelog (for example, Amazon's is here: https://airflow.apache.org/docs/apache-airflow-providers-amazon/stable/index.html#changelog ).

However, if you want to see what's coming, you have to look at the milestones:
https://github.com/apache/airflow/milestones - there you will see work neatly organised around 2.1.1 (due to be released today), 2.1.2 (soon), 2.2 (coming in a few months) and 3.0 (empty for now, but after the Summit you will see more stuff planned there, I am sure).

@cocampbe
Copy link

@potiuk I have apparently poured salt on a wound and I apologize for that. It's just frustrating.

@potiuk
Copy link
Author

potiuk commented Jun 29, 2021

@potiuk I have apparently poured salt on a wound and I apologize for that. It's just frustrating.

No salt :). No wound :). I just wanted to provide the full context - it's my way of over-communicating when I see that some assumptions miss the bigger picture. Really, I am not frustrated or angry, if you read it that way. I think if you visit the devlist and the issues more often, you'll see I am just trying to be helpful (and apologies if it comes across as a bit 'patronising' - I know it might certainly look like that, but it's really not intended to be).

@thesuperzapper
Copy link
Member

For those following at home, I intend to maintain (and even add new features to) this chart as long as it's used. A little friendly competition often breeds innovation.

I have updated the README with a section highlighting that there are two charts, and outlining the approach that this "community" chart is taking.

However, let's make sure we refer to the charts by consistent names:

  • "community" - is this chart
  • "official" - is the chart found in the apache/airflow repo

@potiuk
Copy link
Author

potiuk commented Jun 30, 2021

To be honest, I think "community" is quite a misleading name, and it introduces a lot of confusion.

Everything Apache Airflow releases is a product of the "Apache Airflow Community". And in the Airflow Loves Kubernetes talk at the Summit, @kaxil and I are going to say that the chart is supported by the Apache Airflow community.

So I think we are going to refer to it as the "Official Airflow Community Chart" from now on, to be very precise that it is not a product of some mysterious group, but of the very same community that produces Apache Airflow.

@potiuk
Copy link
Author

potiuk commented Jun 30, 2021

Hey @thesuperzapper - maybe we can propose another name for your chart that we can use? I really find it confusing - several times I saw a question about the "community chart" in Slack, and after asking for details (which chart?) I was pointed to https://airflow.apache.org/docs/helm-chart/stable/index.html . I really think people find it confusing because "Apache Airflow Community" is heavily (and rightfully) used to refer to the community built around Apache Airflow, which operates according to the Apache Way (the Apache Software Foundation is the owner of Apache Airflow). And "Community over code" is the ASF motto, so "community" as a distinguishing name is a very bad choice.

We could use a "non-official" name, but that is obviously negative (and there are other non-official charts, like the Bitnami one, that are also often used). So maybe some other "name" that we can jointly refer to would be great? Let me think about some proposals.

@potiuk
Copy link
Author

potiuk commented Jun 30, 2021

Just to explain - I am also seeking the right name because we want to mention your chart in the talk as an alternative, and since the talk is going to be recorded and likely popular (my Airflow Prod Docker Image talk from last year got a whopping 7K views), I think we need to come up with a consistent naming that will be used by everyone.

I am thinking about "positive" and "distinctive" names - I really do not want to leave people with the impression that there is something wrong with the "other" chart.

  • It could be named after you, @thesuperzapper - as you seem to be the author of almost all the commits
  • Simply the "airflow-helm" chart (following the GitHub repo name) - might be a bit confusing
  • "stable helm chart" - the name people referred to for quite some time, and still do (following the old name of the repo)
  • "historical chart" - that might be slightly negative, as it could imply the chart is "outdated"
  • "user-community chart" - similar to "community", but with a clear distinction from the "Airflow Community", which is more than users: it also includes the creators of Apache Airflow and various stakeholders

I like the last one most, because it's just a small modification of the current naming used by @thesuperzapper, but one that brings clarity to the distinction between the two charts, keeps the "community" part (which I think is important to Matthew), and is pretty informative.

WDYT?

@gsemet
Copy link
Collaborator

gsemet commented Jun 30, 2021

My humble opinion, as the original author of this chart: « user-community chart » is fine and not so far from what we have now.

@thesuperzapper
Copy link
Member

I agree that user-community is fine.

When introducing it, it's probably important to highlight that it used to be the stable/airflow chart, so people understand where it came from (though that will probably become less understood/relevant as time passes).

@potiuk
Copy link
Author

potiuk commented Jul 1, 2021

I agree that user-community is fine.

When introducing it, it's probably important to highlight that it used to be the stable/airflow chart, so people understand where it came from (though that will probably become less understood/relevant as time passes).

Cool, let's stick with that terminology then :). Certainly - when we talk about the chart, we want to briefly cover its history and make sure the credit is there for all the years when there was no Apache-Airflow-community-supported helm chart. I think many users might choose to stay with it, and that's great if we can refer to the history and explain the context, as it might sometimes (as in the discussion above) not be clear where the distinction originates.
