Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Helm Chart] Support multiple DagProcessors #26494

Open
1 of 2 tasks
jtvmatos opened this issue Sep 19, 2022 · 9 comments
Open
1 of 2 tasks

[Helm Chart] Support multiple DagProcessors #26494

jtvmatos opened this issue Sep 19, 2022 · 9 comments
Labels
area:helm-chart Airflow Helm Chart good first issue kind:feature Feature Requests

Comments

@jtvmatos
Copy link

jtvmatos commented Sep 19, 2022

Description

Airflow 2.4.0 has introduced support for Support multiple DagProcessors parsing files from different locations.

However, Helm chart only supports Provision Standalone Dag Processor (or at least will support in Airflow Helm Chart 1.7.0).

Use case/motivation

For Airflow Helm Chart users take advantage of these new AIP-43 features, it is necessary to add support for provision multiple DagProcessors in Helm chart.

Related issues

Helmchart: Provision Standalone Dag Processor
Support multiple DagProcessors parsing files from different locations

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@potiuk
Copy link
Member

potiuk commented Sep 20, 2022

Marked it as a good first issue. I think it would require some smart way of defining how to do it in the configuration. Even if you are not willling to implement it (or maybe?), a proposal from the user perspective how it could look like when it comes to configuration would be most welcome @jtvmatos . Any ideas and suggestions?

@jtvmatos
Copy link
Author

jtvmatos commented Sep 21, 2022

Hi @potiuk thanks for the feedback.

Unfortunately in the coming weeks I won't be available to create this PR (I will be unavailable for more than a month... if no one solves it by then, I can open the PR....)

But yes, this PR can be tricky and can create a breaking change in the current processor configurations. For example, creating a nested level for multiple DagProcessors could be an option:

# Airflow Dag Processor Config
dagProcessor:
  enabled: false
  # Airflow dag processors in the deployment
  processor:
    # Processor 1
    - name: processor-1 # Alias for metadata.name
      replicas: 1
      revisionHistoryLimit: ~
      command: ~
      args: ["bash", "-c", "exec airflow dag-processor"]
      strategy:
     (...)
    # Processor 2
    # - name: processor-2
    #   args: ["bash", "-c", "exec airflow dag-processor --subdir  ..."]

With this values.yaml, will be possible to set the processors configurations as currently used for env or secrets, i.e., dagProcessor.processor[0]..., dagProcessor.processor[1]... for N processors (>=2.4.0).

Alternatively, we can just add a extraDagProcessor list:

# Airflow Dag Processor Config
dagProcessor:
  enabled: false
  
   (...)
   
  env: []

  extraDagProcessor: []

This list is not breaking for the current settings. However, it almost imply a "master" processor and "other" processors.

@potiuk
Copy link
Member

potiuk commented Sep 22, 2022

No worries - we do not have to add it super-quickly I think (but if somoene will want to add it this is still an option :)).

I'd say the best approach would be to make the top-level dag processor entry optional (and fail the configuration if both "arrays" and "top level" are specified (then we could name the array "multipleDagProcessors" simply?). WDYT @jedcunningham @mhenc ?

@jtvmatos
Copy link
Author

jtvmatos commented Sep 22, 2022

I create a fork, with the approach that I mention before (only for my local tests):
-> 0793c55

For now this is all I can do. But it can be useful for the discussion :)

Thank you all!

@jedcunningham
Copy link
Member

We have a similar problem for celery workers too. I haven't given it a ton of thought, but I feel something like the breaking change as proposed above will be a better solution for us. That does mean we need to wait for version 2 of the chart though.

Either way, removing it from the existing milestone for now.

@jedcunningham jedcunningham removed this from the Airflow Helm Chart 1.7.0 milestone Oct 3, 2022
@jtvmatos
Copy link
Author

jtvmatos commented Oct 4, 2022

Hi @jedcunningham,

Should we then follow a approach like @potiuk suggested? Something like:

# Airflow Dag Processor Config
dagProcessor:
  enabled: false
  # Number of airflow dag processors in the deployment
  replicas: 1

  (...)

# Airflow Multiple Dag Processor Config
multipleDagProcessor:
  enabled: false
  processor:
    # Processor 1
    - name: processor-1
      replicas: 1
     (...)
    - name: processor-2
      args: ["bash", "-c", "exec airflow dag-processor --subdir  ..."]
     (...)

With a validation that only one (dagProcessor or multipleDagProcessor) can be activated, and a Note that in Helm Chart v2 both configs will be merge. WDYT?

@eladkal
Copy link
Contributor

eladkal commented Jan 26, 2023

@jtvmatos I think it's better to open a draft PR and discuss code changes over the PR

@eladkal eladkal added the area:helm-chart Airflow Helm Chart label Jan 26, 2023
@apinchuk1
Copy link

What is the current status of this? Looking forward to the feature of having multiple dag folders within the helm chart.

@potiuk
Copy link
Member

potiuk commented May 15, 2024

What is the current status of this? Looking forward to the feature of having multiple dag folders within the helm chart.

If you would like to contribute it @apinchuk1 , this this is the most certain way of making a "status change" on that. Other than - someone will have to contribute it, so there is no "status change".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:helm-chart Airflow Helm Chart good first issue kind:feature Feature Requests
Projects
None yet
Development

No branches or pull requests

5 participants