Support multiple DagProcessors parsing files from different locations. #25935
Not sure what the state of releasing 2.4.0 is. If we can't fit this PR in, then it may have to wait until 2.5.0, I believe.
Rather than a global variable (which is what DagProcessorDirectory is), how about:
Add an attribute to the DagFileProcessorProcess constructor (which is passed down from the Manager), and add a dag_directory argument to DagBag.sync_to_db which can get passed down to DAG.bulk_write_to_db and SerializedDagModel.write_dag.
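To illustrate the shape of that suggestion, here is a minimal sketch of passing the directory down through the call chain instead of reading it from a global. All class names, the `processor_subdir` keyword, and the paths are hypothetical stand-ins, not Airflow's actual classes or signatures.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SerializedStore:
    """Hypothetical stand-in for the table that holds serialized DAGs."""

    rows: dict[str, Optional[str]] = field(default_factory=dict)

    def write_dag(self, dag_id: str, processor_subdir: Optional[str] = None) -> None:
        # Record which directory the DAG was parsed from alongside the DAG itself.
        self.rows[dag_id] = processor_subdir


class Bag:
    """Hypothetical stand-in for a collection of parsed DAG ids."""

    def __init__(self, dag_ids: list[str], store: SerializedStore) -> None:
        self.dag_ids = dag_ids
        self.store = store

    def sync_to_db(self, dag_directory: Optional[str] = None) -> None:
        # The directory arrives as an argument and is forwarded, not read from a global.
        for dag_id in self.dag_ids:
            self.store.write_dag(dag_id, processor_subdir=dag_directory)


class FileProcessor:
    """Hypothetical processor that receives its directory in the constructor."""

    def __init__(self, dag_directory: str, store: SerializedStore) -> None:
        self.dag_directory = dag_directory
        self.store = store

    def process(self, dag_ids: list[str]) -> None:
        Bag(dag_ids, self.store).sync_to_db(dag_directory=self.dag_directory)


store = SerializedStore()
FileProcessor("/dags/team_a", store).process(["a1", "a2"])
FileProcessor("/dags/team_b", store).process(["b1"])
```

With explicit arguments, two processors with different directories can run side by side without sharing mutable state.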
Co-authored-by: Ash Berlin-Taylor <ash_github@firemirror.com>
@@ -1077,6 +1077,10 @@ standalone_dag_processor = False
# in database. Contains maximum number of callbacks that are fetched during a single loop.
max_callbacks_per_loop = 20

# Only applicable if `[scheduler]standalone_dag_processor` is true.
# Time in seconds after which dags, which were not updated by Dag Processor are deactivated.
dag_stale_not_seen_duration = 600
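As a rough illustration of what a setting like this implies, the sketch below selects DAGs whose last-seen timestamp is older than the configured number of seconds. The function name, the `last_seen` mapping, and the strict "older than" comparison are assumptions made for the example; this is not the code from the PR.

```python
from datetime import datetime, timedelta, timezone


def find_stale_dags(last_seen, now, stale_after_seconds=600):
    """Return ids of DAGs not seen within the last `stale_after_seconds` seconds."""
    threshold = now - timedelta(seconds=stale_after_seconds)
    # A DAG seen exactly at the threshold is still treated as fresh (assumption).
    return sorted(dag_id for dag_id, seen in last_seen.items() if seen < threshold)


now = datetime(2022, 1, 1, 12, 0, tzinfo=timezone.utc)
last_seen = {
    "fresh": now - timedelta(seconds=60),
    "edge": now - timedelta(seconds=600),
    "stale": now - timedelta(seconds=601),
}
stale = find_stale_dags(last_seen, now)
```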
dag_stale_not_seen_duration
--> Any suggestion for a better name? This config name isn't easy to understand.
eh.. naming things...
Let me be wild on that one:
deactivation_time_for_missing_dags_in_standalone_dag_processor_mode
Another suggestion from @ashb was mark_dag_stale_not_seen_in
Is it better?
All of them are awful :)
Actually, yours, @potiuk, helped me understand the meaning of that parameter ;p
Horrible names can also be best :)
Support running multiple standalone DagProcessors, each configured to parse DAGs from a different directory.
Changes:
Usage:
Part of https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-43+DAG+Processor+separation