New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update SLA wording to reflect it is relative to Dag Run start #27111
Conversation
bd02e56
to
d9d5be0
Compare
I don't think the build doc error is related to this change? At least as I'm interpreting the log output. Edit: Yup, repushing cleared the test failures. |
d9d5be0
to
8698ae4
Compare
@potiuk Whenever you get a chance, any objections? |
(cherry picked from commit 639210a)
(cherry picked from commit 639210a)
I am not advocating for re-opening this issue now, but I wonder if it might still lack clarity? Contrary to what one would intuitively expect, isn't the SLA relative to the next DAG run? airflow/airflow/dag_processing/processor.py Line 421 in cc4cde9
|
That looks like it's calling next only once? And isn't the fact it calls I'm happy for another PR to be opened for either even better clarity or updating this logic to be more consistent. |
Sorry, I should have quoted the entire code section: next_info = dag.next_dagrun_info(dag.get_run_data_interval(ti.dag_run), restricted=False) # 1st call
while next_info and next_info.logical_date < ts:
next_info = dag.next_dagrun_info(next_info.data_interval, restricted=False) # 2nd call
<SLA miss detection here> I agree that you need to call Actually, I'm running an older version of Airflow that detects SLA misses as follows: dttm = ti.execution_date
dttm = dag.following_schedule(dttm) # 1st call
while dttm < timezone.utcnow():
following_schedule = dag.following_schedule(dttm) # 2nd call
if following_schedule + task.sla < timezone.utcnow():
<SLA miss> I added some debugging I haven't tested with the new version though, but the logic doesn’t seem to have changed. |
I'm not sure I completely follow this wording, do you have a concrete example where this differs from the wording in my PR? The wording I have given matches the motivating example I give in #26566. And I'm less concerned about if the wording matches the steps of the code as long as the description matches the end result.
I'm not sure I follow what you're getting at but no this is not a fair assumption. I have worked with many DAGs that run daily but each DAG run may be for longer than one day (sometimes weeks). |
With my Airflow (1.14.15), the SLA miss was not detected in your scenario (#26566) until the second DAG run. I might have found the reason. Airflow 1.14.15 does check for the SLA one interval ahead of each task, but it does so for SUCCESS/SKIPPED tasks only, not the currently running tasks. When a new run is triggered, the SUCCESS/SKIPPED tasks from the previous run will be one interval behind, so looking one interval ahead works for these tasks. That’s when the SLA miss is triggered. Definitely later than I expected. |
Do you mean 1.10.15? Regardless this fits with the experience of SLAs I had with Airflow 1.x era, that is they came much later than I expected and I didn't find them useful Fortunately I currently only have to work with Airflow 2.x instances, and I have been setting up a lot of SLAs recently and my practical experience of them is they work as I've described in this PR. |
Yep. There is currently probably few (certainly less than 20) percent code left from Airflow 1.10 in Airflow 2. Airflow 1.10 is end of life, end of support, end of security fixes for 1.5 year. If you are using it still @rom1mouret - you are shooting yourself in the foot, because not only maintainers do not remember too much about those days (and most of all they want to forget), but also you are one of the few last people on Earth who use Airflow 1.10. It's not only dangerous for you and your employer (multiple Security CVEs that were already handled in Airfow 2 but they were not in Airlfow 1.10) but it also removes the biggest reason why you might want to use an open-source software - a community of users and maintainers that help you to solve your problems. You can spend your time in searching for reasons, but the whole effort there is only useful for you, no-one else really wants to have anything to do with 1.10 any more. You are bascally in this situation: |
Thanks. Can’t wait to upgrade! |
Update Docs to reflect reality of SLA
closes: #26566