Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the incorrect scheduling time for the first run of dag #21011

Merged
merged 1 commit into from Feb 6, 2022

Conversation

wanlce
Copy link
Contributor

@wanlce wanlce commented Jan 21, 2022

When catchup_by_default is set to false and the start_date in the dag is the previous day, the first dispatch time of this dag is incorrect

dag setting

schedule_interval='30 21 * * *',
start_date=dates.days_ago(1),

2022-01-21_18-31


error
2022-01-21_17-27

Correct
2022-01-21_17-30


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.

@wanlce wanlce requested a review from uranusjr as a code owner January 21, 2022 10:37
return max(new_start, earliest)
return max(new_start, self._get_next(earliest))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs a test to ensure if earliest lies exactly on the schedule, the first run is not one interval late. My intuition is this should use self._align instead, but a test would be required to verify that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should use anlign. 😢

Copy link
Member

@uranusjr uranusjr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logic looks good to me, but the tests need some improvements. Also need to fix linter errors.

tests/timetables/test_interval_timetable.py Outdated Show resolved Hide resolved
tests/timetables/test_interval_timetable.py Outdated Show resolved Hide resolved
@wanlce wanlce force-pushed the fix-first-dagrun-time branch 2 times, most recently from d21a721 to af772ad Compare January 25, 2022 02:49
@wanlce wanlce requested a review from uranusjr January 28, 2022 02:29

HOURLY_CRON_TIMETABLE = CronDataIntervalTimetable("@hourly", TIMEZONE)
HOURLY_TIMEDELTA_TIMETABLE = DeltaDataIntervalTimetable(datetime.timedelta(hours=1))
HOURLY_RELATIVEDELTA_TIMETABLE = DeltaDataIntervalTimetable(dateutil.relativedelta.relativedelta(hours=1))

CRON_TIMETABLE = CronDataIntervalTimetable("30 16 * * *", TIMEZONE)
CRON_INTERVAL = datetime.timedelta(minutes=30, hours=16)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This CRON_INTERVAL name is confusing; the “interval” of the timetable is actually 24 hours (because jobs run everyday at 16:30). Let’s change it (although I’m not sure what name is appropriate).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, you are right. How about using INTERVAL_FROM_ZERO?:smiley:

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe DELTA_FROM_MIDNIGHT (zero sounds wrong)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, Great! Use it

When Catchup_by_default is set to false and start_date in the DAG is the
previous day, the first schedule time for this DAG may be incorrect
@wanlce
Copy link
Contributor Author

wanlce commented Feb 2, 2022

Happy New Year!

@github-actions github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label Feb 2, 2022
@github-actions
Copy link

github-actions bot commented Feb 2, 2022

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

@uranusjr
Copy link
Member

uranusjr commented Feb 2, 2022

I recall there’s another thread somewhere this is discussed but can’t find it, so I’ll just tag @potiuk @ashb from memory. Sorry if I misremember the people in that thread.

@potiuk
Copy link
Member

potiuk commented Feb 6, 2022

I recall there’s another thread somewhere this is discussed but can’t find it, so I’ll just tag @potiuk @ashb from memory. Sorry if I misremember the people in that thread.

Yep you do remember well.

@potiuk
Copy link
Member

potiuk commented Feb 6, 2022

Just an error that we already fixed. Merging

@ashb ashb added this to the Airflow 2.2.4 milestone Feb 6, 2022
@potiuk potiuk merged commit 0bcca55 into apache:main Feb 6, 2022
@jedcunningham jedcunningham added the type:bug-fix Changelog: Bug Fixes label Feb 8, 2022
jedcunningham pushed a commit that referenced this pull request Feb 8, 2022
When Catchup_by_default is set to false and start_date in the DAG is the
previous day, the first schedule time for this DAG may be incorrect

Co-authored-by: wanlce <who@foxmail.com>
(cherry picked from commit 0bcca55)
jedcunningham pushed a commit that referenced this pull request Feb 10, 2022
When Catchup_by_default is set to false and start_date in the DAG is the
previous day, the first schedule time for this DAG may be incorrect

Co-authored-by: wanlce <who@foxmail.com>
(cherry picked from commit 0bcca55)
@wanlce wanlce deleted the fix-first-dagrun-time branch February 14, 2022 08:20
jedcunningham pushed a commit that referenced this pull request Feb 17, 2022
When Catchup_by_default is set to false and start_date in the DAG is the
previous day, the first schedule time for this DAG may be incorrect

Co-authored-by: wanlce <who@foxmail.com>
(cherry picked from commit 0bcca55)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
full tests needed We need to run full set of tests for this PR to merge type:bug-fix Changelog: Bug Fixes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants