
Remove unused index idx_last_scheduling_decision on dag_run table #39275

Merged


pankajkoti (Member) commented:

We added idx_last_scheduling_decision on the
last_scheduling_decision column of the dag_run table
in Airflow 2.0.0. However, this index appears to have been
unused, with an index scan count of 0 throughout. I verified
that scheduler performance metrics are not affected after removing
this index.
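Removing an index is normally done as a reversible schema migration: the upgrade step drops the index and the downgrade step recreates it. The sketch below is a minimal illustration of that upgrade/downgrade pair, assuming a hypothetical, heavily simplified dag_run schema. It uses the standard-library sqlite3 module only so that it runs on its own. It is not the Alembic revision from this PR, and `upgrade`, `downgrade` and `index_exists` are hypothetical helpers.

```python
import sqlite3

# Hypothetical stand-in for a migration revision: sqlite3 is used only so the
# sketch is self-contained; the actual Airflow migration is not reproduced here.
INDEX_NAME = "idx_last_scheduling_decision"


def upgrade(conn: sqlite3.Connection) -> None:
    """Drop the unused index; IF EXISTS makes re-running the step harmless."""
    conn.execute(f"DROP INDEX IF EXISTS {INDEX_NAME}")


def downgrade(conn: sqlite3.Connection) -> None:
    """Recreate the single-column index so the change can be reverted."""
    conn.execute(
        f"CREATE INDEX IF NOT EXISTS {INDEX_NAME} "
        "ON dag_run (last_scheduling_decision)"
    )


def index_exists(conn: sqlite3.Connection, name: str) -> bool:
    """Look the index up in SQLite's schema table."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'index' AND name = ?",
        (name,),
    ).fetchone()
    return row is not None


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Hypothetical minimal dag_run schema; the real table has many more columns.
    conn.execute(
        "CREATE TABLE dag_run ("
        "id INTEGER PRIMARY KEY, dag_id TEXT, state TEXT, "
        "last_scheduling_decision TIMESTAMP)"
    )
    downgrade(conn)  # start from the pre-migration schema (index present)
    print(index_exists(conn, INDEX_NAME))  # True
    upgrade(conn)
    print(index_exists(conn, INDEX_NAME))  # False
    upgrade(conn)  # a second run is a no-op thanks to IF EXISTS
```

Keeping a working downgrade matters here. If an unexpected query pattern turns out to rely on the index after all, the migration can be rolled back without hand-written SQL.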

Additionally, this index seems to occupy a large amount of storage,
more than half the size of the table itself (e.g. on one deployment that
I checked, the table's records occupy 9.8 GB, while this single
index alone occupies 5.7 GB).
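The PR does not say exactly how the scan counts and sizes were measured. On PostgreSQL, one common way is to query the `pg_stat_user_indexes` statistics view together with `pg_relation_size()`. The minimal sketch below assumes that approach. `INDEX_USAGE_SQL` is an illustrative query, and `IndexStat` and `unused_indexes` are hypothetical helpers, not part of Airflow. Rows returned by running the query through any database driver could be mapped onto `IndexStat`.

```python
from __future__ import annotations

from dataclasses import dataclass

# PostgreSQL query (assumed approach): idx_scan counts scans of each index since
# statistics were last reset; pg_relation_size() returns the on-disk size in
# bytes of the relation's main fork.
INDEX_USAGE_SQL = """
SELECT s.indexrelname,
       s.idx_scan,
       pg_relation_size(s.indexrelid) AS index_bytes,
       pg_relation_size(s.relid) AS table_bytes
FROM pg_stat_user_indexes AS s
WHERE s.relname = 'dag_run'
ORDER BY index_bytes DESC
"""


@dataclass
class IndexStat:
    """One row of INDEX_USAGE_SQL (hypothetical helper type)."""
    name: str
    idx_scan: int
    index_bytes: int
    table_bytes: int


def unused_indexes(stats: list[IndexStat]) -> list[tuple[str, float]]:
    """Return (index name, index size / table size) for never-scanned indexes."""
    result = []
    for s in stats:
        if s.idx_scan == 0:
            # Guard against an empty table to avoid division by zero.
            ratio = s.index_bytes / s.table_bytes if s.table_bytes else float("inf")
            result.append((s.name, ratio))
    return result


if __name__ == "__main__":
    # Figures from the description above: 0 scans, 5.7 GB index, 9.8 GB table.
    stats = [IndexStat("idx_last_scheduling_decision", 0, 5_700_000_000, 9_800_000_000)]
    for name, ratio in unused_indexes(stats):
        print(f"{name}: {ratio:.0%} of table size")  # idx_last_scheduling_decision: 58% of table size
```

Because `idx_scan` only accumulates since the last statistics reset, a zero count is meaningful only after the database has been observed over a representative workload period.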



@pankajkoti force-pushed the remove-idx-last-scheduling-decision branch 2 times, most recently from 6ae0e52 to d86e3f5 on April 26, 2024 11:59
Taragolis (Contributor) left a comment:


> single index alone occupies 5.7 GB

It is a bit strange that this index uses such a large amount of storage. Perhaps this happened because the autovacuum daemon was disabled, or vacuum has never run on this table.

In general, I agree that this index is unlikely ever to be used in Postgres. There are a couple of places where last_scheduling_decision is used for sorting or filtering, but each time it is used alongside other fields, so this index is much less effective than even a sequential scan or other indexes for optimizing those particular queries.

So removing this index likely brings more benefit than keeping it.

@pankajkoti added this to the Airflow 2.9.1 milestone Apr 26, 2024
pankajkoti (Member, Author) commented Apr 30, 2024:

@vincbeck @Taragolis @jedcunningham @utkarsharma2 addressed the review comments so far. Requesting re-review please.

@pankajkoti force-pushed the remove-idx-last-scheduling-decision branch from 4928d9d to e3bca9e on April 30, 2024 13:15

vincbeck (Contributor) left a comment:


Much better! Thanks!

@jedcunningham merged commit 92ffab6 into apache:main Apr 30, 2024 (41 checks passed)
@jedcunningham deleted the remove-idx-last-scheduling-decision branch April 30, 2024 19:57
@jedcunningham added the type:bug-fix (Changelog: Bug Fixes) label Apr 30, 2024
pankajkoti (Member, Author) commented May 1, 2024:

Bringing up a comment from Jed that is hidden in a resolved conversation, so that it appears here upfront:

From Jed
"cc @ephraimbuddy we will want to remove 2.10.0 when we cherry-pick this back to v2-9."

The comment refers to removing this line https://github.com/apache/airflow/pull/39275/files#diff-730eff695b18ad05249293cc3361da7424aba164857e9c5d26e50bd3a78ba6e7R95 when cherry-picking back to v2-9.

ephraimbuddy (Contributor) replied:


Pre-commit will still fix it when we cherry-pick. Nothing to worry about.

RodrigoGanancia pushed a commit to RodrigoGanancia/airflow that referenced this pull request May 10, 2024
Labels: area:db-migrations (PRs with DB migration), kind:documentation, type:bug-fix (Changelog: Bug Fixes)

7 participants