Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't rely on current ORM structure for db clean command #23574

Merged

Conversation

dstandish
Copy link
Contributor

@dstandish dstandish commented May 8, 2022

For command DB clean, by not relying on the ORM models, we can use db clean from 2.3 pre-upgrade.

Also, adds feature where the rows are always archived instead of simply deleted. (You can skip the archive with --skip-archive.)

Example output with INFO log level:

❯ airflow db clean --clean-before-timestamp 2022-05-31 -y

Checking table job
Found 0 rows meeting deletion criteria.

Checking table log
Found 5 rows meeting deletion criteria.
Moving data to table _airflow_deleted__log__20220523160335

Checking table sensor_instance
Found 0 rows meeting deletion criteria.
...

Example with DEBUG log level:

❯ airflow db clean --clean-before-timestamp 2022-05-31 -y
...
[2022-05-23 09:05:42,182] {db_cleanup.py:237} DEBUG - old rows query:
SELECT base.*
FROM dag_run AS base LEFT OUTER JOIN (SELECT dag_id, max(dag_run.start_date) AS max_date_per_group
FROM dag_run
WHERE external_trigger = false GROUP BY dag_id) AS latest ON base.dag_id = latest.dag_id AND base.start_date = max_date_per_group
WHERE base.start_date < :start_date_1 AND max_date_per_group IS NULL
Checking table dag_run
Found 0 rows meeting deletion criteria.

[2022-05-23 09:05:42,192] {db_cleanup.py:237} DEBUG - old rows query:
SELECT base.*
FROM import_error AS base
WHERE base.timestamp < :timestamp_1
Checking table import_error
Found 0 rows meeting deletion criteria.

[2022-05-23 09:05:42,200] {db_cleanup.py:237} DEBUG - old rows query:
SELECT base.*
FROM log AS base
WHERE base.dttm < :dttm_1
Checking table log
Found 1 rows meeting deletion criteria.
Moving data to table _airflow_deleted__log__20220523160542
[2022-05-23 09:05:42,204] {db_cleanup.py:250} DEBUG - ctas query:
CREATE TABLE _airflow_deleted__log__20220523160542 AS SELECT base.*
FROM log AS base
WHERE base.dttm < :dttm_1
[2022-05-23 09:05:42,260] {db_cleanup.py:258} DEBUG - rows moved; purging from log
[2022-05-23 09:05:42,261] {db_cleanup.py:274} DEBUG - delete statement:
DELETE FROM log USING _airflow_deleted__log__20220523160542 WHERE log.id = _airflow_deleted__log__20220523160542.id

@dstandish dstandish force-pushed the do-not-rely-on-current-orm-for-db-clean branch from 1d2d2b8 to 715de70 Compare May 23, 2022 16:23
@dstandish dstandish changed the title WIP - Don't rely on current ORM structure for db clean command Don't rely on current ORM structure for db clean command May 24, 2022
@dstandish dstandish marked this pull request as ready for review May 24, 2022 12:09
@dstandish dstandish marked this pull request as draft May 26, 2022 14:42
@dstandish dstandish force-pushed the do-not-rely-on-current-orm-for-db-clean branch 2 times, most recently from 4c8d325 to 43fdf83 Compare May 27, 2022 15:48
@dstandish dstandish force-pushed the do-not-rely-on-current-orm-for-db-clean branch from 84c2031 to e1b4c49 Compare June 7, 2022 22:05
@dstandish dstandish marked this pull request as ready for review June 7, 2022 22:38
@dstandish dstandish force-pushed the do-not-rely-on-current-orm-for-db-clean branch from c997019 to 6f4ac72 Compare June 7, 2022 22:38
Copy link
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. This will give the option to not access DB directly for the users in order to remove their error rows.

NIT. I think it's worth mentioning in the warning mesage printed in the UI when there are "erroneous" row, that the user can use the "db cleanup" command to delete them.

@github-actions github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label Jun 12, 2022
@github-actions
Copy link

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

@dstandish
Copy link
Contributor Author

Nice. This will give the option to not access DB directly for the users in order to remove their error rows.
NIT. I think it's worth mentioning in the warning mesage printed in the UI when there are "erroneous" row, that the user can use the "db cleanup" command to delete them.

well it's just about purging old rows not necessarily erroneous rows... can you clarify which warning message?

@dstandish dstandish force-pushed the do-not-rely-on-current-orm-for-db-clean branch 2 times, most recently from 03e33a1 to ee6908a Compare June 15, 2022 04:43
@dstandish dstandish force-pushed the do-not-rely-on-current-orm-for-db-clean branch from ee6908a to 0760e6b Compare June 17, 2022 16:16
@dstandish dstandish merged commit 95bd6b7 into apache:main Jun 17, 2022
@dstandish dstandish deleted the do-not-rely-on-current-orm-for-db-clean branch June 17, 2022 19:46
@potiuk potiuk added this to the Airflow 2.3.4 milestone Jul 16, 2022
@potiuk
Copy link
Member

potiuk commented Jul 16, 2022

This one also fixes error raised in #25063 (bad recency column in "callback_request" table), so it should be cherry-picked to 2.3.4 IMHO.

@ephraimbuddy ephraimbuddy added type:improvement Changelog: Improvements type:bug-fix Changelog: Bug Fixes and removed type:improvement Changelog: Improvements labels Aug 12, 2022
ephraimbuddy pushed a commit that referenced this pull request Aug 15, 2022
For command DB clean, by not relying on the ORM models, we will be able to use the command even when the metadatabase is not yet upgraded to the version of Airflow you have installed.

Additionally we archive all rows before deletion.

(cherry picked from commit 95bd6b7)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
full tests needed We need to run full set of tests for this PR to merge type:bug-fix Changelog: Bug Fixes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants