Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable attrs state management on MappedOperator #24772

Merged

Conversation

uranusjr
Copy link
Member

@uranusjr uranusjr commented Jul 1, 2022

The custom __getstate__ and __setstate__ implementation from attrs interacts badly with Airflow's DAG serialization and pickling. When a mapped task is deserialized, subclasses are coerced to MappedOperator. But when the instances go through DAG pickling, all attributes defined in the subclasses are dropped by attrs's custom state management.

Since attrs does not do anything too special here (the logic is only important for slots=True), we can use Python's built-in implementation instead.

I locally ran airflow scheduler and this seems to fix the exceptions, but it seems difficult to create a meaningful test for this due to all the components involved. Ideas on test setup are much welcomed.

Fix #23838.

The custom __getstate__ and __setstate__ implementation from attrs
interacts badly with Airflow's DAG serialization and pickling. When a
mapped task is deserialized, subclasses are coerced to MappedOperator.
But when the instances go through DAG pickling, all attributes defined
in the subclasses are dropped by attrs's custom state management.

Since attrs does not do anything too special here (the logic is only
important for slot=True), we can use Python's built-in implementation
instead.
@uranusjr uranusjr force-pushed the fix-mappedoperator-serialized-pickling branch from a0168ef to 3af6579 Compare July 1, 2022 04:38
@ashb
Copy link
Member

ashb commented Jul 1, 2022

Add something to one of the round trip tests in tests/serialization/test_dag_serialization.py? We've already got test_mapped_operator_serde so maybe we just need to add to that test

@ashb
Copy link
Member

ashb commented Jul 5, 2022

Though I am surprised that we hit pickle anywhere in the AIP-45 change, since (from a cursor look at the code) it uses the existing serialized tables, get_dag_by_deserialization etc.

@uranusjr
Copy link
Member Author

uranusjr commented Jul 5, 2022

It might not be pickling per-se (I still have not found the actual direct cause to the issue), but the same marshalling logic (that uses __setstate__) is reused in a lot of places in Python, including the default __copy__ implementation, so that could be the case.

@uranusjr uranusjr merged commit 6fd06fa into apache:main Jul 5, 2022
@uranusjr uranusjr deleted the fix-mappedoperator-serialized-pickling branch July 5, 2022 09:36
@ephraimbuddy ephraimbuddy added the type:bug-fix Changelog: Bug Fixes label Aug 12, 2022
@ephraimbuddy ephraimbuddy added this to the Airflow 2.3.4 milestone Aug 19, 2022
ephraimbuddy pushed a commit that referenced this pull request Aug 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type:bug-fix Changelog: Bug Fixes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

AIP-45 breaks follow-on mini scheduler for mapped tasks
3 participants