Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a note against use of top level code in timetable #26649

Merged
merged 4 commits into from Sep 26, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
6 changes: 5 additions & 1 deletion airflow/serialization/serialized_objects.py
Expand Up @@ -165,7 +165,11 @@ def __init__(self, type_string: str) -> None:
self.type_string = type_string

def __str__(self) -> str:
return f"Timetable class {self.type_string!r} is not registered"
return (
f"Timetable class {self.type_string!r} is not registered or "
"you have a top level database access that disrupted the session. "
"Please check the airflow best practices documentation."
)
Comment on lines +168 to +172
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I sort of feel that mentioning top level database access is a bit too specific. Maybe it’d be better to be slightly more vague here (there’s something wrong with eager variable evaluation in the class) instead. But I’m not strong on this—we can always reword this afterwards.



def _encode_timetable(var: Timetable) -> dict[str, Any]:
Expand Down
35 changes: 35 additions & 0 deletions docs/apache-airflow/best-practices.rst
Expand Up @@ -216,6 +216,41 @@ or if you need to deserialize a json object from the variable :
For security purpose, you're recommended to use the :ref:`Secrets Backend<secrets_backend_configuration>`
for any variable that contains sensitive data.

.. _best_practices/timetables:

Timetables
----------
Avoid using Airflow Variables/Connections or accessing airflow database at the top level of your timetable code.
Database access should be delayed until the execution time of the DAG. This means that you should not have variables/connections retrieval
as argument to your timetable class initialization or have Variable/connection at the top level of your custom timetable module.

Bad example:

.. code-block:: python

from airflow.models.variable import Variable
from airflow.timetables.interval import CronDataIntervalTimetable


class CustomTimetable(CronDataIntervalTimetable):
def __init__(self, *args, something=Variable.get('something'), **kwargs):
self._something = something
super().__init__(*args, **kwargs)

Good example:

.. code-block:: python

from airflow.models.variable import Variable
from airflow.timetables.interval import CronDataIntervalTimetable


class CustomTimetable(CronDataIntervalTimetable):
def __init__(self, *args, something='something', **kwargs):
self._something = Variable.get(something)
super().__init__(*args, **kwargs)


Triggering DAGs after changes
-----------------------------

Expand Down
6 changes: 6 additions & 0 deletions docs/apache-airflow/concepts/timetable.rst
Expand Up @@ -51,6 +51,12 @@ As such, Airflow allows for custom timetables to be written in plugins and used
DAGs. An example demonstrating a custom timetable can be found in the
:doc:`/howto/timetable` how-to guide.

.. note::

As a general rule, always access Variables, Connections etc or anything that would access
the database as late as possible in your code. See :ref:`best_practices/timetables`
for more best practices to follow.

Built-in Timetables
-------------------

Expand Down
8 changes: 6 additions & 2 deletions tests/serialization/test_dag_serialization.py
Expand Up @@ -406,7 +406,9 @@ def test_dag_serialization_unregistered_custom_timetable(self):
message = (
"Failed to serialize DAG 'simple_dag': Timetable class "
"'tests.test_utils.timetables.CustomSerializationTimetable' "
"is not registered"
"is not registered or "
"you have a top level database access that disrupted the session. "
"Please check the airflow best practices documentation."
)
assert str(ctx.value) == message

Expand Down Expand Up @@ -712,7 +714,9 @@ def test_deserialization_timetable_unregistered(self):
message = (
"Timetable class "
"'tests.test_utils.timetables.CustomSerializationTimetable' "
"is not registered"
"is not registered or "
"you have a top level database access that disrupted the session. "
"Please check the airflow best practices documentation."
)
assert str(ctx.value) == message

Expand Down