Fix/consistent get dag rest endpoint #16842
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
The PR is likely OK to be merged with just a subset of tests for default Python and Database versions without running the full matrix of tests, because it does not modify the core of Airflow. If the committers decide that the full tests matrix is needed, they will add the label 'full tests needed'. Then you should rebase to the latest main or amend the last commit of the PR, and push it with --force-with-lease.
update get_dag tests to handle the new behavior update dag_schema to get tags the same way as dag_details_schema
c0f7e78 to fcae7a2
I am not sure about this change. The web UI displays DAGs that have been deleted, which lets us access the archival DAG Runs. If you want to fetch a list of DAGs that have not been deleted, you should set the filter
if dag is None:
    raise NotFound("DAG not found", detail=f"The DAG with dag_id: {dag_id} was not found")

return dag_schema.dump(dag)
This serializer accepts DagModel, so we should update the typing on line 40 to reflect it. See:
model = DagModel
If I'm not mistaken, the get_dag method does return a DAG object. So instead of updating the typing, should I query it from the database after validating its existence in the DagBag?
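A minimal sketch of the second option raised here (validate against the DagBag first, then read the database row for serialization). It is plain Python with stand-ins, not Airflow code: `NotFound`, `DagModelRow`, and the dictionary-based `dag_bag` and `dag_table` arguments are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Dict


class NotFound(Exception):
    """Hypothetical stand-in for the API's 404 error."""


@dataclass
class DagModelRow:
    """Hypothetical stand-in for a row of the DagModel table."""

    dag_id: str
    is_paused: bool
    is_active: bool


def get_dag(
    dag_id: str,
    dag_bag: Dict[str, object],
    dag_table: Dict[str, DagModelRow],
) -> dict:
    """Return basic DAG information, or raise NotFound."""
    # Step 1: the DAG must still be known to the (stand-in) DagBag.
    if dag_id not in dag_bag:
        raise NotFound(f"The DAG with dag_id: {dag_id} was not found")

    # Step 2: serialize the database row, so the serializer keeps
    # receiving the model type it was written for.
    row = dag_table.get(dag_id)
    if row is None:
        raise NotFound(f"The DAG with dag_id: {dag_id} was not found")

    return {
        "dag_id": row.dag_id,
        "is_paused": row.is_paused,
        "is_active": row.is_active,
    }
```

With this shape, the serializer's input type does not change, and the DagBag check only decides whether the request is answered at all.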
"is_active": True,
"fileloc": __file__,
"file_token": FILE_TOKEN,
"is_paused": None,
is_active is a filter parameter, so we should also return it in the response, both for readability and so that we do not expose ourselves to accidental leakage of information.
I removed those after changing the endpoint to return a DAG instead of a DagModel, given that the API documentation says that the is_active field is nullable.
To fix this, should I convert the DAG to a DagModel and let the serializer handle it, or take the approach from DAGDetailSchema, which has the following code:
is_paused = fields.Method("get_is_paused", dump_only=True)
is_active = fields.Method("get_is_active", dump_only=True)

@staticmethod
def get_is_paused(obj: DAG):
    """Checks entry in DAG table to see if this DAG is paused"""
    return obj.get_is_paused()

@staticmethod
def get_is_active(obj: DAG):
    """Checks entry in DAG table to see if this DAG is active"""
    return obj.get_is_active()
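To make the difference between the two approaches concrete, here is a rough plain-Python sketch. It does not use marshmallow or Airflow: `FakeDag`, `dump_by_attribute`, and `dump_with_methods` are hypothetical names, and the sketch assumes a DAG-like object that exposes `get_is_paused()` and `get_is_active()` but has no plain `is_paused` or `is_active` attributes.

```python
from typing import Any, Callable, Dict, List


class FakeDag:
    """Hypothetical stand-in for a DAG object loaded from a DagBag."""

    def __init__(self, dag_id: str, paused: bool, active: bool) -> None:
        self.dag_id = dag_id
        self._paused = paused
        self._active = active

    def get_is_paused(self) -> bool:
        return self._paused

    def get_is_active(self) -> bool:
        return self._active


def dump_by_attribute(obj: Any, names: List[str]) -> Dict[str, Any]:
    """Attribute-based dump: fields the object lacks come out as None."""
    return {name: getattr(obj, name, None) for name in names}


def dump_with_methods(
    obj: Any,
    names: List[str],
    methods: Dict[str, Callable[[Any], Any]],
) -> Dict[str, Any]:
    """Method-based dump: selected fields are computed by a resolver."""
    data = dump_by_attribute(obj, names)
    for name, resolver in methods.items():
        data[name] = resolver(obj)
    return data


dag = FakeDag("example", paused=False, active=True)
fields = ["dag_id", "is_paused", "is_active"]

# Attribute lookup alone leaves both flags as None on this stand-in.
plain = dump_by_attribute(dag, fields)

# Resolvers fill the flags in, mirroring the method-field idea.
resolved = dump_with_methods(
    dag,
    fields,
    {
        "is_paused": lambda d: d.get_is_paused(),
        "is_active": lambda d: d.get_is_active(),
    },
)
```

In this sketch `plain` carries `None` for both flags while `resolved` carries `False` and `True`, which is the same symptom the updated tests showed.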
"is_active": False,
"fileloc": __file__,
"file_token": FILE_TOKEN,
"is_paused": None,
is_active is missing here.
I just tested it on Airflow 2.1.1: if I remove the file for a DAG with existing runs, the SerializedDag table is cleaned up, and because of this the DAG no longer shows up in the UI.
Only some views use SerializedDAG because reading the data from this table is expensive. Apart from the cross-DAG dependency view, we never use the list operation on this table. Most views still use the DagModel table including Lines 593 to 599 in 2b7c596
We can think of a clearer error message here, but from the home screen you should also be able to access the DAG Runs.
I have the impression that this change caused this change, but Airflow 1.10 behaved differently in this situation. We can change the API, but this would be a breaking change, and it requires at least entries in the API documentation (section: Summary of Changes) and in the UPDATING.md file.
What about those recent changes on the changelog?
Adding only_active parameter to /dags endpoint #14306 restored the ability to view the list of all DAGs, both active and inactive. If I understand that correctly, I don't see why you want to make this change. What do you mean by "stale DAGs"? @kaxil @ephraimbuddy WDYT?
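For reference, a small sketch of how a client could request the list with or without inactive DAGs. The `only_active` parameter and the `/dags` endpoint come from the comment above; the base URL, the `/api/v1` prefix, the default value of `True`, and the helper name `dags_list_url` are assumptions made for illustration.

```python
from urllib.parse import urlencode


def dags_list_url(base_url: str, only_active: bool = True) -> str:
    """Build the URL for listing DAGs (hypothetical helper).

    The default of True is an assumption, not taken from the thread.
    """
    # Serialize the boolean in lowercase, as is usual in query strings.
    query = urlencode({"only_active": str(only_active).lower()})
    # The /api/v1 prefix is an assumption about the deployment.
    return base_url.rstrip("/") + "/api/v1/dags?" + query
```

Calling `dags_list_url("http://localhost:8080", only_active=False)` yields a URL ending in `/api/v1/dags?only_active=false`, which would ask for both active and inactive DAGs.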
Got it, I guess it was a miscommunication then. I asked about this on Slack when I noticed it. Slack thread: https://apache-airflow.slack.com/archives/CCPRP7943/p1625504228302500
At the time, I thought stale DAGs were inactive DAGs whose file no longer exists on the filesystem. Now after reading the
According to the schema documentation, this endpoint is fine as it is. See
airflow/airflow/api_connexion/openapi/v1.yaml
Lines 431 to 436 in c3dd89c
get:
  summary: Get basic information about a DAG
  description: >
    Presents only information available in database (DAGModel).
    If you need detailed information, consider using GET /dags/{dag_id}/details.
I don't think we should be making this change @mik-laj, @uranusjr
Good
For me, I think we can close this PR and issue now. 🤔 I'm still looking forward to giving my first contribution to Airflow, and I will take the time to watch the current Airflow Summit presentations on the subject. 😄 Thanks a lot for taking the time to review this carefully, @mik-laj @ephraimbuddy.
closes: #16839
I got a bit confused while updating the tests for the new behavior. Previously the tests used the method _create_dag_models to generate test DAGs. When I switched to the same DAGs used by the tests on TestGetDagDetails, the tests started failing, for instance because of tags. I then noticed that tags are defined differently on dag_schema and dag_detail_schema, and I felt I needed to update dag_schema to use the same configuration.
There is also the behavior that, when using DAGs from the DagBag instead of inserting a DagModel directly into the database through the session, the attributes is_paused and is_active come up as None instead of False/True.
Another thing I noticed is that TestGetDag contains a test related to the details endpoint here. Should we update this as well?