
kedro-airflow: Dataset Factories doesn't work on deployed project #660

Closed
DimedS opened this issue Apr 25, 2024 · 1 comment
DimedS commented Apr 25, 2024

Description

When deploying a Kedro project to Astro Airflow, as outlined in the manual (Kedro deployment to Airflow), the DAG fails on its first task when run through the Airflow UI. The failure is caused by the Dataset Factories catalog entry (log at the end of this issue).

If you replace the catalog entry from Step 3 of the Kedro project preparation, changing it from:

{base_dataset}:
  type: pandas.CSVDataset
  filepath: data/02_intermediate/{base_dataset}.csv

to:

X_train:
  type: pandas.CSVDataset
  filepath: data/02_intermediate/X_train.csv

X_test:
  type: pandas.CSVDataset
  filepath: data/02_intermediate/X_test.csv

y_train:
  type: pandas.CSVDataset
  filepath: data/02_intermediate/y_train.csv

y_test:
  type: pandas.CSVDataset
  filepath: data/02_intermediate/y_test.csv

This change, which eliminates the use of Dataset Factories and specifies each dataset individually, allows the project to run successfully.

I don't know exactly what's wrong with Dataset Factories.
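The "found unhashable key" error in the log below suggests a plain YAML parsing problem rather than a Dataset Factories runtime issue: an unquoted `{base_dataset}` is read as a YAML *flow mapping*, so the top-level key becomes a dict, which is unhashable. A minimal sketch of this hypothesis using PyYAML's `safe_load` (an assumption — the traceback goes through OmegaConf's loader, which this merely mimics):

```python
import yaml

# Unquoted "{base_dataset}" parses as a flow mapping, so the top-level
# catalog key is a dict -> PyYAML raises "found unhashable key".
broken = "{base_dataset}:\n  type: pandas.CSVDataset\n"
try:
    yaml.safe_load(broken)
except yaml.constructor.ConstructorError as exc:
    print("parse failed:", exc.problem)  # -> parse failed: found unhashable key

# Quoting the pattern turns it into an ordinary string key, which loads fine.
fixed = '"{base_dataset}":\n  type: pandas.CSVDataset\n'
print(yaml.safe_load(fixed))  # -> {'{base_dataset}': {'type': 'pandas.CSVDataset'}}
```

If this is right, the project only works locally by accident of how the config is loaded there, and quoting the factory pattern in `catalog.yml` should fix the deployed run as well.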

Steps to Reproduce

Follow the manual:
https://docs.kedro.org/en/stable/deployment/airflow.html

Expected Result

The DAG should run successfully.

Actual Result

9f42d3a57969
*** Found local files:
***   * /usr/local/airflow/logs/dag_id=new-kedro-project/run_id=manual__2024-04-25T12:56:00.438254+00:00/task_id=preprocess-companies-node/attempt=2.log
[2024-04-25, 13:04:02 UTC] {local_task_job_runner.py:120} ▶ Pre task execution logs
[2024-04-25, 13:04:03 UTC] {session.py:324} INFO - Kedro project airflow
[2024-04-25, 13:04:03 UTC] {taskinstance.py:441} ▼ Post task execution logs
[2024-04-25, 13:04:03 UTC] {taskinstance.py:2890} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/airflow/models/taskinstance.py", line 465, in _execute_task
    result = _execute_callable(context=context, **execute_callable_kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/airflow/models/taskinstance.py", line 432, in _execute_callable
    return execute_callable(context=context, **execute_callable_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/airflow/models/baseoperator.py", line 400, in wrapper
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/airflow/dags/new_kedro_project_dag.py", line 35, in execute
    session.run(self.pipeline_name, node_names=[self.node_name])
  File "/home/astro/.local/lib/python3.11/site-packages/kedro/framework/session/session.py", line 377, in run
    catalog = context._get_catalog(
              ^^^^^^^^^^^^^^^^^^^^^
  File "/home/astro/.local/lib/python3.11/site-packages/kedro/framework/context/context.py", line 223, in _get_catalog
    conf_catalog = self.config_loader["catalog"]
                   ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^
  File "/home/astro/.local/lib/python3.11/site-packages/kedro/config/omegaconf_config.py", line 221, in __getitem__
    env_config = self.load_and_merge_dir_config(  # type: ignore[no-untyped-call]
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/astro/.local/lib/python3.11/site-packages/kedro/config/omegaconf_config.py", line 311, in load_and_merge_dir_config
    config = OmegaConf.load(tmp_fo)
             ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/astro/.local/lib/python3.11/site-packages/omegaconf/omegaconf.py", line 192, in load
    obj = yaml.load(file_, Loader=get_yaml_loader())
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/yaml/__init__.py", line 81, in load
    return loader.get_single_data()
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/yaml/constructor.py", line 51, in get_single_data
    return self.construct_document(node)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/yaml/constructor.py", line 60, in construct_document
    for dummy in generator:
  File "/usr/local/lib/python3.11/site-packages/yaml/constructor.py", line 413, in construct_yaml_map
    value = self.construct_mapping(node)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/astro/.local/lib/python3.11/site-packages/omegaconf/_utils.py", line 151, in construct_mapping
    return super().construct_mapping(node, deep=deep)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/yaml/constructor.py", line 218, in construct_mapping
    return super().construct_mapping(node, deep=deep)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/yaml/constructor.py", line 141, in construct_mapping
    raise ConstructorError("while constructing a mapping", node.start_mark,
yaml.constructor.ConstructorError: while constructing a mapping
  in "<file>", line 42, column 1
found unhashable key
  in "<file>", line 73, column 1
[2024-04-25, 13:04:03 UTC] {taskinstance.py:1205} INFO - Marking task as FAILED. dag_id=new-kedro-project, task_id=preprocess-companies-node, execution_date=20240425T125600, start_date=20240425T130402, end_date=20240425T130403
[2024-04-25, 13:04:03 UTC] {standard_task_runner.py:110} ERROR - Failed to execute job 108 for task preprocess-companies-node (while constructing a mapping
  in "<file>", line 42, column 1
found unhashable key
  in "<file>", line 73, column 1; 190)
[2024-04-25, 13:04:03 UTC] {local_task_job_runner.py:240} INFO - Task exited with return code 1
[2024-04-25, 13:04:03 UTC] {taskinstance.py:3482} INFO - 0 downstream tasks scheduled from follow-on schedule check
[2024-04-25, 13:04:03 UTC] {local_task_job_runner.py:222} ▲▲▲ Log group end

Your Environment

Include as many relevant details about the environment in which you experienced the bug:

  • Kedro version used (pip show kedro or kedro -V): 0.19.5
  • Kedro plugin and kedro plugin version used (pip show kedro-airflow): 0.8.0
  • Python version used (python -V):
  • Operating system and version:
@DimedS DimedS added this to the Fix and improve kedro-airflow milestone Apr 25, 2024
ankatiyar (Contributor) commented
I'll close this, since it was just a matter of missing quotes around the dataset factory name in the docs; it will be fixed by kedro-org/kedro#3860.
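Based on that resolution, the corrected catalog entry presumably just quotes the factory pattern so the YAML parser treats it as a plain string key (a sketch, assuming the docs fix follows this form):

```yaml
# Quotes prevent "{base_dataset}" from being parsed as a YAML flow mapping.
"{base_dataset}":
  type: pandas.CSVDataset
  filepath: data/02_intermediate/{base_dataset}.csv
```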
