
Problem with pushing Xcom when using SparkKubernetesOperator #39184

Open
1 of 2 tasks
truonglac2603 opened this issue Apr 23, 2024 · 2 comments
Labels
area:core  kind:bug  needs-triage

Comments

@truonglac2603

Apache Airflow version

Other Airflow 2 version (please specify below)

If "Other Airflow 2 version" selected, which one?

2.8.3

What happened?

I'm having an issue when using SparkKubernetesOperator to execute a Spark job. Whenever I set do_xcom_push to True, the driver pod is created and runs perfectly, but the XCom sidecar pod is nowhere to be found. The Airflow task log then gets stuck on this message:

[2024-04-23, 02:47:38 UTC] {custom_object_launcher.py:301} WARNING - Spark job submitted but not yet started. job_id: spark-custome-task-5r3pt2vk
[2024-04-23, 02:47:48 UTC] {pod_manager.py:529} ERROR - container base whose logs were requested not found in the pod spark-custome-task-5r3pt2vk-driver
[2024-04-23, 02:47:48 UTC] {pod_manager.py:718} INFO - Checking if xcom sidecar container is started.
[2024-04-23, 02:47:48 UTC] {pod_manager.py:724} WARNING - The xcom sidecar container is not yet started.
[2024-04-23, 02:51:24 UTC] {local_task_job_runner.py:296} WARNING - DagRun timed out after 0:05:03.242442.
[2024-04-23, 02:51:29 UTC] {local_task_job_runner.py:296} WARNING - DagRun timed out after 0:05:08.319418.
[2024-04-23, 02:51:29 UTC] {local_task_job_runner.py:302} WARNING - State of this instance has been externally set to skipped. Terminating instance.
[2024-04-23, 02:51:29 UTC] {process_utils.py:131} INFO - Sending 15 to group 30. PIDs of all processes in the group: [30]
[2024-04-23, 02:51:29 UTC] {process_utils.py:86} INFO - Sending the signal 15 to group 30
[2024-04-23, 02:51:29 UTC] {taskinstance.py:2483} ERROR - Received SIGTERM. Terminating subprocesses.
[2024-04-23, 02:51:29 UTC] {process_utils.py:79} INFO - Process psutil.Process(pid=30, status='terminated', exitcode=0, started='02:46:43') (30) terminated with exit code 0

In my opinion, something is wrong with the XCom push for this operator. Any help would be much appreciated. Thanks in advance.

What you think should happen instead?

The XCom should be pushed successfully for any kind of Spark job; perhaps there could be a placeholder value for Spark jobs that have no return value at all.

How to reproduce

Submit a basic Spark application to a Kubernetes cluster using SparkKubernetesOperator, with do_xcom_push set to True.
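The reproduction above can be sketched as a minimal DAG. The manifest name, namespace, and task id below are placeholders I made up, and exact operator parameters may differ between provider versions:

```python
# Minimal reproduction sketch (assumed file/namespace names):
# submit a SparkApplication with XCom push enabled.
from datetime import datetime

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.spark_kubernetes import (
    SparkKubernetesOperator,
)

with DAG(
    dag_id="spark_xcom_repro",
    start_date=datetime(2024, 4, 1),
    schedule=None,
    catchup=False,
) as dag:
    # "spark-pi.yaml" is a placeholder SparkApplication manifest.
    submit = SparkKubernetesOperator(
        task_id="spark_custome_task",
        namespace="default",
        application_file="spark-pi.yaml",
        do_xcom_push=True,  # this is what should start the XCom sidecar
    )
```

With this setting the driver pod starts, but the task hangs waiting for a sidecar that never appears, as in the logs above.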

Operating System

Ubuntu 22.04.4 LTS

Versions of Apache Airflow Providers

apache-airflow-providers-cncf-kubernetes==8.0.1

Deployment

Official Apache Airflow Helm Chart

Deployment details

No response

Anything else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

  • I agree to follow this project's Code of Conduct
@truonglac2603 added the area:core, kind:bug and needs-triage labels on Apr 23, 2024

boring-cyborg bot commented Apr 23, 2024

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.

@truonglac2603
Author

Here are the logs when I run a dbt project with KubernetesPodOperator.

[2024-04-24, 08:53:35 UTC] {pod_manager.py:718} INFO - Checking if xcom sidecar container is started.
[2024-04-24, 08:53:35 UTC] {pod_manager.py:721} INFO - The xcom sidecar container is started.
[2024-04-24, 08:53:35 UTC] {pod_manager.py:798} INFO - Running command... if [ -s //xcom/return.json ]; then cat //xcom/return.json; else echo _***xcom_result_empty; fi
[2024-04-24, 08:53:35 UTC] {pod_manager.py:798} INFO - Running command... kill -s SIGINT 1
[2024-04-24, 08:53:36 UTC] {pod.py:559} INFO - xcom result file is empty.

It seems to handle an empty XCom like a charm. Shouldn't SparkKubernetesOperator work the same way as this one?
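For comparison, the dbt task in those logs looks roughly like the sketch below (the image name, ids, and namespace are placeholders, not my real setup). With do_xcom_push=True, KubernetesPodOperator injects an XCom sidecar that reads the return.json file, which is why the "xcom result file is empty" case is handled gracefully:

```python
# Working comparison sketch (assumed image/ids): KubernetesPodOperator
# with do_xcom_push=True starts an XCom sidecar alongside the main container.
from datetime import datetime

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator

with DAG(
    dag_id="dbt_pod_xcom",
    start_date=datetime(2024, 4, 1),
    schedule=None,
    catchup=False,
) as dag:
    run_dbt = KubernetesPodOperator(
        task_id="dbt_run",
        namespace="default",
        image="my-dbt-image:latest",  # placeholder image
        cmds=["dbt", "run"],
        do_xcom_push=True,  # sidecar is started; an empty result is tolerated
    )
```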

Projects
None yet
Development

No branches or pull requests

1 participant