Improve KubernetesExecutor Observability #39215

Open
2 tasks done
dengpenn opened this issue Apr 24, 2024 · 3 comments

@dengpenn

Description

While adopting Airflow, we found that the scheduler can create hundreds of pods during the main scheduling loop. I propose adding two kinds of metrics: the response codes returned by the Kubernetes client, and the latency of creating, patching, and deleting pods.

Use case/motivation

The Airflow executor creates one pod for each individual task. During peak periods, we saw 800+ tasks scheduled and the latency of the underlying Kubernetes API increase. Creating the task pods can delay the executor's heartbeat, which in turn may affect the scheduler's heartbeat. It would be good to have metrics for monitoring the response codes and the latency of the Kubernetes API calls that create, patch, and delete pods.
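A minimal sketch of what such instrumentation could look like, assuming a StatsD-style client with `incr` and `timing` methods. Everything here is hypothetical: `InMemoryStats` and `FakeApiError` are stand-ins so the example runs on its own, and the metric names are illustrative, not existing Airflow metrics.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class InMemoryStats:
    """Hypothetical stand-in for a StatsD-style client (counters and timers)."""

    def __init__(self):
        self.counters = defaultdict(int)
        self.timings = defaultdict(list)

    def incr(self, name):
        self.counters[name] += 1

    def timing(self, name, seconds):
        self.timings[name].append(seconds)


class FakeApiError(Exception):
    """Hypothetical stand-in for a client error that carries an HTTP status."""

    def __init__(self, status):
        super().__init__(f"API error {status}")
        self.status = status


@contextmanager
def observe_k8s_call(stats, operation):
    """Record the latency and outcome of one pod API call.

    `operation` is a label such as "create", "patch", or "delete".
    The metric names below are assumptions made for illustration.
    """
    start = time.monotonic()
    status = "ok"
    try:
        yield
    except Exception as exc:
        # Use the HTTP status when the exception exposes one.
        status = str(getattr(exc, "status", "error"))
        raise
    finally:
        elapsed = time.monotonic() - start
        stats.timing(f"kubernetes_executor.{operation}_pod.duration", elapsed)
        stats.incr(f"kubernetes_executor.{operation}_pod.response.{status}")
```

The pod-creation call would be wrapped as `with observe_k8s_call(stats, "create"): ...`. A failed call is still timed and counted under its status code before the exception propagates.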

Related issues

N/A

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Code of Conduct

dengpenn added the kind:feature and needs-triage labels on Apr 24, 2024

boring-cyborg bot commented Apr 24, 2024

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.

RNHTTR added the good first issue and provider:cncf-kubernetes labels and removed the needs-triage label on May 7, 2024
@RNHTTR
Collaborator

RNHTTR commented May 7, 2024

What is your worker_pods_creation_batch_size? This setting limits the number of pods created during a given scheduler loop.
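To illustrate the point, here is a simplified model of how a per-loop batch size caps pod creation. It is a sketch, not Airflow's actual executor code; `launch_batch` and the `launch` callback are hypothetical names.

```python
from queue import Empty, Queue


def launch_batch(task_queue, batch_size, launch):
    """Launch at most `batch_size` queued tasks; return how many were launched."""
    launched = 0
    for _ in range(batch_size):
        try:
            task = task_queue.get_nowait()
        except Empty:
            break  # queue drained before the batch was full
        launch(task)  # in a real executor this would create the pod
        launched += 1
    return launched
```

With a small batch size, a burst of 800+ queued tasks is spread over many loops rather than sent to the Kubernetes API in a single loop, at the cost of tasks waiting longer in the queue.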

@dirrao
Collaborator

dirrao commented May 11, 2024

We already have two metrics for this. Can you check them?

kubernetes_executor.clear_not_launched_queued_tasks.duration
kubernetes_executor.adopt_task_instances.duration

https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/logging-monitoring/metrics.html
