Airflow 2.2.2 pod_override does not override args of V1Container #27358
Comments
Thanks for opening your first issue here! Be sure to follow the issue template!
To add to the above details, I think the issue lies somewhere in these lines: https://github.com/apache/airflow/blob/main/airflow/executors/kubernetes_executor.py#L307-L333, and more precisely here:
The above LOC always overrides the args.
EDIT: Digging further, I see the same in airflow/airflow/kubernetes/pod_generator.py, lines 394 to 397 (at commit 5df1d6e).
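The override behavior being described can be illustrated with a simplified sketch. This is not Airflow's actual code, and the merge order shown is an assumption based on the linked lines: whatever args a pod_override supplies, the executor's own task-run invocation is applied last, so the override never survives.

```python
def reconcile_args(base_args, override_args, executor_args):
    """Simplified sketch of the behavior described above (not Airflow's code).

    A pod_override's args may win over the base worker pod template, but
    the KubernetesExecutor stamps its own "airflow tasks run ..." command
    on last, so the override is always replaced.
    """
    merged = override_args if override_args else base_args
    return executor_args if executor_args else merged


final = reconcile_args(
    ["python", "-m", "foo.run"],                          # image entrypoint args
    ["custom", "args"],                                   # pod_override attempt
    ["airflow", "tasks", "run", "foo_dag", "foo_task"],   # set by the executor
)
print(final)  # the executor's invocation wins
```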
Hey @oneturkmen! That is expected behavior; the worker uses those args to know which task to run. May I ask what you are ultimately trying to achieve by overwriting args? This should have been documented, though, so I've opened #27450 to do that. Thanks.
@jedcunningham we wanted to have a BashSensor task that would ping an external service to see whether some file had been generated. If the file wasn't there yet, we would keep pinging for some time, and only fail the task if it still wasn't ready.
I did not expect that because we use
I think you are misunderstanding what KubernetesExecutor is actually doing. KE spins up an Airflow worker pod for every task. In your case, it'll spin up a pod and say "Airflow, run task 'foo_task' for dag 'foo_dag' run_id 'manual__...'" (which matches the args KE sets). That worker then runs your (in this case) bash_command, or does whatever else you've asked it to do.

KPO is a different situation. The conceptual "kubectl create pod" replaces the bash_command, but it still runs from an Airflow worker.

Short version: you want to put all your task-specific logic in bash_command when using a BashSensor. Bonus: this keeps it portable between executors!

I actually gave a talk that covered this at Airflow Summit this year. It's short, so it might be worth a watch: https://youtu.be/H8JjhiVGOlg
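The "keep the logic in the task" advice can be sketched in plain Python: a poke-style function that does the polling itself, equivalent to what the sensor's bash_command would do. The path and timings here are hypothetical.

```python
import os
import time


def poke_for_file(path, timeout=60.0, interval=5.0):
    """Poll until `path` exists or `timeout` seconds elapse.

    This mirrors what a BashSensor's bash_command (or a PythonSensor's
    poke) would do: the task-specific check lives inside the task, so it
    behaves the same under any executor instead of relying on container
    args. Path and timings are hypothetical.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.exists(path):
            return True
        time.sleep(interval)
    return False
```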
Thanks Jed. That makes much more sense now.
@jedcunningham Hello. I stumbled upon this issue while debugging an error. I watched the video you mentioned above and could not find an answer so I thought of asking here. I hope it's okay. So I have a K8s executor with a custom image. I trigger the dag from Airflow UI by passing in custom parameters using the "trigger DAG w/config" option. I understand these parameters will be accessible to the task via the dag_run dictionary. But I am not able to access the dag_run dictionary. Below is a brief snippet of my task definition.
my_task

Any pointers/suggestions on how to access dag_run with the KubernetesExecutor would be very helpful. Thank you!
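For reference, the "Trigger DAG w/ config" parameters arrive through the dag_run object in the task's template context. A minimal sketch of a callable that reads them (the key name and default value are hypothetical) might look like this:

```python
def read_trigger_params(**context):
    """Read parameters passed via "Trigger DAG w/ config".

    In Airflow 2, a PythonOperator callable that accepts **kwargs receives
    the template context, which includes dag_run; the JSON supplied in the
    UI is available as dag_run.conf. The "input_path" key and the fallback
    default here are hypothetical.
    """
    dag_run = context.get("dag_run")
    conf = dag_run.conf if dag_run is not None and dag_run.conf else {}
    return conf.get("input_path", "/data/default")
```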
Generally you should ask this type of question on our Slack or in a discussion instead of on old issues, even if they are somewhat related, like this one is. A couple of things:
@jedcunningham Ack on using Slack or discussions for questions; I'll do that next time. Thanks a lot for the detailed answers. They are helpful and confirm the solution I stumbled upon just an hour ago via trial and error. I am now creating my custom image by inheriting from the Airflow worker image. Then, in the executor_config, I do not define any command. Instead, in my_task, I use the context to get the dag_run parameters and then use a Python subprocess to invoke the actual command. This setup is working now. Hopefully this explanation helps someone else who runs into this issue. Thank you for the quick reply 💯
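The workaround described above (don't override the container command; invoke the real workload with a subprocess from inside the task) might be sketched as follows. The executable name and the flag format are assumptions for illustration.

```python
import shlex
import subprocess


def run_workload(executable, params):
    """Invoke the actual command from inside the Airflow worker.

    Instead of fighting the executor over the container's args, the task
    builds the command line from the trigger parameters and runs it
    itself. The --key=value flag convention here is hypothetical.
    """
    cmd = [executable] + ["--%s=%s" % (k, v) for k, v in sorted(params.items())]
    print("running:", shlex.join(cmd))
    return subprocess.run(cmd, check=False).returncode
```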
Apache Airflow version
2.2.2
What happened
I have a bash sensor defined as follows:
The entrypoint command in the foo-image is python -m foo.run. However, when I deploy the image onto OpenShift (Kubernetes), the command somehow turns out to be the following, which is wrong:
What you think should happen instead
I assume the expected command should override args (see the V1Container args value above) and therefore should be:
and not:
How to reproduce
To reproduce the issue, create a simple DAG and a sensor as defined above, use a sample image, and try to override the args. I cannot provide the actual code due to an NDA.
Operating System
RHLS 7.9
Versions of Apache Airflow Providers
apache-airflow-providers-amazon==2.4.0
apache-airflow-providers-cncf-kubernetes==2.1.0
apache-airflow-providers-ftp==2.0.1
apache-airflow-providers-http==2.0.1
apache-airflow-providers-imap==2.0.1
apache-airflow-providers-mysql==2.1.1
apache-airflow-providers-sqlite==2.0.1
Deployment
Other
Deployment details
N/A
Anything else
No response
Are you willing to submit PR?
Code of Conduct