Starting the execution of a flow via API produces large temporary files that are not automatically deleted! #3335

Open
M-E-Rademaker opened this issue on Mar 20, 2024 · 8 comments
Label: bug (Something isn't working)

Comments

@M-E-Rademaker

Describe the issue

The issue is described in this thread on Slack.

If you need additional information please let me know!

Environment

  • Kestra Version: 0.15.8
  • Operating System (OS/Docker/Kubernetes): Windows 10
  • Java Version (if you don't run kestra in Docker): JDK 17
@M-E-Rademaker M-E-Rademaker added the bug Something isn't working label Mar 20, 2024
@tchiotludo tchiotludo added this to the v0.16.0 milestone Mar 20, 2024
@fhussonnois fhussonnois self-assigned this Apr 2, 2024
@loicmathieu
Member

This only happens on Windows and seems to be an issue with MicronautHttpData, which cannot delete the temporary file used for the file upload.

When a file is uploaded in a multipart request (used to trigger a flow with a FILE input), Micronaut will first transfer the file to disk, then call our endpoint, and at the end delete the file after we transfer it to our internal storage.
Apparently, on Windows, the last step fails.

The interesting part is here:
[screenshot]

@loicmathieu
Member

I created an issue upstream: micronaut-projects/micronaut-core#10678

@loicmathieu loicmathieu modified the milestones: v0.16.0, v0.17.0 Apr 10, 2024
@M-E-Rademaker
Author

M-E-Rademaker commented May 2, 2024

@loicmathieu, it looks like Micronaut is not going to do much about it?!

As a workaround (or maybe even as the standard way), can I pass the input information in the API call to api/v1/executions/trigger/staging/{flow_id} via some argument other than files? In a sense this would make the call even a bit cleaner, as I am not actually passing a file but a simple key:value pair (in my case: {"reporting_date" : reporting_date}). I think the data or json argument would be more appropriate, don't you think?

@loicmathieu
Member

@M-E-Rademaker you can pass all kinds of inputs, FILE being one of them.
For a simple key/value pair, a STRING input is usually enough; for complex structures we support JSON (it will be handled as a Map). FILE is for passing a file to be stored in Kestra's internal storage.

See https://kestra.io/docs/workflow-components/inputs

@M-E-Rademaker
Copy link
Author

M-E-Rademaker commented May 2, 2024

I think there was a misunderstanding. I wasn't referring to the input type in the YAML but to the way information is sent via the API to the endpoint. The documentation for the endpoint api/v1/executions/trigger/{namespace}/{flow_id} says:
[screenshot]
So it only accepts the "files" argument as a way to pass information to a flow that is triggered via a POST to that endpoint.

However, this is the argument that is causing the issue. I tried using the "data" argument or the "json" argument (arguments of Python's requests.post() method), but these fail because the endpoint does not seem to be able to do anything with that information.

Example from https://requests.readthedocs.io/en/latest/user/quickstart/ that uses data:
[screenshot]


UPDATE
Or did you mean that "files" is necessary in order to keep the flexibility of also passing a file as an input? Ah, that would make sense. So you keep that flexibility at the expense of a slight misuse of the argument when someone is simply passing a key:value pair (in which case data would be the more appropriate argument).
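The data / json / files distinction comes down to the request body and Content-Type each argument produces. For a dict, requests sends data= as application/x-www-form-urlencoded and json= as application/json, while the endpoint expects multipart/form-data parts, as the maintainer explains in the next reply. The minimal standard-library sketch below builds all three bodies side by side without sending anything; the value "2024-03-31" for reporting_date is a hypothetical placeholder.

```python
import json
import uuid
from urllib.parse import urlencode

# Hypothetical value; the thread only names the key "reporting_date".
payload = {"reporting_date": "2024-03-31"}

# Equivalent of requests' data=payload (no files): a URL-encoded body.
urlencoded_body = urlencode(payload)
urlencoded_type = "application/x-www-form-urlencoded"

# Equivalent of requests' json=payload: a JSON document.
json_body = json.dumps(payload)
json_type = "application/json"

# A multipart/form-data body with one plain part (no filename attribute),
# named after the input, which is the shape the endpoint parses.
boundary = uuid.uuid4().hex
multipart_body = (
    f"--{boundary}\r\n"
    'Content-Disposition: form-data; name="reporting_date"\r\n'
    "\r\n"
    f"{payload['reporting_date']}\r\n"
    f"--{boundary}--\r\n"
)
multipart_type = f"multipart/form-data; boundary={boundary}"

for content_type, body in [
    (urlencoded_type, urlencoded_body),
    (json_type, json_body),
    (multipart_type, multipart_body),
]:
    print(content_type)
    print(body)
```

With requests itself, such a plain part can be produced by passing None as the filename, e.g. files={"reporting_date": (None, reporting_date)}. requests only switches to multipart/form-data when files is non-empty, which is why the files argument is needed even for simple values. Whether a part without a filename avoids the server-side temporary file is not established in this thread.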

@loicmathieu
Member

This is a documentation issue. The endpoint accepts multiple form-data parts, and each part must be named after the input, except for FILE inputs: those parts must be called files, and the input name goes in the filename attribute.

For example, this curl will trigger a flow with 2 inputs:

  • file1 which is an input of type FILE
  • string1 which is an input of type STRING

curl -XPOST -F "files=@1563.txt;filename=file1" -F "string1=value1" http://localhost:8888/api/v1/executions/myteam/inputs
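For scripting this from Python rather than curl, the same request can be assembled with the standard library. This is a minimal sketch under stated assumptions: build_multipart is a hypothetical helper, the file content is placeholder bytes instead of the real 1563.txt, the file part is sent as application/octet-stream, and the URL is copied from the curl example (the author's comments use the api/v1/executions/trigger/{namespace}/{flow_id} form, so adjust the path to your Kestra version).

```python
import urllib.request
import uuid

# URL copied from the curl example (namespace "myteam", flow "inputs").
URL = "http://localhost:8888/api/v1/executions/myteam/inputs"


def build_multipart(fields, files):
    """Hypothetical helper building a multipart/form-data body.

    fields: {input_name: str}   -> one plain part per input, named after the input.
    files:  {input_name: bytes} -> one part named "files" per FILE input,
                                   carrying the input name in the filename attribute.
    """
    boundary = uuid.uuid4().hex
    chunks = []
    for name, value in fields.items():
        chunks.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode()
        )
    for input_name, content in files.items():
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="files"; filename="{input_name}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode()
        chunks.append(header + content + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode())
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


body, content_type = build_multipart(
    fields={"string1": "value1"},
    files={"file1": b"contents of 1563.txt"},  # placeholder bytes, not the real file
)
request = urllib.request.Request(
    URL, data=body, headers={"Content-Type": content_type}, method="POST"
)
# urllib.request.urlopen(request) would send it; left out so the sketch runs offline.
```

Each non-file input becomes its own part named after the input, while every FILE input reuses the part name files and carries the input name in filename, mirroring the rule stated above.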

@M-E-Rademaker
Author

OK, I tried, but no matter how I do it, I have to use the files argument when using requests. Maybe the issue is the way requests handles this particular argument. Apparently it creates this temp file. Why it cannot be deleted and why it is so large, I don't understand...
