Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expressiontool timer #1681

Merged
merged 7 commits into from Jun 23, 2022
Merged

Conversation

douglowe
Copy link
Contributor

@douglowe douglowe commented Jun 9, 2022

This PR is in response to issue 1680.

  1. Increased the default time for expression tools to 60 seconds, so that --eval-timeout flag will be used more for optimisation purposes, rather than being needed to accomodate systems with slower response times for expression tools.
  2. Added some log information for situations where the expression tool time limit is triggered.

Outstanding questions:

  1. The exec_js_process routine seems to be used for organisational tasks at the start of the workflow. And is run with a 30 second time limit. If the tool time limit gets breached for that process then my log message will be misleading (because the --eval-timeout flag doesn't control that time limit).
  2. Really it would be useful for the workflow to exit if a tool fails in this manner. Perhaps when this is implemented the log messages could be moved to a more appropriate location in the code (solving the issue I note above)?

Increasing eval_timeout from 20 to 60 seconds - to allow more time for HPC's which use distributed filesystems.
This matches the new default value in self.eval_timeout
(context.py). The longer default time allows for smooth
running on systems with increased I/O timings (such as
HPC systems with network based file systems).
Added logger output when timer has been triggered.
This also stops the truncation of stdout and stderr.
@douglowe
Copy link
Contributor Author

douglowe commented Jun 9, 2022

3rd outstanding question (while I remember).

I've moved timer.cancel() and my timer.is_alive() check to after stdin_buf.close(), because I don't know if there's any risk in reading stdout_buf and stderr_buf before closing this. If closing stdin_buf takes any significant time to perform, then there's a risk of a false triggering of my timer check. For a longterm fix it might be better to reorganise my code here.

@codecov
Copy link

codecov bot commented Jun 9, 2022

Codecov Report

Merging #1681 (807323b) into main (b21b0c1) will decrease coverage by 0.06%.
The diff coverage is 50.00%.

@@            Coverage Diff             @@
##             main    #1681      +/-   ##
==========================================
- Coverage   66.85%   66.79%   -0.07%     
==========================================
  Files          93       93              
  Lines       16648    16660      +12     
  Branches     4420     4422       +2     
==========================================
- Hits        11130    11128       -2     
- Misses       4376     4388      +12     
- Partials     1142     1144       +2     
Impacted Files Coverage Δ
cwltool/argparser.py 94.82% <ø> (ø)
cwltool/process.py 88.94% <ø> (ø)
cwltool/validate_js.py 84.29% <ø> (ø)
cwltool/sandboxjs.py 72.63% <37.50%> (-1.97%) ⬇️
cwltool/context.py 97.59% <100.00%> (+0.01%) ⬆️
cwltool/cwltool/sandboxjs.py 54.73% <0.00%> (-1.48%) ⬇️
cwltool/cwltool/job.py 61.85% <0.00%> (-0.80%) ⬇️
cwltool/cwltool/process.py 70.96% <0.00%> (ø)
cwltool/cwltool/validate_js.py 82.64% <0.00%> (ø)
cwltool/cwltool/context.py 88.55% <0.00%> (+0.06%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b21b0c1...807323b. Read the comment docs.

@mr-c
Copy link
Member

mr-c commented Jun 9, 2022

1. The exec_js_process routine seems to be used for organisational tasks at the start of the workflow. And is run with a 30 second time limit. If the tool time limit gets breached for that process then my log message will be misleading (because the --eval-timeout flag doesn't control that time limit).

Good catch. We can refactor the code to respect --eval-timeout here. Do you want help with that?

2. Really it would be useful for the workflow to exit if a tool fails in this manner. Perhaps when this is implemented the log messages could be moved to a more appropriate location in the code (solving the issue I note above)?

Sure, you could raise a JavascriptException when the timeout is triggered. That should cause a reasonable failure. We'll add tests for the different scenarios to confirm that.

3rd outstanding question (while I remember).

I've moved timer.cancel() and my timer.is_alive() check to after stdin_buf.close(), because I don't know if there's any risk in reading stdout_buf and stderr_buf before closing this. If closing stdin_buf takes any significant time to perform, then there's a risk of a false triggering of my timer check. For a longterm fix it might be better to reorganise my code here.

Good to know. I think it will be okay for now. What do you think @tetron ?

cwltool/sandboxjs.py Outdated Show resolved Hide resolved
@douglowe
Copy link
Contributor Author

  1. The exec_js_process routine seems to be used for organisational tasks at the start of the workflow. And is run with a 30 second time limit. If the tool time limit gets breached for that process then my log message will be misleading (because the --eval-timeout flag doesn't control that time limit).

Good catch. We can refactor the code to respect --eval-timeout here. Do you want help with that?

Yep - I'm happy to help with that.

  1. Really it would be useful for the workflow to exit if a tool fails in this manner. Perhaps when this is implemented the log messages could be moved to a more appropriate location in the code (solving the issue I note above)?

Sure, you could raise a JavascriptException when the timeout is triggered. That should cause a reasonable failure. We'll add tests for the different scenarios to confirm that.

I'll have a look at adding this in.

For consistency with other expression tools,
eval_timeout is now used for controlling the timeout
of the validate_js_expression steps. This has
involved adding eval_timeout to LoadingContext
(mirroring RuntimeContext).
@douglowe
Copy link
Contributor Author

I've refactored the code to use --eval-timeout to control exec_js_process within the validate_js_expressions step. This has involved duplicating eval_timeout to be stored in LoadingContext, mirroring RuntimeContext.

@mr-c mr-c enabled auto-merge (squash) June 23, 2022 17:15
@mr-c mr-c merged commit fb5774b into common-workflow-language:main Jun 23, 2022
GlassOfWhiskey pushed a commit that referenced this pull request Sep 15, 2022
* increase default eval_timeout value

Increasing eval_timeout from 20 to 60 seconds.

 The longer default time allows for smooth
running on systems with increased I/O timings (such as
HPC systems with network based file systems).

* Add log information when eval-timeout is exceeded

Added logger output when timer has been triggered.
This also stops the truncation of stdout and stderr.

* Add eval_timeout control to validate_js_expressions

For consistency with other expression tools,
eval_timeout is now used for controlling the timeout
of the validate_js_expression steps. This has
involved adding eval_timeout to LoadingContext
(mirroring RuntimeContext).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants