Expressiontool timer #1681

douglowe · 2022-06-09T10:58:05Z

This PR is in response to issue 1680.

Increased the default time for expression tools to 60 seconds, so that --eval-timeout flag will be used more for optimisation purposes, rather than being needed to accomodate systems with slower response times for expression tools.
Added some log information for situations where the expression tool time limit is triggered.

Outstanding questions:

The exec_js_process routine seems to be used for organisational tasks at the start of the workflow. And is run with a 30 second time limit. If the tool time limit gets breached for that process then my log message will be misleading (because the --eval-timeout flag doesn't control that time limit).
Really it would be useful for the workflow to exit if a tool fails in this manner. Perhaps when this is implemented the log messages could be moved to a more appropriate location in the code (solving the issue I note above)?

Increasing eval_timeout from 20 to 60 seconds - to allow more time for HPC's which use distributed filesystems.

This matches the new default value in self.eval_timeout (context.py). The longer default time allows for smooth running on systems with increased I/O timings (such as HPC systems with network based file systems).

Added logger output when timer has been triggered. This also stops the truncation of stdout and stderr.

douglowe · 2022-06-09T11:13:41Z

3rd outstanding question (while I remember).

I've moved timer.cancel() and my timer.is_alive() check to after stdin_buf.close(), because I don't know if there's any risk in reading stdout_buf and stderr_buf before closing this. If closing stdin_buf takes any significant time to perform, then there's a risk of a false triggering of my timer check. For a longterm fix it might be better to reorganise my code here.

codecov · 2022-06-09T11:34:34Z

Codecov Report

Merging #1681 (807323b) into main (b21b0c1) will decrease coverage by 0.06%.
The diff coverage is 50.00%.

@@            Coverage Diff             @@
##             main    #1681      +/-   ##
==========================================
- Coverage   66.85%   66.79%   -0.07%     
==========================================
  Files          93       93              
  Lines       16648    16660      +12     
  Branches     4420     4422       +2     
==========================================
- Hits        11130    11128       -2     
- Misses       4376     4388      +12     
- Partials     1142     1144       +2

Impacted Files	Coverage Δ
cwltool/argparser.py	`94.82% <ø> (ø)`
cwltool/process.py	`88.94% <ø> (ø)`
cwltool/validate_js.py	`84.29% <ø> (ø)`
cwltool/sandboxjs.py	`72.63% <37.50%> (-1.97%)`	⬇️
cwltool/context.py	`97.59% <100.00%> (+0.01%)`	⬆️
cwltool/cwltool/sandboxjs.py	`54.73% <0.00%> (-1.48%)`	⬇️
cwltool/cwltool/job.py	`61.85% <0.00%> (-0.80%)`	⬇️
cwltool/cwltool/process.py	`70.96% <0.00%> (ø)`
cwltool/cwltool/validate_js.py	`82.64% <0.00%> (ø)`
cwltool/cwltool/context.py	`88.55% <0.00%> (+0.06%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b21b0c1...807323b. Read the comment docs.

mr-c · 2022-06-09T12:08:46Z

1. The exec_js_process routine seems to be used for organisational tasks at the start of the workflow. And is run with a 30 second time limit. If the tool time limit gets breached for that process then my log message will be misleading (because the --eval-timeout flag doesn't control that time limit).

Good catch. We can refactor the code to respect --eval-timeout here. Do you want help with that?

2. Really it would be useful for the workflow to exit if a tool fails in this manner. Perhaps when this is implemented the log messages could be moved to a more appropriate location in the code (solving the issue I note above)?

Sure, you could raise a JavascriptException when the timeout is triggered. That should cause a reasonable failure. We'll add tests for the different scenarios to confirm that.

3rd outstanding question (while I remember).

I've moved timer.cancel() and my timer.is_alive() check to after stdin_buf.close(), because I don't know if there's any risk in reading stdout_buf and stderr_buf before closing this. If closing stdin_buf takes any significant time to perform, then there's a risk of a false triggering of my timer check. For a longterm fix it might be better to reorganise my code here.

Good to know. I think it will be okay for now. What do you think @tetron ?

cwltool/sandboxjs.py

douglowe · 2022-06-10T17:33:06Z

The exec_js_process routine seems to be used for organisational tasks at the start of the workflow. And is run with a 30 second time limit. If the tool time limit gets breached for that process then my log message will be misleading (because the --eval-timeout flag doesn't control that time limit).

Good catch. We can refactor the code to respect --eval-timeout here. Do you want help with that?

Yep - I'm happy to help with that.

Really it would be useful for the workflow to exit if a tool fails in this manner. Perhaps when this is implemented the log messages could be moved to a more appropriate location in the code (solving the issue I note above)?

Sure, you could raise a JavascriptException when the timeout is triggered. That should cause a reasonable failure. We'll add tests for the different scenarios to confirm that.

I'll have a look at adding this in.

For consistency with other expression tools, eval_timeout is now used for controlling the timeout of the validate_js_expression steps. This has involved adding eval_timeout to LoadingContext (mirroring RuntimeContext).

douglowe · 2022-06-23T16:10:17Z

I've refactored the code to use --eval-timeout to control exec_js_process within the validate_js_expressions step. This has involved duplicating eval_timeout to be stored in LoadingContext, mirroring RuntimeContext.

* increase default eval_timeout value Increasing eval_timeout from 20 to 60 seconds. The longer default time allows for smooth running on systems with increased I/O timings (such as HPC systems with network based file systems). * Add log information when eval-timeout is exceeded Added logger output when timer has been triggered. This also stops the truncation of stdout and stderr. * Add eval_timeout control to validate_js_expressions For consistency with other expression tools, eval_timeout is now used for controlling the timeout of the validate_js_expression steps. This has involved adding eval_timeout to LoadingContext (mirroring RuntimeContext).

douglowe added 3 commits June 9, 2022 11:02

increase default eval_timeout value

cb18326

Increasing eval_timeout from 20 to 60 seconds - to allow more time for HPC's which use distributed filesystems.

Increase eval-timeout default to 60 seconds

158b930

This matches the new default value in self.eval_timeout (context.py). The longer default time allows for smooth running on systems with increased I/O timings (such as HPC systems with network based file systems).

Add log information when eval-timeout is exceeded

73f03c3

Added logger output when timer has been triggered. This also stops the truncation of stdout and stderr.

mr-c reviewed Jun 9, 2022

View reviewed changes

cwltool/sandboxjs.py Outdated Show resolved Hide resolved

cleanup using make format

cfd88b2

Add eval_timeout control to validate_js_expressions

8913810

For consistency with other expression tools, eval_timeout is now used for controlling the timeout of the validate_js_expression steps. This has involved adding eval_timeout to LoadingContext (mirroring RuntimeContext).

douglowe and others added 2 commits June 23, 2022 17:12

Merge branch 'main' into expressiontool_timer

d9dc958

make format; add missing defaults

807323b

mr-c approved these changes Jun 23, 2022

View reviewed changes

mr-c enabled auto-merge (squash) June 23, 2022 17:15

mr-c merged commit fb5774b into common-workflow-language:main Jun 23, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Expressiontool timer #1681

Expressiontool timer #1681

douglowe commented Jun 9, 2022

douglowe commented Jun 9, 2022

codecov bot commented Jun 9, 2022 •

edited

mr-c commented Jun 9, 2022

douglowe commented Jun 10, 2022

douglowe commented Jun 23, 2022

Expressiontool timer #1681

Expressiontool timer #1681

Conversation

douglowe commented Jun 9, 2022

douglowe commented Jun 9, 2022

codecov bot commented Jun 9, 2022 • edited

Codecov Report

mr-c commented Jun 9, 2022

douglowe commented Jun 10, 2022

douglowe commented Jun 23, 2022

codecov bot commented Jun 9, 2022 •

edited