
Overnight generative tests are locking up and then timing out #1796

Open
evansd opened this issue Dec 1, 2023 · 2 comments

Comments

@evansd
Contributor

evansd commented Dec 1, 2023

The overnight tests appear to start running OK, complete a small number of batches (5 in one case, 8 in another), and then lock up and sit there for several hours until GitHub times them out, e.g.:

[Screenshot of the workflow run log]

(Note the difference between the last two timestamps.)

This has happened twice in a row as of the time of writing. Current list of runs is at:
https://github.com/opensafely-core/ehrql/actions/workflows/generative-tests.yml

This comes just after merging these changes:

Which is a bit suspicious, although I can't think what in those changes would cause this kind of behaviour.

@evansd
Contributor Author

evansd commented Dec 5, 2023

After occurring twice in a row, this behaviour hasn't reappeared in the subsequent four runs, so it may just have been glitchy GitHub rather than a consequence of any changes we made. I'll leave this ticket open a bit longer and then close it if we don't see this happening again.

@evansd
Contributor Author

evansd commented Jan 2, 2024

This is still happening quite regularly. It doesn't render the tests useless as they do still run correctly more often than not. But it does mean we're doing less testing than we might otherwise be and also that I (as the last person to touch the scheduled action definition) get spurious email notifications when this happens. (We don't get Slack notifications because there's no compute time left in which to send them.)

It's easy to identify this behaviour in the logs: it's every failed action whose runtime is just a few seconds over exactly 6 hours.
https://github.com/opensafely-core/ehrql/actions/workflows/generative-tests.yml
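If we wanted to pick these runs out automatically rather than by eye, a rough filter along the lines below would do it. This is only a sketch: the `Run` record and its fields are hypothetical (filled in by hand or from whatever export of the run list is to hand), and the one-minute tolerance for "a few seconds over" is an assumed value. The one fixed number is the 6-hour (360-minute) job limit on GitHub-hosted runners.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Jobs on GitHub-hosted runners are stopped after 6 hours (360 minutes).
TIMEOUT = timedelta(hours=6)
# Assumed tolerance for the "few seconds over" seen in the run list.
TOLERANCE = timedelta(minutes=1)


@dataclass
class Run:
    """Hypothetical record for one workflow run (not a GitHub API type)."""
    run_id: int
    failed: bool
    started: datetime
    finished: datetime


def is_lockup(run: Run) -> bool:
    """A failed run lasting just over 6 hours most likely hung and was killed."""
    duration = run.finished - run.started
    return run.failed and TIMEOUT <= duration < TIMEOUT + TOLERANCE


def find_lockups(runs: list[Run]) -> list[int]:
    """Return the IDs of runs that match the lock-up pattern."""
    return [run.run_id for run in runs if is_lockup(run)]


if __name__ == "__main__":
    # Made-up timings, purely to illustrate the filter.
    t0 = datetime(2024, 1, 1, 2, 0, 0)
    runs = [
        Run(1, failed=True, started=t0, finished=t0 + timedelta(hours=6, seconds=12)),
        Run(2, failed=False, started=t0, finished=t0 + timedelta(hours=2)),
        Run(3, failed=True, started=t0, finished=t0 + timedelta(minutes=40)),
    ]
    print(find_lockups(runs))  # [1]
```

Keying on duration rather than on the failure message means genuine test failures (which finish well under the limit) are not confused with the hangs.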
