sif file corrupted and no DeadContainerException #278

Open
sverhoeven opened this issue Nov 29, 2021 · 3 comments

@sverhoeven
Member

We had a corrupt SIF file. To check, I ran `singularity run <bla.sif>`, which printed `could not open image <bla.sif>: SIF image <bla.sif> is corrupted: wrong partition size`.

I expected DeadContainerException to be thrown at https://github.com/eWaterCycle/grpc4bmi/blob/d4e644a3177774e348295f78c2c4061094858256/grpc4bmi/bmi_client_singularity.py#L235, but instead the BmiClientSingularity was stuck on connecting to a grpc server that died prematurely.

Can we make it possible to set the delay and timeout in the grpc4bmi.bmi_client_singularity.BmiClientSingularity constructor?
That way we give the container some time to die instead of checking immediately.
This could be done by passing delay in the model.setup() method or in the ewatercycle.yaml file.
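
To illustrate the delay-then-check idea, here is a minimal sketch in plain Python. It is not the grpc4bmi implementation: `DeadContainerException` is a local stand-in class, `start_and_check` is a hypothetical helper, and `DYING_COMMAND` is an ordinary process that exits immediately, standing in for a container launched from a corrupt image. Unlike a fixed sleep followed by a single check, this variant waits up to `delay` seconds and returns early as soon as the process dies.

```python
import subprocess
import sys


class DeadContainerException(Exception):
    """Local stand-in for illustration; not the grpc4bmi class."""

    def __init__(self, message, exitcode):
        super().__init__(message)
        self.exitcode = exitcode


# Hypothetical stand-in for a container that fails right after start,
# for example because its image is corrupt.
DYING_COMMAND = [sys.executable, "-c", "import sys; sys.exit(3)"]


def start_and_check(args, delay=0.1):
    """Start a process and give it up to `delay` seconds to die.

    Returns the running process if it is still alive after `delay`,
    otherwise raises DeadContainerException with the exit code.
    """
    process = subprocess.Popen(args)
    try:
        # wait() returns as soon as the process exits, so a fast failure
        # does not cost the full delay.
        exitcode = process.wait(timeout=delay)
    except subprocess.TimeoutExpired:
        # Still running after the grace period: assume a healthy start.
        return process
    raise DeadContainerException(
        f"Container died prematurely with exit code {exitcode}", exitcode
    )
```

With a delay of 0 a process that is about to crash usually still counts as running, which matches the hang described above. A small non-zero delay gives the failure time to surface before the client tries to connect.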

@Peter9192
Collaborator

Do I understand correctly that you want to expose these settings through the eWaterCycle python package? Or can we just hard-code it in the model.setup() method?

Also: should we add an issue to catch this in grpc4bmi?

@sverhoeven
Member Author

In grpc4bmi you can already set the delay and timeout. The problem is that the ewatercycle package uses defaults that cannot be overridden when you want to debug why a model is not launching.

The time between starting the container and checking whether it is still running is currently set to 0 seconds. Making this something like 0.1 seconds would catch more unsuccessful container starts. Hardcoding a non-zero delay in model.setup() would help, but how long you must wait for the container to die prematurely depends on the system speed. So I would rather have a way to change the delay system-wide, for example in ewatercycle.yaml, than a hardcoded value or an argument exposed in setup().

@Peter9192
Collaborator

Ah right, so in that case I'd argue for setting it in ewatercycle.yaml, with a generous default value. Since it is quite a technical thing to do, I'd rather hide it from the public API as much as possible.
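
A rough sketch of what such a config-driven setting could look like. The option names `container_delay` and `container_timeout`, the default of 1 second, and the helper `container_start_options` are all hypothetical. They are not existing ewatercycle settings, and the plain dict stands in for the parsed contents of ewatercycle.yaml.

```python
# Hypothetical default; "generous" is system dependent, so treat 1 second
# as a placeholder rather than a recommendation.
DEFAULT_CONTAINER_DELAY = 1.0


def container_start_options(config):
    """Derive delay/timeout keyword arguments from a parsed config mapping.

    `config` stands in for the parsed ewatercycle.yaml; the keys
    `container_delay` and `container_timeout` are made up for this sketch.
    """
    delay = float(config.get("container_delay", DEFAULT_CONTAINER_DELAY))
    if delay < 0:
        raise ValueError("container_delay must be zero or positive")
    timeout = config.get("container_timeout")
    if timeout is not None:
        timeout = float(timeout)
    # The thread states that grpc4bmi's client already accepts a delay and a
    # timeout, so setup() could forward this dict without adding arguments
    # to its own public signature.
    return {"delay": delay, "timeout": timeout}
```

Users who never touch the config get the default, while someone debugging a model that fails to launch can raise the delay in one place for the whole system.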
