Hardcoded timeout for call of getParameter.py to get operation mode? #183

fdanapfel · 2023-05-23T09:12:54Z

Hi,

while the timeouts for most calls of HANA binaries and Python scripts have been made configurable with commit 7c66a3b, the call that runs getParameter.py to get the operation mode still uses a hardcoded timeout of 10 seconds:

https://github.com/SUSE/SAPHanaSR/blob/maintenance-classic/ra/SAPHana#L2664

For other calls of getParameter.py in the resource agents, however, the $HANA_CALL_TIMEOUT variable is used, which makes the timeout configurable.

Is there a specific reason why a hardcoded timeout is used for the getParameter.py call to get the operation mode, or would it be possible to also make the timeout configurable by using the $HANA_CALL_TIMEOUT variable instead?
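
For illustration, a minimal shell sketch of the change asked about here. It assumes the call is wrapped by the coreutils `timeout` command; `run_with_timeout` is a hypothetical helper and not the resource agent's actual code. The default of 10 seconds only mirrors the hardcoded value mentioned above.

```shell
#!/bin/sh
# Hypothetical helper, NOT the actual SAPHana resource agent code.
# Uses $HANA_CALL_TIMEOUT when it is set and non-empty; otherwise falls
# back to 10 seconds, the value currently hardcoded for this call.
run_with_timeout() {
    timeout "${HANA_CALL_TIMEOUT:-10}" "$@"
}

# Stand-in command that finishes well within the timeout.
run_with_timeout echo "logreplay"
```

With this shape, the existing behaviour is unchanged as long as the variable is unset, and a slow environment could raise the limit without editing the script.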


fmherschel · 2023-05-23T10:02:11Z

The first reason was that getParameter.py should always answer very fast. Do you have a realistic situation where getParameter.py did not answer in time? What might be the reason for this? A hanging NFS share? I have not reviewed this at code level yet. My guess is that, other than for systemReplicationStatus.py, where we have a hard argument to stay with the short timeout, we might change that for getParameter.py. But we should also take into account that not all hanging resources can be addressed by the SAPHanaSR* resource agents. In particular, the classic SAPHanaSR resource agents are not independent of the cluster system environment.
Just my first 2ct.

fdanapfel · 2023-05-23T12:03:18Z

I don't have an actual situation where getParameter.py did not answer in time. I was just asked by some colleagues why there is a hardcoded timeout for the call of getParameter.py in this specific case, whereas the configurable timeout is used for other calls of getParameter.py in the resource agents.

fmherschel · 2023-05-23T15:15:44Z

I have just reviewed: https://github.com/SUSE/SAPHanaSR/blob/maintenance-classic/ra/SAPHana#L2664
This is "only" the operation mode of the SR. It only needs to be acquired once* before a former primary is registered. So we selected a shorter timeout to prevent overly long RA runtimes caused by adding long timeouts in sequence.
But maybe we should implement a fallback: if getting the log-mode times out, the function should keep the old value from a previous query.

*) We query the status more than once to get updates, in case something changes during the cluster runtime.
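
A minimal sketch of the fallback idea described above, assuming the query is run under the coreutils `timeout` command. The function name `refresh_op_mode` and the variable `op_mode` are hypothetical and chosen only for illustration; this is not the resource agent's implementation.

```shell
#!/bin/sh
# Hypothetical sketch of "keep the old value on timeout"; names are
# illustrative and do not come from the SAPHana resource agent.
op_mode=""   # last known value; empty until the first successful query

# refresh_op_mode TIMEOUT CMD [ARGS...]
# Runs CMD under a timeout. On success with non-empty output, $op_mode is
# updated and 0 is returned. On timeout, error, or empty output, $op_mode
# is left untouched and 1 is returned, so the caller keeps the old value.
refresh_op_mode() {
    _tmo="$1"; shift
    if _out=$(timeout "$_tmo" "$@") && [ -n "$_out" ]; then
        op_mode="$_out"
        return 0
    fi
    return 1
}

# Stand-in commands: the first query succeeds, the second one times out.
refresh_op_mode 10 echo "logreplay"
refresh_op_mode 1 sleep 3 || echo "query failed, keeping: $op_mode"
```

The function sets a variable instead of printing its result, because calling it inside `$(...)` would run it in a subshell and lose the cached value.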

fmherschel added the question label May 23, 2023
