NodeRestartWithResharding and SoftRebootNodeMonkey nemeses fail on the Docker backend while waiting for a node to come back after a reboot/restart disruption.
The error during NodeRestartWithResharding nemesis:
2024-04-08 18:39:07.166: (DisruptionEvent Severity.ERROR) period_type=end event_id=71d16ff5-bb7e-44de-9259-6953d87cac2c duration=2m24s: nemesis_name=RestartWithResharding target_node=Node longevity-1gb-1h-nemesis-longevit-db-node-407ce057-1 [172.17.0.3 | 172.17.0.3] (seed: True) errors=Resharding has not been started (murmur3_partitioner_ignore_msb_bits=15) Check the log for the details
Traceback (most recent call last):
File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 5117, in wrapper
result = method(*args[1:], **kwargs)
File "/home/ubuntu/scylla-cluster-tests/sdcm/sct_events/group_common_events.py", line 324, in inner_func
return func(*args, **kwargs)
File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 1003, in disrupt_restart_with_resharding
self.target_node.restart_node_with_resharding(
File "/home/ubuntu/scylla-cluster-tests/sdcm/cluster.py", line 2478, in restart_node_with_resharding
raise Exception(f'Resharding has not been started '
Exception: Resharding has not been started (murmur3_partitioner_ignore_msb_bits=15) Check the log for the details
The error during SoftRebootNodeMonkey nemesis:
2024-04-09 01:35:04.396: (DisruptionEvent Severity.ERROR) period_type=end event_id=04e343e9-e83e-4a05-8f96-612bc9f84325 duration=45m10s: nemesis_name=SoftRebootNode target_node=Node longevity-1gb-1h-nemesis-longevit-db-node-a5907dd2-0 [172.17.0.2 | 172.17.0.2] (seed: True) errors=Wait for: uptime_changed: timeout - 2700 seconds - expired
Traceback (most recent call last):
File "/home/ubuntu/scylla-cluster-tests/sdcm/wait.py", line 70, in wait_for
res = retry(func, **kwargs)
File "/usr/local/lib/python3.10/site-packages/tenacity/__init__.py", line 404, in __call__
do = self.iter(retry_state=retry_state)
File "/usr/local/lib/python3.10/site-packages/tenacity/__init__.py", line 360, in iter
raise retry_exc.reraise()
File "/usr/local/lib/python3.10/site-packages/tenacity/__init__.py", line 194, in reraise
raise self
tenacity.RetryError: RetryError[<Future at 0x7f8cdc280940 state=finished returned bool>]
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 5117, in wrapper
result = method(*args[1:], **kwargs)
File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 927, in disrupt_soft_reboot_node
self.reboot_node(target_node=self.target_node, hard=False)
File "/home/ubuntu/scylla-cluster-tests/sdcm/sct_events/group_common_events.py", line 324, in inner_func
return func(*args, **kwargs)
File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 3740, in reboot_node
target_node.reboot(hard=hard, verify_ssh=verify_ssh)
File "/home/ubuntu/scylla-cluster-tests/sdcm/cluster.py", line 1011, in reboot
wait.wait_for(func=uptime_changed, step=10, timeout=60*45, throw_exc=True)
File "/home/ubuntu/scylla-cluster-tests/sdcm/wait.py", line 86, in wait_for
raise raising_exc from ex
sdcm.exceptions.WaitForTimeoutError: Wait for: uptime_changed: timeout - 2700 seconds - expired
I don't think this nemesis is possible with --smp 1. For this one we possibly need to increase it, or change the way we do resharding: change smp instead of changing murmur3_partitioner_ignore_msb_bits.
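To illustrate why a single shard leaves nothing to reshard, here is a minimal sketch of a token-to-shard mapping. It is a simplified model written for this comment, not code taken from Scylla or SCT. The function name `shard_of`, the bias-and-shift arithmetic, and the starting value of 12 for the ignored bits are all assumptions made for illustration.

```python
def shard_of(token: int, shards: int, ignore_msb_bits: int) -> int:
    """Simplified, assumed model: map a signed 64-bit token to a shard index."""
    # Move the signed token into the unsigned 64-bit range.
    biased = (token + 2**63) % 2**64
    # Discard the most significant bits by shifting them out of the 64-bit word.
    biased = (biased << ignore_msb_bits) % 2**64
    # Scale into [0, shards): take the high 64 bits of the 128-bit product.
    return (biased * shards) >> 64


token = 2**51 - 2**63  # hypothetical token chosen so the two settings disagree

# With several shards, changing the ignored bits can move a token to another shard.
print(shard_of(token, shards=4, ignore_msb_bits=12))
print(shard_of(token, shards=4, ignore_msb_bits=15))

# With one shard, every token lands on shard 0 whatever the setting is.
print(shard_of(token, shards=1, ignore_msb_bits=12))
print(shard_of(token, shards=1, ignore_msb_bits=15))
```

Under this model, with `shards=1` the product is always below 2**64, so the result is 0 for every token and changing murmur3_partitioner_ignore_msb_bits cannot move any data between shards.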
Now I see the error sdcm.exceptions.WaitForTimeoutError: Wait for: uptime_changed: timeout - 2700 seconds - expired. On the Docker backend the uptime command shows the host uptime, so we need to reimplement this method for the Docker backend and take the value from, e.g., the docker ps status.
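One possible shape for a Docker-specific check is sketched below. It reads the container start time with `docker inspect` instead of parsing the `docker ps` status text, which is a swapped-in alternative to the idea above and not the existing SCT implementation. The helper names (`parse_started_at`, `container_started_at`, `start_time_changed`) are hypothetical, and the parser assumes Docker reports `State.StartedAt` in UTC with a trailing `Z`.

```python
import subprocess
from datetime import datetime, timezone


def parse_started_at(raw: str) -> datetime:
    """Parse a Docker timestamp such as 2024-04-08T18:39:07.166123456Z (assumed UTC)."""
    raw = raw.strip().rstrip("Z")
    head, _, frac = raw.partition(".")
    # Docker prints nanoseconds; pad or truncate the fraction to microseconds.
    micros = (frac + "000000")[:6]
    parsed = datetime.strptime(f"{head}.{micros}", "%Y-%m-%dT%H:%M:%S.%f")
    return parsed.replace(tzinfo=timezone.utc)


def container_started_at(container: str) -> datetime:
    """Ask the Docker CLI when the container was last started."""
    result = subprocess.run(
        ["docker", "inspect", "--format", "{{.State.StartedAt}}", container],
        check=True, capture_output=True, text=True,
    )
    return parse_started_at(result.stdout)


def start_time_changed(before: datetime, container: str) -> bool:
    """Return True once the container reports a start time later than `before`."""
    return container_started_at(container) > before
```

A reboot routine could record `container_started_at(name)` before the disruption and then poll `start_time_changed(before, name)` in place of the host-level uptime comparison.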
Installation details
SCT Version: master
Scylla version: 2024.1.2-0.20240228.2c85a811d0be
Test:
longevity-5gb-1h-nemesis
Test config: configurations/nemesis/additional_configs/docker_backend_local.yaml
Logs
SoftRebootNodeMonkey Jenkins job url
NodeRestartWithResharding Jenkins job url