[BUG] Traceback and long sleep when no tools are registered #3057

Open
1 task done
ldoktor opened this issue Nov 1, 2022 · 4 comments
Labels: bug, fio, Tool Meister, tools

ldoktor (Contributor) commented Nov 1, 2022

Is there an existing issue for this?

  • I have searched the existing issues

Description

Recently I wanted to check whether the sysinfo collection affects my runs, so I tried running with no tools registered and only the tools-v1-default directory present. pbench-start-tools prints the warning "pbench-start-tools: WARNING: No tools are registered", and the same warning is then repeated by pbench-stop-tools, pbench-send-tools, and pbench-postprocess-tools. After that, the run hangs for about a minute and then raises the exception shown below.

To Reproduce

mkdir -p /var/lib/pbench/tools-v1-default
pbench-fio -t read --samples 1 -b 4 -r 30 -s 4M -d /fio

Actual Results

pbench-fio: WARNING: No tools are registered
Created the following job file (/var/lib/pbench-agent/fio__2022.11.01T13.31.13/1-read-4KiB/fio.job):
[global]
bs=4k
runtime=30
ioengine=libaio
iodepth=32
direct=1
sync=0
time_based=1
clocksource=gettimeofday
ramp_time=5
write_bw_log=fio
write_iops_log=fio
write_lat_log=fio
log_avg_msec=1000
write_hist_log=fio
log_hist_msec=10000
log_unix_epoch=1

[job-/fio]
filename=/fio
rw=read
size=4M
numjobs=1

running fio job: /var/lib/pbench-agent/fio__2022.11.01T13.31.13/1-read-4KiB/fio.job (sample1)
pbench-start-tools: WARNING: No tools are registered
pbench-stop-tools: WARNING: No tools are registered
pbench-send-tools: WARNING: No tools are registered
[fio-postprocess-viz.py] Chart Type: timeseries (/var/lib/pbench-agent/fio__2022.11.01T13.31.13/1-read-4KiB/sample1/clients/localhost/hist/results.html)
[fio-postprocess-viz.py] Chart Type: timeseries (/var/lib/pbench-agent/fio__2022.11.01T13.31.13/1-read-4KiB/sample1/hist/results.html)
fio job complete
pbench-postprocess-tools: WARNING: No tools are registered

THE OUTPUT STAYS HERE FOR ABOUT 60s

[WARNING] localhost/fio_clat.1.log: timestamp 1667309509861 for rwtype 0 found multiple times, values [148226, 103613] will be averaged
[WARNING] localhost/fio_lat.1.log: timestamp 1667309509861 for rwtype 0 found multiple times, values [151914, 107581] will be averaged
Traceback (most recent call last):
  File "/usr/lib/python3.10/site-packages/redis/connection.py", line 559, in connect
    sock = self._connect()
  File "/usr/lib/python3.10/site-packages/redis/connection.py", line 615, in _connect
    raise err
  File "/usr/lib/python3.10/site-packages/redis/connection.py", line 603, in _connect
    sock.connect(socket_address)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/pbench-agent/util-scripts/pbench-tool-meister-stop", line 14, in <module>
    sys.exit(main())
  File "/opt/pbench-agent/lib/pbench/agent/tool_meister_stop.py", line 306, in main
    return start(sys.argv[0], parsed)
  File "/opt/pbench-agent/lib/pbench/agent/tool_meister_stop.py", line 209, in start
    with Client(
  File "/opt/pbench-agent/lib/pbench/agent/tool_meister_client.py", line 125, in __enter__
    self.to_client_chan = RedisChannelSubscriber(
  File "/opt/pbench-agent/lib/pbench/agent/redis_utils.py", line 50, in __init__
    self._pubsub.subscribe(channel_name)
  File "/usr/lib/python3.10/site-packages/redis/client.py", line 3580, in subscribe
    ret_val = self.execute_command('SUBSCRIBE', *iterkeys(new_channels))
  File "/usr/lib/python3.10/site-packages/redis/client.py", line 3466, in execute_command
    self.connection = self.connection_pool.get_connection(
  File "/usr/lib/python3.10/site-packages/redis/connection.py", line 1192, in get_connection
    connection.connect()
  File "/usr/lib/python3.10/site-packages/redis/connection.py", line 563, in connect
    raise ConnectionError(self._error_message(e))
redis.exceptions.ConnectionError: Error 111 connecting to fedora:17001. Connection refused.
[error][2022-11-01T13:32:50.802272934] [pbench-fio]: failed to stop the tool meisters.

Expected Results

It should simply run to completion with no issues (and without collecting sysinfo in the background).

Additional information

pbench-agent-0.71.0-3g85910732a.noarch

portante self-assigned this Nov 1, 2022
portante added the bug, tools, and Tool Meister labels Nov 1, 2022
portante added this to To do in v0.72 via automation Nov 1, 2022
portante added this to the v0.72 milestone Nov 1, 2022
portante moved this from To do to In progress in v0.72 Nov 1, 2022
portante (Member) commented Nov 1, 2022

Thanks @ldoktor, I'll be looking into this problem.

portante added the fio label Nov 1, 2022

portante (Member) commented Nov 1, 2022

Are you able to quickly run the same test in your environment using pbench-user-benchmark instead? Also, we should probably have a noop tool to help in these cases. If you don't have any tools registered for a host, then no tool meister will be started there, so sysinfo collection won't take place.
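To illustrate the point about per-host Tool Meisters, here is a minimal Python sketch. It assumes a hypothetical in-memory mapping of host name to registered tool names (not pbench's actual on-disk format) and only models the rule stated in the comment: a host gets a Tool Meister, and therefore sysinfo collection, only if at least one tool is registered on it. It also shows why a noop tool would help: registering it makes the host's tool list non-empty.

```python
from typing import Dict, List


def hosts_with_tool_meister(registered: Dict[str, List[str]]) -> List[str]:
    """Return the hosts that would get a Tool Meister (hypothetical model).

    `registered` maps host name -> registered tool names. A host only gets
    a Tool Meister, and hence sysinfo collection, if its list is non-empty.
    """
    return sorted(host for host, tools in registered.items() if tools)


# The reporter's setup: the default tool group exists but is empty.
print(hosts_with_tool_meister({}))                     # []
print(hosts_with_tool_meister({"localhost": []}))      # []
# Registering a (hypothetical) "noop" tool would be enough to start one.
print(hosts_with_tool_meister({"localhost": ["noop"]}))  # ['localhost']
```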

portante (Member) commented Nov 2, 2022

So there are multiple issues here. Starting from the innermost to the outermost:

  1. The 60-second pause needs further investigation to confirm, but it is most likely related to a 60-second timeout used when trying to talk to the Redis server
    • The fix will likely be a change to pbench-tool-meister-stop so that it recognizes that no tools are present and does not attempt the action that waits for the timeout
  2. The traceback at the end is the result of poor handling of Errno 111 (connection refused)
    • We should handle that error properly, reporting it as "the Redis server is not listening"
    • We should also track the error messages encountered during the 60-second window, report a summary of them, and emit a stack trace only for unrecognized issues
  3. The various WARNING messages about no tools being registered should not be repeated so often; pbench-fio should emit just one
  4. The tool meister sub-system should not be invoked if no tools are registered
  5. It should be possible to register a host for sysinfo data collection only, with no tools registered
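Items 1, 2, and 4 can be sketched together in Python. This is not pbench-agent code: has_registered_tools, wait_for_redis, and stop_tool_meisters are hypothetical names, the tool check assumes that a registered tool leaves at least one file somewhere under the tool group directory, and a plain TCP connect stands in for the real Redis client. The sketch skips the Tool Meister entirely when nothing is registered, bounds the wait with a deadline, and returns a one-line summary of the errors seen (for example, "the Redis server is not listening") instead of a traceback.

```python
import socket
import time
from collections import Counter
from pathlib import Path
from typing import Optional


def has_registered_tools(tool_group_dir: Path) -> bool:
    """Return True if any file exists below the tool group directory.

    Assumption: each registered tool leaves at least one file under
    tools-v1-<group>/, so an empty (or missing) directory means "no tools".
    """
    if not tool_group_dir.is_dir():
        return False
    return any(p.is_file() for p in tool_group_dir.rglob("*"))


def wait_for_redis(
    host: str, port: int, deadline_s: float = 60.0, interval_s: float = 1.0
) -> Optional[str]:
    """Retry a TCP connect until the deadline passes.

    Returns None once a connection succeeds; otherwise returns a one-line
    summary of every distinct error seen during the window, with counts.
    """
    errors: Counter = Counter()
    end = time.monotonic() + deadline_s
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval_s):
                return None
        except ConnectionRefusedError:
            # Errno 111 is a recognized case, so describe it in plain words.
            errors["the Redis server is not listening (connection refused)"] += 1
        except OSError as exc:
            # Any other socket error is kept verbatim for the summary.
            errors[f"{type(exc).__name__}: {exc}"] += 1
        if time.monotonic() >= end:
            break
        time.sleep(interval_s)
    summary = "; ".join(f"{msg} (x{count})" for msg, count in errors.items())
    return f"could not reach Redis at {host}:{port} within {deadline_s:g}s: {summary}"


def stop_tool_meisters(tool_group_dir: Path, redis_host: str, redis_port: int) -> int:
    """Hypothetical stop step that never contacts Redis when no tools exist."""
    if not has_registered_tools(tool_group_dir):
        # Nothing was started, so there is nothing to stop or wait for.
        return 0
    problem = wait_for_redis(redis_host, redis_port)
    if problem is not None:
        print(f"failed to stop the tool meisters: {problem}")
        return 1
    # Hypothetical: the actual stop logic would go here.
    return 0
```

With the reporter's setup (an empty tools-v1-default directory), stop_tool_meisters in this sketch returns immediately without touching Redis, so neither the pause nor the traceback would occur.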

ldoktor (Contributor, Author) commented Nov 2, 2022

Yes, the 60 s pause is caused by the Redis connection attempt (I probably should have mentioned that in the report). As for solutions, I'm not sure. It could be a noop tool, or the TM could iterate over an empty list, or the TM could simply be skipped in those cases.

portante modified the milestones: v0.72, v0.73 Mar 14, 2023
portante removed this from In progress in v0.72 Mar 14, 2023
Projects
Status: In Progress