Machine readable drenv results #1306

nirs · 2024-03-31T15:37:48Z

When drenv fails, we have good logging for humans:

drenv.commands.Error: Command failed:
   command: ('addons/ocm-cluster/start', 'dr1', 'hub')
   exitcode: 1
   error:
      Traceback (most recent call last):
        File "/home/nsoffer/ramen/test/addons/ocm-cluster/start", line 173, in <module>
          deploy(cluster_name, hub_name)
        File "/home/nsoffer/ramen/test/addons/ocm-cluster/start", line 60, in deploy
          wait_for_managed_cluster(cluster, hub)
        File "/home/nsoffer/ramen/test/addons/ocm-cluster/start", line 135, in wait_for_managed_cluster
          kubectl.wait(
        File "/home/nsoffer/ramen/test/drenv/kubectl.py", line 141, in wait
          _watch("wait", *args, context=context, log=log)
        File "/home/nsoffer/ramen/test/drenv/kubectl.py", line 157, in _watch
          for line in commands.watch(*cmd, input=input):
        File "/home/nsoffer/ramen/test/drenv/commands.py", line 155, in watch
          raise Error(args, error, exitcode=p.returncode)
      drenv.commands.Error: Command failed:
         command: ('kubectl', 'wait', '--context', 'hub', 'managedcluster/dr1', '--for=jsonpath={.spec.hubAcceptsClient}=true', '--timeout=60s')
         exitcode: 1
         error:
            error: timed out waiting for the condition on managedclusters/dr1

But this is not helpful when trying to analyze 300 runs. We need a machine-readable format that a program can consume to compute:

number of errors
which addon produces most of the errors?
which command in the addon produces most of the errors?
which time of day has the most errors?
run time stats (avg, min, max)

For this example we could use something like:

$ drenv start --output json envs/regional-dr.yaml
{
  "error": {
    "command": [
      "addons/ocm-cluster/start",
      "dr1",
      "hub"
    ],
    "exitcode": 1,
    "error": {
      "command": [
        "kubectl",
        "wait",
        "--context",
        "hub",
        "managedcluster/dr1",
        "--for=jsonpath={.spec.hubAcceptsClient}=true",
        "--timeout=60s"
      ],
      "exitcode": 1,
      "error": "error: timed out waiting for the condition on managedclusters/dr1"
    }
  },
  "time": 447,
  "started": "2024-03-30 20:11:18.147614957 -0400",
  "finished": "2024-03-30 20:18:48.656273959 -0400"
}

The timing info can be computed by a program running drenv, but computing it in drenv makes it easier to collect this info in all environments.
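A minimal sketch of how drenv could build such a result itself, timing included. `CommandError` is a hypothetical stand-in for `drenv.commands.Error`, and `error_to_dict()` and `run_with_result()` are illustrative names, not existing drenv functions. The sketch also assumes the inner failure is available as an error object; today the addon's error arrives as traceback text, so the addon would have to report it in structured form first. Timestamps use ISO 8601 here instead of the format in the example above, and `time` is assumed to be seconds.

```python
import json
import time
from datetime import datetime, timezone


class CommandError(Exception):
    """Hypothetical stand-in for drenv.commands.Error."""

    def __init__(self, command, error, exitcode=None):
        super().__init__(command, error, exitcode)
        self.command = command
        self.error = error
        self.exitcode = exitcode


def error_to_dict(e):
    """Convert an error to plain dicts, recursing into nested errors."""
    inner = e.error
    if isinstance(inner, CommandError):
        inner = error_to_dict(inner)
    return {"command": list(e.command), "exitcode": e.exitcode, "error": inner}


def local_now():
    """Return the current time as a timezone-aware local datetime."""
    return datetime.now(timezone.utc).astimezone()


def run_with_result(func):
    """Run func() and return a result dict with timing and optional error."""
    started = local_now()
    t0 = time.monotonic()
    result = {}
    try:
        func()
    except CommandError as e:
        result["error"] = error_to_dict(e)
    # Elapsed time in seconds (assumption: the unit is not stated above).
    result["time"] = round(time.monotonic() - t0, 3)
    result["started"] = started.isoformat()
    result["finished"] = local_now().isoformat()
    return result
```

Printing `json.dumps(run_with_result(func), indent=2)` would then give output shaped like the example above, and the same dict could be passed to a YAML serializer.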

The default format can be YAML, using the same internal representation:

error:
  command:
  - addons/ocm-cluster/start
  - dr1
  - hub
  exitcode: 1
  error:
    command:
    - kubectl
    - wait
    - --context
    - hub
    - managedcluster/dr1
    - --for=jsonpath={.spec.hubAcceptsClient}=true
    - --timeout=60s
    exitcode: 1
    error: 'error: timed out waiting for the condition on managedclusters/dr1'
time: 447
started: 2024-03-30 20:11:18.147614957 -0400
finished: 2024-03-30 20:18:48.656273959 -0400
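With results in this shape, the analysis listed earlier (error count, failing addon and command, time of day, run time stats) could be a short script. This is only a sketch: it assumes one JSON result per line as the collection format, and `load_results()`, `innermost()`, `start_hour()` and `summarize()` are hypothetical names. The hour is taken by splitting the `started` string because its nanosecond fraction does not fit `strptime`'s `%f`.

```python
import json
import statistics
from collections import Counter


def load_results(lines):
    """Parse one JSON result per line (assumed collection format)."""
    return [json.loads(line) for line in lines if line.strip()]


def innermost(error):
    """Follow nested errors down to the command that actually failed."""
    while isinstance(error.get("error"), dict):
        error = error["error"]
    return error


def start_hour(result):
    """Return the local hour from 'YYYY-MM-DD HH:MM:SS.fffffffff +ZZZZ'."""
    return int(result["started"].split()[1].split(":")[0])


def summarize(results):
    """Compute error counts and run time stats for a list of results."""
    failed = [r for r in results if "error" in r]
    times = [r["time"] for r in results]
    summary = {
        "runs": len(results),
        "errors": len(failed),
        # The first item of the outer command is the addon script.
        "by_addon": Counter(r["error"]["command"][0] for r in failed),
        # First two words of the innermost command, e.g. "kubectl wait".
        "by_command": Counter(
            " ".join(innermost(r["error"])["command"][:2]) for r in failed
        ),
        "by_hour": Counter(start_hour(r) for r in failed),
    }
    if times:
        summary["time"] = {
            "avg": statistics.mean(times),
            "min": min(times),
            "max": max(times),
        }
    return summary
```

For example, `summarize(load_results(open("results.jsonl")))["by_addon"].most_common(1)` would name the addon with the most failures, where `results.jsonl` is a hypothetical file collected from many runs.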

nirs added the enhancement and test labels on Mar 31, 2024

nirs mentioned this issue Apr 2, 2024

drenv error profile #1313

Open
