
Optimize for automated use #27

Open
lauraseidler opened this issue May 2, 2022 · 1 comment
Comments

@lauraseidler
Contributor

Hi everyone,

I'm really liking gcpdiag, and it has uncovered a few issues with our infrastructure already that we were able to fix, so a big thanks for making it available :)

As it proved to be really useful to us, I wanted to go one step further and automate running it, meaning:

  1. run it automatically in a given interval (e.g. every day or every week)
  2. run it on all projects matching a given filter (e.g. all projects in a specific folder)
  3. get notified if any issue is found

While I was somewhat successful in that (a cronjob in a GKE cluster runs a custom Python script that fetches all projects, runs gcpdiag on each of them, and then sends an alert via a webhook if there are any errors), I ran into a few challenges along the way, so I wanted to share some thoughts and get your opinions on whether these are things you would consider implementing or be open to contributions:
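For reference, the cronjob script described above could be sketched roughly like this. This is a minimal, hypothetical outline: project discovery and the webhook call are stubbed out, and names like `lint_projects` are my own, not part of gcpdiag.

```python
# Hypothetical sketch of the automation loop: run `gcpdiag lint` against a
# list of projects and collect the ones where it exits non-zero. In the real
# cronjob, the project list would come from the Resource Manager API
# (filtered by folder) and failures would be POSTed to a webhook.
import subprocess
from typing import Iterable, List


def gcpdiag_command(project_id: str) -> List[str]:
    # --hide-ok keeps the output focused on findings.
    return ["gcpdiag", "lint", "--hide-ok", "--project", project_id]


def lint_projects(project_ids: Iterable[str]) -> List[str]:
    """Return the ids of projects where gcpdiag exited with a non-zero code."""
    failed = []
    for pid in project_ids:
        result = subprocess.run(gcpdiag_command(pid),
                                capture_output=True, text=True)
        if result.returncode != 0:
            failed.append(pid)
    return failed


# Example usage (would run gcpdiag if it is installed):
#   failing = lint_projects(["my-project-a", "my-project-b"])
#   if failing:
#       notify_webhook(failing)  # hypothetical alerting helper
```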

  • Even with --hide-ok, the output is still quite verbose in a non-interactive terminal, as the "in progress" logs that are meant to disappear stay visible. It would be great to have a way to turn off all logging that is not an error or a warning.
  • A logging format option that emits JSON might make sense, as that is easier to parse for e.g. Cloud Logging. I see that some parts of the code use print for logging, so those would probably need to be changed to use the logging module as a first step, and then different format options could be introduced.
  • An additional command, e.g. lint-all, could be useful for linting multiple projects at once (this would need additional permissions on the service account).
  • Add a way to send alerts on failure, though this might get quite complicated as people use different systems. Maybe it could be achieved by logging in JSON and then providing a template for e.g. a log-based metric plus an alert.
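The JSON logging idea in the second bullet can be sketched with the standard library alone. This is an illustration of the general technique, not gcpdiag's actual logging setup; the field names ("severity", "message") follow Cloud Logging's structured-log conventions but are my choice here.

```python
# Sketch: a logging.Formatter that emits one JSON object per record, which is
# easy for log collectors such as Cloud Logging to parse.
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "severity": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("gcpdiag-demo")  # hypothetical logger name
logger.addHandler(handler)
logger.setLevel(logging.WARNING)  # only warnings/errors, per the first bullet
logger.warning("example finding")
```

With this in place, switching between human-readable and JSON output would just be a matter of choosing which formatter to attach.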

Happy to hear your thoughts! :)

@schweikert
Collaborator

Thanks a lot for the feedback, Laura!

I agree that running gcpdiag automatically, for example every day, could be very useful. In fact, I was just telling some colleagues why this would be a good idea :-) What makes it particularly useful is that we plan to release new gcpdiag versions with new rules often, so running gcpdiag regularly would let you notice whenever something new is detected by those rules (besides making sure that everything stays good with the existing rules as well).

To better support that use case, we have implemented config files that let you set configuration variables on a per-project level (for example, excluding rules that you don't care about for certain projects), and we have also added support for CSV and JSON output (thanks to @dawidmalina). The CSV and JSON output will be released with the next version.
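Once the JSON output mentioned above is released, consuming it from an automation script should be straightforward. The snippet below is purely illustrative: the field names ("rule", "status") and the report shape are assumptions, so check the actual schema once it ships.

```python
# Sketch of consuming a hypothetical JSON report from gcpdiag. The schema
# (a list of {"rule": ..., "status": ...} objects) is an assumption for
# illustration only, not the documented output format.
import json
from typing import List


def failed_rules(report_json: str) -> List[str]:
    """Return the ids of rules whose status is 'failed'."""
    return [r["rule"] for r in json.loads(report_json)
            if r["status"] == "failed"]


sample = json.dumps([
    {"rule": "gke/ERR/2021_001", "status": "failed"},
    {"rule": "gce/WARN/2021_001", "status": "ok"},
])
# failed_rules(sample) -> ["gke/ERR/2021_001"]
```

A script like the cronjob described earlier could then alert only when this list is non-empty, instead of scraping terminal output.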

Cheers
David
