
Optimize for automated use #27

Open
lauraseidler opened this issue May 2, 2022 · 1 comment
Comments

@lauraseidler
Contributor

Hi everyone,

I'm really liking gcpdiag, and it has uncovered a few issues with our infrastructure already that we were able to fix, so a big thanks for making it available :)

As it proved to be really useful to us, I wanted to go one step further and automate running it, meaning:

  1. run it automatically in a given interval (e.g. every day or every week)
  2. run it on all projects matching a given filter (e.g. all projects in a specific folder)
  3. get notified if any issue is found

While I was somewhat successful in that (a cronjob in a GKE cluster runs a custom Python script that fetches all projects, runs gcpdiag on each of them, and then sends an alert via a webhook if there are any errors), I ran into a few challenges along the way, so I wanted to share some thoughts and get your opinions on whether these are things you would consider implementing or be open to contributions:
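For reference, the cronjob script described above could be sketched roughly like this. This is a minimal, hypothetical outline: project discovery and the webhook call are stubbed out, and names like `lint_projects` are my own, not part of gcpdiag.

```python
# Hypothetical sketch of the automation loop: run `gcpdiag lint` against a
# list of projects and collect the ones where it exits non-zero. In the real
# cronjob, the project list would come from the Resource Manager API
# (filtered by folder) and failures would be POSTed to a webhook.
import subprocess
from typing import Iterable, List


def gcpdiag_command(project_id: str) -> List[str]:
    # --hide-ok keeps the output focused on findings.
    return ["gcpdiag", "lint", "--hide-ok", "--project", project_id]


def lint_projects(project_ids: Iterable[str]) -> List[str]:
    """Return the ids of projects where gcpdiag exited with a non-zero code."""
    failed = []
    for pid in project_ids:
        result = subprocess.run(gcpdiag_command(pid),
                                capture_output=True, text=True)
        if result.returncode != 0:
            failed.append(pid)
    return failed


# Example usage (would run gcpdiag if it is installed):
#   failing = lint_projects(["my-project-a", "my-project-b"])
#   if failing:
#       notify_webhook(failing)  # hypothetical alerting helper
```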

  • Even with --hide-ok, the output is still quite verbose in a non-interactive terminal, as the "in progress" logs that are meant to disappear stay visible. It would be great to have a way to turn off all logging that is not an error or a warning.
  • A logging format option that emits JSON might make sense, as that is easier to parse for e.g. Cloud Logging. I see that some parts of the code use print for logging, so those would probably need to be changed to use the logging module as a first step, and then different format options could be introduced.
  • An additional command, e.g. lint-all, could be useful for linting multiple projects at once (this would need additional permissions on the service account).
  • Add a way to send alerts on failure, though this might get quite complicated as people use different systems. Maybe it could be achieved by logging in JSON and then providing a template for e.g. a log-based metric plus an alert.
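The JSON logging idea in the second bullet can be sketched with the standard library alone. This is an illustration of the general technique, not gcpdiag's actual logging setup; the field names ("severity", "message") follow Cloud Logging's structured-log conventions but are my choice here.

```python
# Sketch: a logging.Formatter that emits one JSON object per record, which is
# easy for log collectors such as Cloud Logging to parse.
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "severity": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("gcpdiag-demo")  # hypothetical logger name
logger.addHandler(handler)
logger.setLevel(logging.WARNING)  # only warnings/errors, per the first bullet
logger.warning("example finding")
```

With this in place, switching between human-readable and JSON output would just be a matter of choosing which formatter to attach.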

Happy to hear your thoughts! :)

@schweikert
Collaborator

Thanks a lot for the feedback, Laura!

I agree that running gcpdiag automatically, for example every day, could be very useful. In fact, I was just telling some colleagues why this would be a good idea :-) What makes it particularly useful is that we plan to release new gcpdiag versions with new rules often, so running gcpdiag regularly would let you notice whenever something new is detected by those rules (besides making sure that everything stays good with the existing rules as well).

To better support that use case, we have implemented config files that let you set configuration variables on a per-project level (for example, excluding rules that you don't care about for certain projects), and we have also added support for CSV and JSON output (thanks to @dawidmalina). The CSV and JSON output will be released with the next version.
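Once the JSON output mentioned above is released, consuming it from an automation script should be straightforward. The snippet below is purely illustrative: the field names ("rule", "status") and the report shape are assumptions, so check the actual schema once it ships.

```python
# Sketch of consuming a hypothetical JSON report from gcpdiag. The schema
# (a list of {"rule": ..., "status": ...} objects) is an assumption for
# illustration only, not the documented output format.
import json
from typing import List


def failed_rules(report_json: str) -> List[str]:
    """Return the ids of rules whose status is 'failed'."""
    return [r["rule"] for r in json.loads(report_json)
            if r["status"] == "failed"]


sample = json.dumps([
    {"rule": "gke/ERR/2021_001", "status": "failed"},
    {"rule": "gce/WARN/2021_001", "status": "ok"},
])
# failed_rules(sample) -> ["gke/ERR/2021_001"]
```

A script like the cronjob described earlier could then alert only when this list is non-empty, instead of scraping terminal output.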

Cheers
David
