Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Travis Build Failure monitor(ing) #469

Closed
wants to merge 7 commits into from

Conversation

NickLaMuro
Copy link
Member

@NickLaMuro NickLaMuro commented Jan 14, 2020

Note: Currently still finalizing a lot here, so this is not ready for #PrimeTime™. Just putting this up so folks know what is going on in the bot channel.

Adds a new worker that is in charge of monitoring Travis CI for build failures on monitored branches, and will post notifications in their respective channels.

This pull requests adds multiple classes/models/libs to help facilitate this functionality:

  • The aforementioned "worker" (TravisBranchMonitor) class, that is mostly in charge of querying builds we care about, creating BuildFailure updating Branch records as needed, and sending BuildFailureNotifications when necessary.
  • A build_failures Updates to the branches table/model, which keeps track of previously failed builds (helps to avoid spam in chat, and we can post when a previously failing build has been fixed)
  • BuildFailureNotifier: A service object in charge of the sending messages to gitter. Uses the gitter-api-ruby gem to do that.
  • TravisV3Client: A simple Faraday client lib that allows for using v3 API endpoints with the v2 API based travis gem.
  • Gemfile: updates the travis gem and adds gitter-api-ruby.

Links

@miq-bot miq-bot changed the title [POC][WIP] Travis Build Failure monitor(ing) [WIP] [POC][WIP] Travis Build Failure monitor(ing) Jan 14, 2020
@miq-bot miq-bot added the wip label Jan 14, 2020
@NickLaMuro NickLaMuro changed the title [WIP] [POC][WIP] Travis Build Failure monitor(ing) [WIP] [POC] Travis Build Failure monitor(ing) Jan 14, 2020
@NickLaMuro
Copy link
Member Author

NickLaMuro commented Jan 15, 2020

Update: I think this is a pretty decent working copy for a first pass.

Currently this will:

  • Monitor builds and post for failing builds on 'master' branches
  • Only again post after it has been a day with no change
  • Post that the build is passing once it does, and delete the BuildFailure record

Using the following config settings:

---
# ...
gitter:
  api_url: http://localhost:5000
  api_prefix: "/api/v1"
# ...
travis_branch_monitor:
  included_repos:
    - ManageIQ/manageiq: not-real-group/community
    - GrumpyTech/manageiq-loggers: not-real-group/community

(note: note-real-group/community is the group/channel in my local copy of gitter. This can be configured to work with real rooms).

For the following builds for the GrumpyTech/manageiq-loggers repo on the master branch:

  1. 💚 https://travis-ci.org/GrumpyTech/manageiq-loggers/builds/637174523
  2. 💚 https://travis-ci.org/GrumpyTech/manageiq-loggers/builds/637174582
  3. 🔴 https://travis-ci.org/GrumpyTech/manageiq-loggers/builds/637174979
  4. 💚 https://travis-ci.org/GrumpyTech/manageiq-loggers/builds/637189434

I got the following results running bundle exec rake travis_branch_monitor:poll_single:

Screen Shot 2020-01-14 at 7 17 23 PM

@NickLaMuro NickLaMuro changed the title [WIP] [POC] Travis Build Failure monitor(ing) Travis Build Failure monitor(ing) Jan 15, 2020
@miq-bot miq-bot removed the wip label Jan 15, 2020
@NickLaMuro NickLaMuro force-pushed the travis-failure-monitor branch 2 times, most recently from 6ec3222 to b276186 Compare January 15, 2020 02:04
@NickLaMuro
Copy link
Member Author

@Fryguy Addressed your changes. Thanks for the feedback.

@NickLaMuro NickLaMuro force-pushed the travis-failure-monitor branch 2 times, most recently from e665526 to 64aab6d Compare January 15, 2020 23:13
@NickLaMuro NickLaMuro force-pushed the travis-failure-monitor branch 3 times, most recently from 4fdc75a to e8fb1db Compare January 16, 2020 18:26
@NickLaMuro
Copy link
Member Author

NickLaMuro commented Jan 16, 2020

Use update instead of update_attributes

FYI: These rubocop warnings are on lines that are working with Travis::Client::Entity models, and are not ActiveRecord models. These warnings do not make sense here.

app/models/branch.rb Outdated Show resolved Hide resolved
app/workers/travis_branch_monitor.rb Outdated Show resolved Hide resolved
app/workers/travis_branch_monitor.rb Outdated Show resolved Hide resolved
end

def process_repo(repo)
# repo.regular_branch_names.each do |branch|
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be removed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most definitely. This code definitely is subject to "copy-pasta syndrome".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Followup: I didn't review this comment without looking at the context.

So I think this is related to the TODO that is below:

# TODO:  Handle this for all release branches
process_branch(repo, 'master')

And probably what should be done is we should be doing repo.regular_branches (?) and passing the branch_record directly into process_branch instead of doing the lookup there. That said, unsure if regular_branches does what we want in this case. Would have to look.

That is also a question if we want to first start with 'master' for a first pass to test this feature out and then move on to the release branches going forward. That way we can determine if this is indeed the route we want to take with this, or if it is causing too much noise in chat.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably don't want to track all of the regular_branches it probably only makes sense for a subset (master, jansa, ivanchuk, hammer) or something like that. Maybe that belongs in a setting?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that probably makes sense.

This probably falls in the "doesn't need to be a private setting" category argument that I have been making, but we can adjust that at a later date. I will look into making this a regular setting for now. 👍

Copy link
Member

@Fryguy Fryguy Mar 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably don't want to track all of the regular_branches it probably only makes sense for a subset (master, jansa, ivanchuk, hammer) or something like that. Maybe that belongs in a setting?

We only track branches (in the database) that we care about, so I think all regular branches is ok.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want to do some cleanup then because we are tracking everything back to darga.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently using .regular_branches for this, so I think that has resolved this from a code aspect. Leaving this unresolved just so it isn't lost as something we might want to look at on prod.

app/workers/travis_branch_monitor.rb Outdated Show resolved Hide resolved
branch_record = repo.regular_branches.where(:name => branch).first

# If we already have a failure record, call notify with that record
return branch_record.notify_of_failure if branch_record.previously_failing?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we check that it's still failing first?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this probably is a "naming is hard" problem, but .notify_of_failure actually does that. It does the .passing? check which will determine if a repo is still failing for that given branch.

I do forget the rational for my logic as to "why" I split it up like this, so I will have to look at this again, but the logic does as you wish, it just is a little unclear.

db/migrate/20200109223351_add_branch_build_failures.rb Outdated Show resolved Hide resolved
>
> **Travis Build**: #{travis_build_url}
MSG
notification_msg << "> **Failure PR**: #{offending_pr}\n" if offending_pr
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not always a PR to that repo which causes the failure. Sometimes a merge to core breaks the API repo, but it is only noticed after the API repo cron runs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is why the if offending_pr is there, which can return nil.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, I am open to suggestions on how we can improve this to better provide context for the message. I realize that this could still display a false positive by displaying the "last merged commit" on something that was merged weeks ago, so maybe only display the PR if it was merged recently? Within the last 3 days?

@bdunne
Copy link
Member

bdunne commented Mar 17, 2020

Also, does this need to be rebased due to the rails upgrade?

@NickLaMuro
Copy link
Member Author

Also, does this need to be rebased due to the rails upgrade?

Shouldn't, as I just rebased it earlier today (fixing that exact problem)

A small Travis API client library that is a compliment to the `travis`
gem, but allows the use the v3 API endpoints for specific calls with the
v2 based client library entities.

Currently implements two methods:

* `.repo_branch_builds`:  Fetch a collection of builds from a branch
* `.repo_build`:          Fetch a singular build via ID

For the second, the client actually still uses the v2 API, but doesn't
use the default model's `.builds.first` approach (doesn't filter/fetch
properly), and just fetches using the `find_one` interface.
This class is a service class in charge of the logistics of sending a
message to a configured gitter channel for a given repo, notifying there
has been a failure.
Use the Branch record to keep track of build failures that we find when
polling the Travis API.  The main purpose is to avoid spamming gitter
with extra notifications, but also can be used to identify that a branch
has previously been broken and now has been fixed.

Makes sense to also notify on the passing cases so others aren't taking
the time to investigate a failure when this has already been
investigated/fixed by another.
Worker that is in charge of monitoring Travis for build_failures, and
will create BuildFailure records and send messages to gitter as needed.
A task for triggering a single run of the TravisBranchMonitor without
needing sidekiq running.
@miq-bot
Copy link
Member

miq-bot commented May 18, 2020

Checked commits NickLaMuro/miq_bot@dad56ec~...d68b6a4 with ruby 2.5.7, rubocop 0.69.0, haml-lint 0.28.0, and yamllint
10 files checked, 3 offenses detected

lib/build_failure_notifier.rb

  • ❗ - Line 92, Col 33 - Style/SafeNavigation - Use safe navigation (&.) instead of checking if an object exists before calling the method.

spec/workers/travis_branch_monitor_spec.rb

@miq-bot
Copy link
Member

miq-bot commented Aug 31, 2021

This pull request is not mergeable. Please rebase and repush.

@chessbyte
Copy link
Member

@Fryguy now that we moved to GitHub Actions, assuming this should be closed?

@Fryguy Fryguy closed this Feb 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bot should announce in chat when repos are failing
6 participants