Send an alert when data is stale #303

Open
openjck opened this issue Feb 12, 2019 · 7 comments

Contributor

openjck commented Feb 12, 2019

An alert should be sent when the site is showing stale data. See #297 as an example of when this has happened.

It's unclear to me if this should be done here, in ensemble-transposer, or in Fx_Usage_Report. Perhaps more than one.

@openjck openjck changed the title Test that recent data is being shown Send an alert when data is stale Feb 12, 2019
Collaborator

pdehaan commented Feb 14, 2019

Not sure where to put it (here vs. ensemble-transposer), but I have a rough script that reports how stale the YAU data is, returning a value such as "-6d" (6 days old): https://github.com/pdehaan/ensemble-data-test

Per https://github.com/mozilla-services/Dockerflow, services should have a /__heartbeat__ endpoint (similar to /__version__ which tells us which SHA is deployed):

Respond to /__heartbeat__ with a HTTP 200 or 5xx on error. This should check backing services like a database for connectivity and may respond with the status of backing services and application components as a JSON payload.

So maybe we just check a couple of choice endpoints to see what the latest date in the dataset is, and return a 500 error if the data is more than 7 days old. Then we'd need to make sure Ops is monitoring that heartbeat endpoint, and maybe they ping us if the data is stale. Not sure how it'd work with their monitoring tools. I would hate to think that somebody on PagerDuty gets paged at 3am on a Sunday because the data is 8 days old.
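
A minimal sketch of that heartbeat idea, using only Node's built-in `http` module. The names `heartbeatFor`, `createHeartbeatServer`, and `getLatestDate` are hypothetical, and so is the JSON payload shape; none of them come from ensemble or ensemble-transposer. The 7-day threshold is the one suggested above.

```javascript
const http = require("http");

const MAX_AGE_DAYS = 7; // threshold suggested in the comment above
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical helper: decide the heartbeat response for the latest data date.
function heartbeatFor(latestDate, now = new Date()) {
  const ageMs = now - new Date(latestDate);
  // An unparseable date yields NaN, which is deliberately treated as stale.
  const stale = !(ageMs <= MAX_AGE_DAYS * DAY_MS);
  return {
    statusCode: stale ? 500 : 200,
    body: { latestDate, ageDays: Math.floor(ageMs / DAY_MS), stale },
  };
}

// Hypothetical wiring: getLatestDate is an assumed async function that
// returns the newest date found in the dataset.
function createHeartbeatServer(getLatestDate) {
  return http.createServer(async (req, res) => {
    if (req.url !== "/__heartbeat__") {
      res.writeHead(404);
      res.end();
      return;
    }
    try {
      const { statusCode, body } = heartbeatFor(await getLatestDate());
      res.writeHead(statusCode, { "Content-Type": "application/json" });
      res.end(JSON.stringify(body));
    } catch (err) {
      // Failing to reach the backing data is also reported as an error.
      res.writeHead(500, { "Content-Type": "application/json" });
      res.end(JSON.stringify({ error: String(err) }));
    }
  });
}
```

Calling `createHeartbeatServer(fn).listen(port)` would expose the endpoint. Whether a 500 should page anyone is a separate decision for whoever configures the monitoring.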

Collaborator

pdehaan commented Feb 15, 2019

Here's an example of scraping the https://data.firefox.com/dashboard/hardware dashboard and grabbing the select#date-selector option:first-child text using a headless puppeteer:

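const ms = require("ms");
const puppeteer = require("puppeteer");

const MAX_AGE_MS = ms("7d");

async function main() {
  // $eval returns the first match, i.e. the first option in the date selector.
  const sel = "select#date-selector option";

  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto("https://data.firefox.com/dashboard/hardware", {waitUntil: "networkidle2"});
    await page.waitForSelector(sel);
    const lastModified = await page.$eval(sel, el => el.textContent.trim());
    const ageMs = Date.now() - new Date(lastModified).getTime();

    // ms() formats the negative offset as a short string such as "-12d".
    console.log(`[${ms(-ageMs)}] ${lastModified}`);

    // Compare raw milliseconds: parsing the formatted string would misread "-10h" as 10 days.
    if (ageMs > MAX_AGE_MS) {
      process.exitCode = 1;
    }
  } finally {
    // Always shut down the headless browser, even if the page fails to load.
    await browser.close();
  }
}

main().catch(err => {
  console.error(err);
  process.exitCode = 1;
});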

It isn't especially speedy, though: it takes about 5s to launch a headless browser and wait for the page to load and render:

$ time node check-hardware-dashboard.js
[-12d] February 3, 2019

node check-hardware-dashboard.js  0.44s user 0.19s system 12% cpu 5.020 total

Contributor Author

openjck commented Feb 19, 2019

Excellent. Thank you, @pdehaan! I'll look into this.

Contributor Author

openjck commented Feb 19, 2019

Note to self: the code in #297 is also worth looking at.

Contributor Author

openjck commented Feb 22, 2019

@pdehaan launched a site which reports on the freshness of all data. This could be a great thing for us to leverage. 😃

Site: https://ensemble-last-modified.now.sh/

Repo: https://github.com/pdehaan/ensemble-last-modified

Collaborator

pdehaan commented Mar 25, 2019

https://ensemble-last-modified.now.sh/ currently says the dashboard data is 9-10 days old:

{
  "source": "https://github.com/mozilla/ensemble",
  "version": "1.0.0",
  "commit": "5753d4021c792b3af31174a8cb473c10549f82ae",
  "dashboads": {
    "/datasets/desktop/user-activity": "-10d",
    "/datasets/desktop/usage-behavior": "-10d",
    "/datasets/desktop/hardware": "-9d"
  },
  "homepage": "https://github.com/pdehaan/ensemble-last-modified"
}
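
For alerting, a payload like the one above could be checked with a small script. This is only a sketch: `ageInHours` and `findStale` are hypothetical helpers, and it assumes the ages are always short strings such as "-10d" or "-5h" (a sign, a number, and a unit), which is inferred from the sample rather than from the service's code. It reads the key as spelled in the sample ("dashboads") and falls back to "dashboards".

```javascript
// Assumed unit suffixes, expressed in hours.
const UNIT_HOURS = { d: 24, h: 1, m: 1 / 60, s: 1 / 3600, ms: 1 / 3600000 };

// Hypothetical helper: "-10d" -> 240 (hours old). Dates in the future come
// back negative, and anything unrecognized becomes NaN.
function ageInHours(age) {
  const match = /^(-?)(\d+(?:\.\d+)?)(ms|d|h|m|s)$/.exec(String(age).trim());
  if (!match) return NaN;
  const hours = Number(match[2]) * UNIT_HOURS[match[3]];
  return match[1] === "-" ? hours : -hours;
}

// Hypothetical helper: list the dataset paths older than maxDays.
// Unparseable ages are reported as stale rather than silently ignored.
function findStale(report, maxDays = 7) {
  const dashboards = report.dashboads || report.dashboards || {};
  return Object.entries(dashboards)
    .filter(([, age]) => !(ageInHours(age) <= maxDays * 24))
    .map(([path]) => path);
}
```

A scheduled job could fetch the JSON, pass the parsed object to `findStale`, and send a notification (or exit non-zero) whenever the returned list is non-empty.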

Contributor Author

openjck commented Mar 25, 2019

Ah! Thanks for the heads up!
