# Lighthouse Reporter

This tool generates Lighthouse reports, parses the data, and stores it in a Cloud SQL database. The data is then pulled into Google Data Studio (GDS) to create reports.

## Preparation

Clone the repository. Install the dependencies as usual:

    npm install

Create a folder called 'input' in the root directory of the project. This folder will contain the CSV of URLs for which to generate reports.

Also, create a file called '.env' in the root directory of the project. It needs to contain the connection details for your database:

    export DB_HOST=[address of your Cloud SQL database]
    export DB_USER=[MySQL username]
    export DB_PASS=[password for the database user]
    export DB_NAME=[name of the database]
    export DB_PORT=[port of the database]

Then run `source .env` to load those environment variables into your shell. You can run `printenv` to verify they are now part of the runtime environment.
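For example, to load the variables and confirm the `DB_` values are set:

    source .env
    printenv | grep DB_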

NOTE: The Dockerfile is only needed if you plan to Dockerize this application. If you do Dockerize it, remove the `export` keywords from `.env` (Docker's `--env-file` expects plain `KEY=VALUE` lines) and you do not need to run `source .env`.
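A minimal sketch of that workflow, assuming an image name of `lighthouse-reporter` (the name is illustrative):

    docker build -t lighthouse-reporter .
    docker run --env-file .env lighthouse-reporter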

### Enable the Google Sheets API

https://console.cloud.google.com/apis/api/sheets.googleapis.com/metrics?project=fastmonitor

Then create credentials by following the service account getting-started guide: https://theoephraim.github.io/node-google-spreadsheet/#/getting-started/authentication
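The API can also be enabled from the command line, if you prefer that to the console link above:

    gcloud services enable sheets.googleapis.com --project=fastmonitor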

### Create a server on the Google Cloud free tier

    gcloud compute instances create mysql-one \
        --project=fastmonitor \
        --zone=us-central1-a \
        --machine-type=e2-micro \
        --network-interface=network-tier=PREMIUM,subnet=default \
        --maintenance-policy=MIGRATE \
        --provisioning-model=STANDARD \
        --service-account=429388869501-compute@developer.gserviceaccount.com \
        --scopes=https://www.googleapis.com/auth/devstorage.read_only,https://www.googleapis.com/auth/logging.write,https://www.googleapis.com/auth/monitoring.write,https://www.googleapis.com/auth/servicecontrol,https://www.googleapis.com/auth/service.management.readonly,https://www.googleapis.com/auth/trace.append \
        --tags=mysql,web,http-server,https-server \
        --create-disk=auto-delete=yes,boot=yes,device-name=instance-1,image=projects/debian-cloud/global/images/debian-11-bullseye-v20221102,mode=rw,size=10,type=projects/fastmonitor/zones/us-central1-a/diskTypes/pd-balanced \
        --no-shielded-secure-boot \
        --shielded-vtpm \
        --shielded-integrity-monitoring \
        --reservation-affinity=any
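After the command completes, a quick way to confirm the instance is running:

    gcloud compute instances list --project=fastmonitor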

### Add your SSH key to the instance

    ssh-add ~/.ssh/google_compute_engine

### Connect

    gcloud config set project PROJECT-NAME
    ssh mysql-one.us-central1-a.fastmonitor
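The `instance.zone.project` host alias used above is generated by `gcloud compute config-ssh`; if SSH does not recognize it, create the config entries first:

    gcloud compute config-ssh --project=fastmonitor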


### Or set up instance metadata

Alternatively, upload your public key on the instance metadata page and set the key `google-ssh` to your username:

https://console.cloud.google.com/compute/metadata?project=fastmonitor

Then connect with:

    ssh -i ~/.ssh/private_key username@host

### Set up rmate / Remote VSCode

Install the Remote VSCode extension: https://marketplace.visualstudio.com/items?itemName=rafaelmaiolla.remote-vscode

Connect to the server with a reverse tunnel for rmate:

    ssh -i ~/.ssh/google_compute_engine -R 52698:127.0.0.1:52698 fotoflo@34.69.21.94

Then install rmate on the server:

    curl -Lo ~/bin/rmate https://raw.githubusercontent.com/textmate/rmate/master/bin/rmate
    chmod a+x ~/bin/rmate

### Set the server timezone

    sudo timedatectl set-timezone Asia/Singapore

### Install MySQL

    sudo apt install gnupg wget
    # download the MySQL APT repository config (see https://dev.mysql.com/downloads/file/?id=515363)
    wget https://dev.mysql.com/get/mysql-apt-config_0.8.31-1_all.deb
    sudo dpkg -i mysql-apt-config_0.8.31-1_all.deb
    sudo apt update
    sudo apt install -y mysql-server
    sudo mysql_secure_installation
    sudo mysql -u root -p
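Before creating users, it's worth confirming the server came up (a generic check, not specific to this project):

    sudo systemctl status mysql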

### Create the databases and users

    CREATE DATABASE fastmonitor;
    CREATE DATABASE fastmonitor_dev;
    CREATE DATABASE lighthouse_test;

    CREATE USER 'fastmonitor'@'%' IDENTIFIED WITH mysql_native_password BY 'PASSWORD';
    GRANT ALL PRIVILEGES ON *.* TO 'fastmonitor'@'%' WITH GRANT OPTION;

    CREATE USER 'lighthouse'@'%' IDENTIFIED WITH mysql_native_password BY 'PASSWORD';
    GRANT ALL PRIVILEGES ON lighthouse_test.* TO 'lighthouse'@'%' WITH GRANT OPTION;

    CREATE USER 'fastdev'@'%' IDENTIFIED WITH mysql_native_password BY 'PASSWORD';
    GRANT ALL PRIVILEGES ON fastmonitor_dev.* TO 'fastdev'@'%' WITH GRANT OPTION;
    GRANT ALL PRIVILEGES ON lighthouse.* TO 'fastdev'@'%' WITH GRANT OPTION;

    -- read-only user for GDS
    CREATE USER 'GDS'@'%' IDENTIFIED WITH mysql_native_password BY 'PASSWORD';
    GRANT SELECT, SHOW VIEW ON lighthouse.* TO 'GDS'@'%' WITH GRANT OPTION;

    FLUSH PRIVILEGES;

### Open the firewall

Create a rule to allow MySQL traffic (TCP port 3306) to instances tagged `mysql`:

https://console.cloud.google.com/networking/firewalls/list?project=fastmonitor

Create a new rule:

- Targets: Target tags = `mysql`
- Source filter: IP ranges = `0.0.0.0/0`
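The equivalent rule from the CLI might look like this (the rule name `allow-mysql` is illustrative):

    gcloud compute firewall-rules create allow-mysql \
        --project=fastmonitor \
        --allow=tcp:3306 \
        --target-tags=mysql \
        --source-ranges=0.0.0.0/0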

### Test the connection
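A minimal check from your local machine, assuming the `mysql` client is installed locally and `[EXTERNAL_IP]` is the instance's external address:

    mysql -h [EXTERNAL_IP] -P 3306 -u fastmonitor -p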

### Install Node.js

    sudo apt install snapd
    sudo snap install core
    sudo snap install node --classic



## Usage

Place a CSV of URLs in the 'input' folder. The CSV should have two columns: URL and Template. Here's an example:

| URL                             | Template         |
|---------------------------------|------------------|
| https://example.com             | Home Page        |
| https://example.com/about-us    | Information Page |
| https://example.com/product/123 | Product Page     |

Run the tool by calling the start script:

    npm start

The tool will generate lighthouse reports and store them in the database.

### Automatic Reporting

- Note: this is untested in this fork.

The tool can run reports automatically for a specified duration of time. To do that, set up a crontab that executes the tool every day with an empty 'input' folder. This runs the tool in maintenance mode, and it will automatically report on URLs stored in the database.
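A hedged crontab sketch, assuming the repository lives at `/opt/lighthouse-reporter` (adjust the path and schedule as needed):

    # run the reporter every day at 03:00; an empty 'input' folder triggers maintenance mode
    0 3 * * * cd /opt/lighthouse-reporter && npm start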

To set up an automated report, place a CSV of URLs in the input folder just like usual. Add the following parameters to the command:

    npm start auto 30 90
    
The numbers can be replaced with whatever values you'd like. This example will automatically generate reports for the specified CSV of URLs every 30 days for the next 90 days.

### Tests
The test folder enables automated testing of some functions, but coverage is not complete.

### Backfill
The backfill.js file copies the database from the old, pre-fork database to the new one. Unfortunately it isn't fully working, and we decided to move on with this project.

## Creating A New Report

### Creating the data sources

Now, go into GDS and create new data sources for each table. Select 'MySQL' as the source type and input the credentials for the Cloud SQL database. Then select the tables one by one.

### Whitelisting The Database

If you're using a database behind a firewall (such as Cloud SQL) you'll need to whitelist the Google Data Studio servers to allow connections.

The directions for whitelisting IP addresses in Cloud SQL can be found here: https://cloud.google.com/sql/docs/mysql/configure-ip

The list of IP addresses required by Data Studio is listed under 'Notes' on this page: https://support.google.com/datastudio/answer/7288010?hl=en&ref_topic=7332343

### Creating the new GDS report

Go to the template report here: https://datastudio.google.com/open/174e2h3Y8WVk1i7ufD4yxJ8aWfPG8ImOA. In the upper right corner select the button to create a copy of the report.

![Copy report button](https://github.com/LocalSEOGuide/lighthouse-reporter/blob/master/docs/docs_copy_report.jpg "Copy Report")

It will ask you to select data sources to replace the ones in the original report. Choose the views you created previously. Using Target as an example, that should look like this:

![Set sources](https://github.com/LocalSEOGuide/lighthouse-reporter/blob/master/docs/docs_set_sources.jpg "Set sources")

The new report should populate with data from the views you previously created.

## Creating a Partitioned Data Source (for running multiple partitioned reports from the same server)
If we want to create data sources whose data is partitioned, so that a source cannot see the whole database but only the data from a specific page's report, we can create a MySQL data source on top of a query instead of a table.

Example: if /input/input-live.csv contains:

    URL,Template
    https://url1.com,TemplateName1
    https://url2.com,TemplateName2


Then in GDS we would use the following query to select only those templates for the gds_audits data source:
`select * from gds_audits where template in ('TemplateName1', 'TemplateName2');`

## Database Structure

These are the tables used by the tool. In general, rows will be queried by the URL of the audit, and the time the report was fetched (fetch_time).
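For example, a sketch of pulling the most recent performance rows for one URL, assuming the `lighthouse` user and `lighthouse_test` database created earlier (adjust names to your setup):

    mysql -u lighthouse -p lighthouse_test -e \
      "SELECT url, fetch_time, first_contentful_paint, time_to_interactive
       FROM gds_audits
       WHERE url = 'https://example.com'
       ORDER BY fetch_time DESC
       LIMIT 10;"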

### raw_reports

0. id - SERIAL PRIMARY KEY - For unique identification
1. url - VARCHAR(2048) - The URL of the report
2. template - VARCHAR(2048) - The template to which this page belongs
3. fetch_time - TIMESTAMP - The fetch_time of the report where this request originated
4. report - json - The raw JSON of the report

### urls

This table stores URLs that will be updated automatically.

0. id - SERIAL PRIMARY KEY - For unique identification
1. url - VARCHAR(2048) - The URL of the report
2. template - VARCHAR(2048) - The template to which this page belongs
3. start_date - TIMESTAMP - The date reporting on this URL started
4. latest_date - TIMESTAMP - The date of the latest report
5. interval - DECIMAL - The number of days between updates
6. lifetime - DECIMAL - The number of days this URL should be reported on before being discontinued

### gds_audits

This table contains the basic performance metrics.

0. id - SERIAL PRIMARY KEY - For unique identification
1. url - VARCHAR(2048) - The URL of the report
2. template - VARCHAR(2048) - The template to which this page belongs
3. fetch_time - TIMESTAMP - The time the report was run
4. page_size - DECIMAL - The page size in kilobytes (KB)
5. first_contentful_paint - DECIMAL - The FCP in milliseconds
6. max_potential_fid - DECIMAL - The Maximum FID in milliseconds
7. time_to_interactive - DECIMAL - The TTI in milliseconds
8. first_meaningful_paint - DECIMAL - The FMP in milliseconds
9. first_cpu_idle - DECIMAL - The FCI in milliseconds

### resource_chart

This table contains information about resource requests performed during an audit.

0. id - SERIAL PRIMARY KEY - For unique identification
1. audit_url - VARCHAR(2048) - The URL of the page where this request originated
2. template - VARCHAR(2048) - The template to which this page belongs
3. fetch_time - TIMESTAMP - The fetch_time of the report where this request originated
4. resource_url - VARCHAR(2048) - The URL of the requested resource
5. resource_type - VARCHAR(2048) - The type of resource (Script, Stylesheet, etc.)
6. start_time - DECIMAL - The time the request started in milliseconds
7. end_time - DECIMAL - The time the request ended in milliseconds

### savings_opportunities

This table contains information about the savings opportunities found during an audit.

0. id - SERIAL PRIMARY KEY - For unique identification
1. audit_url - VARCHAR(2048) - The URL of the page where this request originated
2. template - VARCHAR(2048) - The template to which this page belongs
3. fetch_time - TIMESTAMP - The fetch_time of the report where this request originated
4. audit_text - VARCHAR(2048) - The name of the audit in question
5. estimated_savings - DECIMAL - The estimated time saved in milliseconds

### diagnostics

This table contains some useful page statistics.

0. id - SERIAL PRIMARY KEY - For unique identification
1. audit_url - VARCHAR(2048) - The URL of the page where this request originated
2. template - VARCHAR(2048) - The template to which this page belongs
3. fetch_time - TIMESTAMP - The fetch_time of the report where this request originated
4. diagnostic_id - VARCHAR(2048) - The ID of the audit in question
5. item_label - VARCHAR(2048) - The label for the specific item in that audit
6. item_value - DECIMAL - The numeric value for the item in question

### scores

This table contains the scores for SEO, Progressive Web App, Accessibility, Performance, and Best Practices.

0. id - SERIAL PRIMARY KEY - For unique identification
1. audit_url - VARCHAR(2048) - The URL of the page where this request originated
2. template - VARCHAR(2048) - The template to which this page belongs
3. fetch_time - TIMESTAMP - The fetch_time of the report where this request originated
4. Category - VARCHAR(2048) - The type of score (e.g. Performance, SEO)
5. Title - VARCHAR(2048) - The score's title in plain text
6. Score - INT - The numeric value of the score
