This is the third project of Udacity: Full Stack Web Developer Nanodegree
by Alan Po-Ching
Analyze the logs and generate a report for a newspaper site powered by a PostgreSQL database.
The report should answer the following questions:
- What are the three most popular articles of all time?
- Who are the most popular article authors of all time?
- On which days did more than 1% of requests lead to errors?
- The code should generate correct answers to the questions.
- The code should generate output in clearly formatted plain text.
- The answers should be derived mainly by SQL queries rather than by Python code.
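As a rough illustration of the last requirement, the sketch below lets the database do the counting and the division for the error-rate question, while Python only prints the result. It is not the project's code: it uses Python's built-in sqlite3 module with an in-memory table instead of PostgreSQL, the rows are made up, and only the `log` columns `time` and `status` are taken from this README. The query uses SQLite syntax (`date(...)`, summing a boolean), so it would need adapting for PostgreSQL.

```python
import sqlite3

# Toy stand-in for the `log` table; the rows below are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("create table log (time text, status text)")
rows = (
    [("2000-01-01 10:00:00", "200 OK")] * 98
    + [("2000-01-01 11:00:00", "404 NOT FOUND")] * 2
    + [("2000-01-02 10:00:00", "200 OK")] * 199
    + [("2000-01-02 11:00:00", "404 NOT FOUND")] * 1
)
conn.executemany("insert into log values (?, ?)", rows)

# SQL performs the aggregation and the 1% filter; Python only formats the rows.
query = """
    select date(time) as day,
           round(100.0 * sum(status like '4%') / count(*), 2) as rate
    from log
    group by day
    having 100.0 * sum(status like '4%') / count(*) > 1
    order by day
"""
result = conn.execute(query).fetchall()
for day, rate in result:
    print(f"{day}: {rate}% errors")
```

With this toy data, only the first day exceeds the 1% threshold (2 errors out of 100 requests).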
- Install Python3
- Download and install VirtualBox
- Download and install Vagrant
- Fork or clone this repository
- Place report.py from this repository under the /vagrant directory
- Download the database script, unzip it, and place it under the /vagrant directory
- Navigate to the /vagrant directory and run
vagrant up
to start the virtual machine
- Run
vagrant ssh
to log in to the virtual machine
- Run
psql -d news -f newsdata.sql
in the /vagrant folder on the virtual machine to create the database
- Run the following code to create the necessary views:
-- total number of requests per day
create view daily_access as
select to_char(time, 'Month, DD, YYYY') as date, count(status) as access
from log
group by date
order by date;
-- number of requests per day whose status starts with 4 (client errors)
create view daily_error as
select to_char(time, 'Month, DD, YYYY') as date, count(status) as error
from log
where status like '4%'
group by date
order by date;
-- percentage of each day's requests that led to errors
create view error_rate as
select daily_access.date, round(100.00 * error / access, 2) as rate
from daily_access, daily_error
where daily_access.date = daily_error.date;
- Run
python3 report.py
to generate the analysis report
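
For orientation, here is a minimal sketch of how a script could read the `error_rate` view and print the days above 1% as plain text. It is not the contents of report.py. The function names `format_error_days` and `fetch_error_rates` are hypothetical, and the use of the psycopg2 driver is an assumption. The formatter also tidies the date, because the `'Month'` pattern in `to_char` pads the month name with blanks.

```python
import re


def format_error_days(rows, threshold=1.0):
    """Return plain-text lines for days whose error rate exceeds threshold."""
    lines = []
    for date, rate in rows:
        if float(rate) > threshold:
            # Collapse the blank padding that to_char's 'Month' adds before the comma.
            tidy_date = re.sub(r"\s+", " ", str(date)).replace(" ,", ",").strip()
            lines.append(f"{tidy_date}: {float(rate):.2f}% errors")
    return lines


def fetch_error_rates(dbname="news"):
    """Read (date, rate) rows from the error_rate view created above."""
    import psycopg2  # assumed driver; imported here so the formatter works without it

    conn = psycopg2.connect(dbname=dbname)
    try:
        cur = conn.cursor()
        cur.execute("select date, rate from error_rate order by rate desc")
        return cur.fetchall()
    finally:
        conn.close()


# Usage on the virtual machine (needs the news database and psycopg2):
#     for line in format_error_days(fetch_error_rates()):
#         print(line)
```

Keeping the formatting separate from the database call means the SQL view does the analysis and the Python side only decides how each row is printed.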