Article Scraper

About

A command-line tool written in Python 3 that scrapes HTML and PDF articles from a given URL.

Installation

1. Install Python 3
2. cd to the project folder
3. Create and activate a virtual env, e.g.: python3 -m virtualenv env && source env/bin/activate
4. Install the required libraries: pip install -r requirements.txt
5. Install the application: pip install .

Tests

1. Ensure that you are in the virtualenv where the libraries were installed (see step 3 in Installation)
2. cd to the project folder and run: python -m unittest discover -s tests

Documentation

To view the available command-line options, run the following in a terminal: scrape -h

Examples

To view the title and body that will be scraped from a URL

scrape $url --dry-run

Where $url is the URL you wish to scrape (content type must be HTML/PDF).
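
As an illustration of the HTML/PDF restriction, here is a minimal sketch of how a Content-Type header value could be classified. It is not taken from this project's source: the function name classify_content_type and the choice to accept only text/html and application/pdf are assumptions made for the example.

```python
from typing import Optional


def classify_content_type(header_value: str) -> Optional[str]:
    """Return "html", "pdf", or None for an HTTP Content-Type header value.

    Hypothetical helper for illustration; the real tool may decide differently.
    """
    # Drop parameters such as "; charset=utf-8" and normalise the case.
    media_type = header_value.split(";", 1)[0].strip().lower()
    if media_type == "text/html":
        return "html"
    if media_type == "application/pdf":
        return "pdf"
    # Anything else (images, JSON, plain text, ...) is treated as unsupported.
    return None
```

A URL whose response is classified as None would be one the tool cannot scrape under this assumption.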

To scrape a URL and save the title, body and URL in a JSON file

scrape $url

The JSON file will be saved in a /articles directory.

The directory will be created if it doesn't exist and the location will be printed as the articles are saved.
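
The save step described above could look roughly like the sketch below. It is an assumption-laden illustration, not the project's code: the function name save_article, the JSON keys "title", "body" and "url", the relative default directory, and the file naming scheme derived from the title are all hypothetical.

```python
import json
import re
import tempfile
from pathlib import Path


def save_article(title, body, url, out_dir="articles"):
    """Write one article to a JSON file and return the file's path.

    Hypothetical sketch: key names and file naming are assumptions.
    """
    directory = Path(out_dir)
    # Create the output directory (and any parents) if it doesn't exist.
    directory.mkdir(parents=True, exist_ok=True)

    # Assumed naming scheme: a lowercase, hyphen-separated slug of the title.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-") or "article"
    path = directory / f"{slug}.json"

    with path.open("w", encoding="utf-8") as fh:
        json.dump({"title": title, "body": body, "url": url}, fh,
                  ensure_ascii=False, indent=2)

    # Report where the article was saved.
    print(f"Saved {path}")
    return path


if __name__ == "__main__":
    # Demo in a throwaway directory so nothing is left behind.
    with tempfile.TemporaryDirectory() as tmp:
        save_article("Example title", "Example body", "https://example.com",
                     out_dir=Path(tmp) / "articles")
```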

To scrape a URL and save the title, body and URL in a JSON file in a custom directory

scrape $url -O /path/to/custom/directory

To scrape multiple URLs

scrape $url1 $url2 $url3
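
For readers curious how a command line like the ones above can be declared in Python, here is a minimal argparse sketch. Only the program name scrape, the --dry-run and -O options, and support for several URLs come from this README; the attribute names, default value and help texts are assumptions, and the project's actual parser may differ.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the options shown in the examples.

    Hypothetical sketch: dest names, defaults and help strings are assumed.
    """
    parser = argparse.ArgumentParser(
        prog="scrape",
        description="Scrape HTML and PDF articles from one or more URLs.",
    )
    # One or more URLs, e.g.: scrape $url1 $url2 $url3
    parser.add_argument("urls", nargs="+", metavar="url",
                        help="URL to scrape (content type must be HTML/PDF)")
    # Show what would be scraped instead of saving it.
    parser.add_argument("--dry-run", action="store_true",
                        help="view the title and body without saving")
    # Custom output directory; the default used here is an assumption.
    parser.add_argument("-O", dest="output_dir", default="articles",
                        metavar="DIR", help="directory for the JSON files")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

With this sketch, parsing the arguments "$url1 $url2 -O /tmp/out" yields a list of two URLs and an output directory of /tmp/out.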
