Update README for clarity
joncinque committed Feb 4, 2020
1 parent 28df3ef commit c822038
Showing 1 changed file with 21 additions and 11 deletions.
32 changes: 21 additions & 11 deletions README.md
@@ -2,7 +2,7 @@
The class scraper is made up of two main components, getcourse.js and parsecourse.js,
linked together by a top-level script, toplevel.js.

# Installing
# Requirements

## node
```bash
@@ -11,17 +11,9 @@ curl -sL https://deb.nodesource.com/setup_12.x | sudo -E bash -
sudo apt install -y nodejs
```

## Chrome or Chromium
## Chromium
```bash
wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | sudo apt-key add -
sudo sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-chrome.list'
sudo apt update
sudo apt install -y google-chrome-stable
```

## npm components
```bash
npm install
sudo apt install -y chromium-browser
```

## Building mongo-tools dependency for ARM systems
Expand Down Expand Up @@ -50,6 +42,20 @@ cd mongo-tools
./build.sh
```

# Setup

## NPM packages
```bash
npm install
```

## Environment file
```bash
echo "MONGO_URI=mongodb://localhost" > .env
echo "DB_NAME=classes_database" >> .env
echo "COLLECTION=classes_collection" >> .env
```
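
The three variables written above are presumably read by the scripts at startup. This commit does not show how the repository loads them (the `dotenv` package is a common choice, but that is an assumption). As a dependency-free sketch, the hypothetical `parseEnv` helper below illustrates the `KEY=VALUE` format the file uses; it is not the project's actual loader.

```javascript
// Minimal .env parser: an illustrative sketch, not the repository's loader.
// `parseEnv` is a hypothetical helper name introduced for this example.
function parseEnv(text) {
  const config = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    // Skip blank lines and comment lines.
    if (line === '' || line.startsWith('#')) continue;
    const eq = line.indexOf('=');
    // Ignore lines that are not in KEY=VALUE form.
    if (eq === -1) continue;
    const key = line.slice(0, eq).trim();
    const value = line.slice(eq + 1).trim();
    config[key] = value;
  }
  return config;
}

// The same three lines that the echo commands write to .env.
const sample = [
  'MONGO_URI=mongodb://localhost',
  'DB_NAME=classes_database',
  'COLLECTION=classes_collection',
].join('\n');

const config = parseEnv(sample);
console.log(config.MONGO_URI, config.DB_NAME, config.COLLECTION);
```

Splitting on the first `=` only keeps values that themselves contain `=` intact, which matters for connection strings with query parameters.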

# Repo Components

## toplevel.js
Expand Down Expand Up @@ -85,3 +91,7 @@ try on. For now, this only handles MBO pages.
## chromegetcourse.js
Gets the pages sequentially using a local headless Chrome browser. This is a faster and
more modern approach to scraping.
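
Fetching pages "sequentially" means each page is fully retrieved before the next request starts. The sketch below shows that pattern in isolation. `getPagesSequentially` and `fetchPage` are hypothetical names, the URLs are placeholders, and the real script would do the fetching through a headless browser rather than the stand-in function used here.

```javascript
// Sketch of sequential fetching. `fetchPage` is a hypothetical stand-in for
// whatever the real script uses to load a page in headless Chrome.
async function getPagesSequentially(urls, fetchPage) {
  const pages = [];
  for (const url of urls) {
    // Awaiting inside the loop ensures only one page is in flight at a time.
    pages.push(await fetchPage(url));
  }
  return pages;
}

// Demo with a fake fetcher so the example runs without a browser.
const fakeFetchPage = async (url) => `<html>${url}</html>`;

getPagesSequentially(
  ['https://example.com/a', 'https://example.com/b'], // placeholder URLs
  fakeFetchPage
).then((pages) => console.log(pages.length, 'pages fetched'));
```

Awaiting inside a `for...of` loop trades throughput for predictable load on the target site, which is usually the safer default for a scraper.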

## sendtomongo.js
Utility for sending data to a MongoDB database. `mongoimport` can be used instead if the
database is running on the scraping machine.
