
Working with the Database

Mike Williamson edited this page May 12, 2022 · 3 revisions

Context

Tracker runs common security tools on public sites and aggregates the results. From a data perspective, the database contains nothing that attackers won't already have, and since passwords are hashed and salted, its contents are not sensitive.

Setup

This assumes you have the kubectl command installed, and a local installation of ArangoDB, which supplies the commands arangodump and arangorestore.

You'll probably also want Docker installed.

Disposable databases

Every once in a while it's really useful to run a disposable copy of your database. Quick experiments, testing and CI all benefit from this. The ArangoDB Docker image is great for this.

echo ARANGO_ROOT_PASSWORD=test > arangodb.env 
docker run -d -p=8529:8529 --env-file arangodb.env --name=arango arangodb
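The container takes a few seconds to start accepting connections, so scripts and CI jobs usually need to wait before talking to it. Below is a minimal sketch of one way to do that, polling ArangoDB's /_api/version endpoint with curl. The function name wait_for_arango, the 30 attempt default and the one second pause are assumptions for illustration, and the root/test credentials come from the arangodb.env file above.

```shell
# wait_for_arango is a hypothetical helper, not part of Tracker or ArangoDB.
# It polls the version endpoint until the server answers or attempts run out.
wait_for_arango() {
  url="${1:-http://localhost:8529/_api/version}"
  attempts="${2:-30}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # --fail makes curl return non-zero on HTTP errors such as 401 or 503.
    if curl --silent --fail --user "root:${ARANGO_ROOT_PASSWORD:-test}" "$url" > /dev/null; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "ArangoDB did not become ready at $url" >&2
  return 1
}
```

Calling wait_for_arango right after docker run blocks until the disposable database is usable, or returns a non-zero status if it never comes up.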

Making a local copy of the database

For backup and development purposes it is sometimes desirable to have a copy of the production data. Production data has all the variability the real world produces, which makes it very useful for testing scripts from time to time. If you need a copy of the Tracker database, this is how to do it.

Since the database is never exposed to the outside world, we need to forward the ArangoDB service port to your local machine with kubectl's port-forward command.

kubectl port-forward -n db svc/arangodb 8529:8529

With the port forwarded, we can treat the ArangoDB instance on GKE as if it were running locally, which means we can now use the arangodump tool.

arangodump --server.database track_dmarc --output-directory track_dmarc

This does exactly what it promises, and dumps the contents of the track_dmarc database into a track_dmarc folder. With that done, kill the port forwarding and start your local ArangoDB instance that you want to put the data into. Inserting the data is done with arangorestore, and we're using the --create-database option here to ensure the database is created if it doesn't already exist:

arangorestore --create-database --server.database track_dmarc --input-directory track_dmarc
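Because kubectl port-forward stays in the foreground until it is interrupted, the dump step otherwise needs a second terminal. The sketch below shows one possible way to wrap the port forward and the dump in a single function. The name dump_via_port_forward and the three second pause are assumptions, not something the Tracker project prescribes, and arangodump will still prompt for the password.

```shell
# dump_via_port_forward is a hypothetical helper wrapping the commands above.
dump_via_port_forward() {
  database="${1:-track_dmarc}"
  # Run the port forward in the background so the dump can use this terminal.
  kubectl port-forward -n db svc/arangodb 8529:8529 &
  pf_pid=$!
  # Give the tunnel a moment to come up; 3 seconds is an arbitrary guess.
  sleep 3
  status=0
  arangodump --server.database "$database" --output-directory "$database" || status=$?
  # Stop the port forward whether or not the dump succeeded.
  kill "$pf_pid" 2>/dev/null || true
  wait "$pf_pid" 2>/dev/null || true
  return "$status"
}
```

Running dump_via_port_forward track_dmarc leaves the dump in a track_dmarc folder and frees port 8529 again, ready for the local ArangoDB instance and the arangorestore step.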

Experimentation

ArangoDB has a great web interface with a very helpful query editor. It's a great way to experiment with queries and understand their performance.

On top of that, arangosh is a very helpful tool for exploring queries and inserting data. The main object you will interact with is the db object, which can be explored with tab completion.

[mike@ouroboros tracker]$ arangosh
Please specify a password: 
                                       _     
  __ _ _ __ __ _ _ __   __ _  ___  ___| |__  
 / _` | '__/ _` | '_ \ / _` |/ _ \/ __| '_ \ 
| (_| | | | (_| | | | | (_| | (_) \__ \ | | |
 \__,_|_|  \__,_|_| |_|\__, |\___/|___/_| |_|
                       |___/                 
arangosh (ArangoDB 3.7.1 [linux] 64bit, using jemalloc, build , VPack 0.1.33, RocksDB 6.8.0, ICU 64.2, V8 7.9.317, OpenSSL 1.1.1h  22 Sep 2020)
Copyright (c) ArangoDB GmbH
Command-line history will be persisted when the shell is exited. You can use `--console.history false` to turn this off
Connected to ArangoDB 'http+tcp://127.0.0.1:8529, version: 3.7.1 [SINGLE, server], database: '_system', username: 'root'
Type 'tutorial' for a tutorial or 'help' to see common examples
127.0.0.1:8529@_system> db._useDatabase("track_dmarc")
true
127.0.0.1:8529@track_dmarc> claims = db._collection("claims")
[ArangoCollection 7612957, "claims" (type edge, status loaded)]
127.0.0.1:8529@track_dmarc> claims.save({_to: "domains/643849", _from:"organizations/7624288"})
{ 
  "_id" : "claims/7628631", 
  "_key" : "7628631", 
  "_rev" : "_bnEYmX2---" 
}
127.0.0.1:8529@_system> db._query(`RETURN "Hello world"`).toArray()
[ 
  "Hello world" 
]
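For repeatable experiments it can be handy to run the same kind of snippet without an interactive session. The following is a small sketch that assumes arangosh's --javascript.execute-string option. The function name run_in_arangosh is made up for this example, and arangosh will still prompt for the password.

```shell
# run_in_arangosh is a hypothetical helper: it runs one JavaScript snippet
# against the given database and then exits.
run_in_arangosh() {
  database="$1"
  script="$2"
  arangosh --server.database "$database" --javascript.execute-string "$script"
}
```

For example, run_in_arangosh track_dmarc 'print(db._query("RETURN 1").toArray())' runs a query, and run_in_arangosh track_dmarc 'db._explain("FOR c IN claims LIMIT 5 RETURN c")' prints the execution plan for a query against the claims collection used above.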

Restoring the database

Tracker is primarily protected from data loss by its clustered database setup, which spreads components of the database across multiple VMs in multiple availability zones. But life being what it is, we also make a daily backup via a cronjob.

The backups are stored in Cloud Storage and can be downloaded using gsutil.

gsutil -m cp -r \
  "gs://gc-tracker-backups/tracker-backup-2022-05-12T04:04+00:00" \
  .
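If you just want the most recent backup, you can list the bucket instead of typing the timestamp by hand. This sketch assumes that every backup sits at the top level of the bucket and follows the tracker-backup-<timestamp> naming seen above with the same UTC offset, so a plain lexical sort puts the newest one last. The function name latest_backup is hypothetical.

```shell
# latest_backup is a hypothetical helper that prints the newest backup folder.
latest_backup() {
  bucket="${1:-gs://gc-tracker-backups}"
  # gsutil ls prints folders with a trailing slash; strip it so the result
  # has the same shape as the path used with gsutil cp.
  gsutil ls "$bucket/" | grep 'tracker-backup-' | sort | tail -n 1 | sed 's:/$::'
}
```

The result can be passed straight to the copy command, for example gsutil -m cp -r "$(latest_backup)" .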

With the folder sitting on your local drive, just port forward and restore, similar to what we did above. The port forward keeps running in the foreground, so run the restore from a second terminal.

kubectl port-forward -n db svc/arangodb 8529:8529
arangorestore --server.database track_dmarc --create-database true --create-collection true --include-system-collections true --input-directory "tracker-backup-2022-05-12T04:04+00:00"