A lightweight web-tracking app to record user actions on a single-page webapp. Uses Piwik® web analytics library to collect information and Node.js to store data in Redis, RabbitMQ or Kafka


Simple Logging Service

This project is based on the Simple Metrics Collector, a simple way of collecting web analytics and storing the data in a Cloudant database. This project breaks that concept down into a microservices architecture: instead of writing data only to a Cloudant database, it sends data to a variety of outputs, depending on a runtime environment variable:

  1. stdout - to the terminal only
  2. redis_queue - to a Redis queue
  3. redis_pubsub - to a Redis pubsub channel
  4. rabbit_queue - to a RabbitMQ queue
  5. rabbit_pubsub - to a RabbitMQ pubsub channel
  6. kafka - to an Apache Kafka or IBM Message Hub topic

(Schematic of the architecture)

Deploy to IBM Bluemix

One-Click Deployment

The fastest way to deploy this application to Bluemix is to click this Deploy to Bluemix button. Or, if you prefer working from the command line, skip to the Deploy Manually section.

Deploy to Bluemix

Don't have a Bluemix account? If you haven't already, you'll be prompted to sign up for a Bluemix account when you click the button. Sign up, verify your email address, then return here and click the Deploy to Bluemix button again. Your new credentials let you deploy to the platform and also code online with Bluemix and Git. If you have questions about working in Bluemix, find answers in the Bluemix Docs.

Deploy Manually to Bluemix

Configure Cloud Foundry

If you haven't already, install the Cloud Foundry command line interface and connect to Bluemix.

Deploy

To deploy to Bluemix, simply:

$ cf push

Note: You may notice that Bluemix assigns a URL to your application containing a random word. This is defined in the manifest.yml file, where the random-route key is set to true. This ensures that multiple people deploying this application to Bluemix do not run into naming collisions. To specify your own route, remove the random-route line from the manifest.yml file and add a host key with the unique value you would like to use for the host name.

Privacy Notice: This web application includes code to track deployments to IBM Bluemix and other Cloud Foundry platforms. Tracking helps us measure our samples' usefulness, so we can continuously improve the content we offer to you. The following information is sent to a Deployment Tracker service on each deployment:

  • Application Name (application_name)
  • Space ID (space_id)
  • Application Version (application_version)
  • Application URIs (application_uris)

This data is collected from the VCAP_APPLICATION environment variable in IBM Bluemix and other Cloud Foundry platforms. IBM uses this data to track metrics around deployments of sample applications to Bluemix.

To disable deployment tracking, remove the following line from server.js:

require("cf-deployment-tracker-client").track();

Once that line is removed, you may also uninstall the cf-deployment-tracker-client npm package.

Environment variables

You can configure the installation by adding a number of custom environment variables and then restarting the application.

QUEUE_TYPE

The value of QUEUE_TYPE can be one of stdout, redis_queue, redis_pubsub, rabbit_queue, rabbit_pubsub or kafka. If a value is not set, then 'stdout' is assumed.

QUEUE_NAME

The value of QUEUE_NAME determines which queue/topic the data is written to. If omitted, it takes the following values for each of the queue types:

  1. stdout - n/a
  2. redis_queue - mcqueue
  3. redis_pubsub - mcpubsub
  4. rabbit_queue - mcqueue
  5. rabbit_pubsub - mcpubsub
  6. kafka - mcqueue
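
For reference, here is a minimal sketch of how these defaults might be resolved from the environment in Node.js. It is illustrative only; the actual handling in server.js may differ.

```js
// Illustrative only: resolve queue configuration from the environment,
// falling back to the defaults documented above.
var queueType = process.env.QUEUE_TYPE || 'stdout';

var defaultQueueNames = {
  redis_queue: 'mcqueue',
  redis_pubsub: 'mcpubsub',
  rabbit_queue: 'mcqueue',
  rabbit_pubsub: 'mcpubsub',
  kafka: 'mcqueue'
};

var queueName = process.env.QUEUE_NAME || defaultQueueNames[queueType] || null;

console.log('Queue mode:', queueType, '- queue/topic:', queueName);
```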

ETCD_URL

ETCD_URL determines which Etcd instance should be used for the Service Registry.

The Service Registry allows the Metrics Collector Microservice to be utilised by the Simple Search Service to log searches. This is achieved by using the Simple Service Registry module.

VCAP_SERVICES

VCAP_SERVICES is created for you by the Bluemix Cloud Foundry service. It defines the credentials of the attached services that this app can connect to.
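
As an illustration of how bound service credentials are typically extracted from VCAP_SERVICES in a Node.js app (not necessarily how this app does it), the service label below is a placeholder:

```js
// Illustrative only: VCAP_SERVICES is a JSON string describing bound services.
var vcap = JSON.parse(process.env.VCAP_SERVICES || '{}');

// Each key is a service label mapping to an array of bound instances,
// each with a "credentials" object. The label below is a placeholder.
var instances = vcap['some-service-label'] || [];
var credentials = instances.length > 0 ? instances[0].credentials : null;

if (credentials) {
  console.log('Found bound service credentials:', Object.keys(credentials));
}
```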

Client-side code

Once the application is installed and configured, your web page needs code inserted into it to allow data to be collected, for example:

<html>
<body>
<div>
  <a href="https://www.google.com" title="this will be tracked">Tracked Link</a>
</div>
<div>
  <a href="#" onclick="javascript:_paq.push(['trackEvent', 'Menu', 'Freedom']);" title="this will be tracked">Async Tracked Link</a>
</div>
<script type="text/javascript">
   var _paq = _paq || [];
  _paq.push(['trackPageView']);
  _paq.push(['enableLinkTracking']);
  (function() {
    var u="http://mydomain.mybluemix.net/";
    _paq.push(['setTrackerUrl', u+'tracker']);
    _paq.push(['setSiteId', "mysite"]);
    var d=document, g=d.createElement('script'), s=d.getElementsByTagName('script')[0];
    g.type='text/javascript'; g.async=true; g.defer=true; g.src=u+'piwik.js'; s.parentNode.insertBefore(g,s);
  })();
</script>
</body>
</html>

The main script tag loads the piwik.js JavaScript from the server and records a page-tracking event. It also ensures that link clicks are tracked (enableLinkTracking). The example above also shows how asynchronous actions can be recorded by calling _paq.push when the event occurs.

The only things you need to alter in this code snippet are the URL assigned to the variable u, which should be the URL of your installation, and the value passed to setSiteId.
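
Because the tracker targets single-page webapps, you will typically also want to record virtual page views when the route changes, alongside ad-hoc events. The sketch below uses standard Piwik client calls (setCustomUrl, setDocumentTitle, trackPageView, trackEvent); the helper function names are only illustrative:

```js
// Record a virtual page view when a single-page app changes route.
function trackRouteChange(path, title) {
  _paq.push(['setCustomUrl', path]);      // report the new virtual URL
  _paq.push(['setDocumentTitle', title]); // optional: report a page title
  _paq.push(['trackPageView']);
}

// Record an ad-hoc event, e.g. a button click or a search.
function trackAction(category, action) {
  _paq.push(['trackEvent', category, action]);
}

// Example usage:
// trackRouteChange('/search/results', 'Search results');
// trackAction('Menu', 'Freedom');
```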

Running with QUEUE_TYPE=stdout

Streaming events to stdout is the default behaviour of the Simple Logging Service. Simply run the app and events will appear in the terminal.

Running with Redis

Read the Getting Started with Redis on Compose.io guide. Spin up a new Redis cluster on Compose.io and feed the credentials you receive into your Bluemix Redis by Compose service.

Running with QUEUE_TYPE=redis_queue

Define your environment variable and run the process.

> export QUEUE_TYPE=redis_queue
> node server.js
Queue mode: redis_queue
Connecting to Redis server on localhost:6379
CDS Labs Simple Logging Service started on port 8081 : Thu Nov 26 2015 16:32:15 GMT+0000 (GMT)

After generating some data in your web application, you can use the Redis command-line interface to check the collected data. The LLEN command can tell you how many items have accumulated on the queue:

> redis-cli
127.0.0.1:6379> LLEN mcqueue
(integer) 26

while the RPOP command will retrieve the oldest item on the queue:

> redis-cli
127.0.0.1:6379> RPOP mcqueue
"{\"action_name\":\"\",\"idsite\":\"mysite\",\"rec\":1,\"r\":176450,\"h\":16,\"m\":28,\"s\":14,\"url\":\"http://localhost:8000/metrics.html#\",\"$_id\":\"772aa0d070215d3b\",\"$_idts\":1448553217,\"$_idvc\":1,\"$_idn\":0,\"$_refts\":0,\"$_viewts\":1448553217,\"cs\":\"windows-1252\",\"send_image\":0,\"pdf\":1,\"qt\":0,\"realp\":0,\"wma\":0,\"dir\":0,\"fla\":1,\"java\":1,\"gears\":0,\"ag\":0,\"cookie\":1,\"res\":\"1440x900\",\"gt_ms\":7,\"type\":\"pageView\",\"ip\":\"::1\"}"

Note: if you have supplied a QUEUE_NAME environment variable, then use that value rather than 'mcqueue' in the above examples.
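
A downstream microservice could drain this queue itself. Below is a minimal consumer sketch (not part of this repository), assuming the modern redis npm client (v4+) and a local Redis on the default port; swap 'mcqueue' for your QUEUE_NAME if you set one:

```js
// Minimal Redis queue consumer sketch using the "redis" npm client (v4+).
const { createClient } = require('redis');

async function main() {
  const client = createClient(); // defaults to redis://localhost:6379
  client.on('error', (err) => console.error('Redis error:', err));
  await client.connect();

  // Block until an item arrives, oldest first (mirrors the RPOP example above).
  for (;;) {
    const item = await client.brPop('mcqueue', 0); // resolves to { key, element }
    const event = JSON.parse(item.element);
    console.log('tracked event:', event.type, event.url);
  }
}

main().catch(console.error);
```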

Running with QUEUE_TYPE=redis_pubsub

Define your environment variable and run the process.

> export QUEUE_TYPE=redis_pubsub
> node server.js
Queue mode: redis_pubsub
Connecting to Redis server on localhost:6379
CDS Labs Simple Logging Service started on port 8081 : Thu Nov 26 2015 16:32:15 GMT+0000 (GMT)

Using the Redis command-line interface, you can subscribe to the pubsub channel (mcpubsub or the value of QUEUE_NAME you supplied):

> redis-cli
127.0.0.1:6379> SUBSCRIBE mcpubsub
Reading messages... (press Ctrl-C to quit)
1) "subscribe"
2) "mcpubsub"
3) (integer) 1

As you generate data in the application, you see it appear in your redis-cli terminal:

1) "message"
2) "mcpubsub"
3) "{\"action_name\":\"\",\"idsite\":\"mysite\",\"rec\":1,\"r\":578292,\"h\":16,\"m\":35,\"s\":44,\"url\":\"http://localhost:8000/metrics.html#\",\"$_id\":\"772aa0d070215d3b\",\"$_idts\":1448553217,\"$_idvc\":1,\"$_idn\":0,\"$_refts\":0,\"$_viewts\":1448553217,\"cs\":\"windows-1252\",\"send_image\":0,\"pdf\":1,\"qt\":0,\"realp\":0,\"wma\":0,\"dir\":0,\"fla\":1,\"java\":1,\"gears\":0,\"ag\":0,\"cookie\":1,\"res\":\"1440x900\",\"gt_ms\":13,\"type\":\"pageView\",\"ip\":\"::1\"}"

Running with RabbitMQ

Read the Getting Started with RabbitMQ on Compose.io guide. You need to create a RabbitMQ cluster and create a user with .* access, as described in that document. As the Compose.io RabbitMQ service is very new and there isn't a Bluemix service for it yet, you need to define the URL of your RabbitMQ service as a custom environment variable RABBITMQ_URL in Bluemix or in the local environment:

```sh
export RABBITMQ_URL=amqps://myrabbbituser:mybunnyrabbit99@aws-us-east-1-portal.8.dblayer.com:10705/amazing-rabbitmq-72
```

or

```sh
export RABBITMQ_URL=amqp://localhost
```

Running with QUEUE_TYPE=rabbit_queue

Define your environment variable and run the process.

> export QUEUE_TYPE=rabbit_queue
> node server.js
Queue mode: rabbit_queue
Connecting to Rabbit MQ server on amqps:*****@aws-us-east-1-portal.8.dblayer.com:10705/dazzling-rabbitmq-72
CDS Labs Simple Logging Service started on port 8081 : Fri Nov 27 2015 14:04:35 GMT+0000 (GMT)
Connected to RabbitMQ queue 'mcqueue'

After generating some data in your web application, you should be able to use Compose.io's RabbitMQ Admin page to see the data coming in:

(Screenshot: RabbitMQ queue drilldown)
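
If you would rather consume the queue programmatically, here is a minimal consumer sketch (not part of this repository) using the amqplib npm package. It assumes the same RABBITMQ_URL and the default 'mcqueue' queue name; the queue options passed to assertQueue must match however the service declared the queue:

```js
// Minimal RabbitMQ queue consumer sketch using the "amqplib" npm package.
const amqp = require('amqplib');

async function main() {
  const conn = await amqp.connect(process.env.RABBITMQ_URL || 'amqp://localhost');
  const channel = await conn.createChannel();

  // Declare the queue; this is a no-op if it already exists with the same settings.
  await channel.assertQueue('mcqueue');

  await channel.consume('mcqueue', (msg) => {
    if (msg !== null) {
      const event = JSON.parse(msg.content.toString());
      console.log('tracked event:', event.type, event.url);
      channel.ack(msg); // remove the message from the queue
    }
  });
}

main().catch(console.error);
```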

Running with QUEUE_TYPE=rabbit_pubsub

Define your environment variable and run the process.

> export QUEUE_TYPE=rabbit_pubsub
> node server.js
Queue mode: rabbit_pubsub
Connecting to Rabbit MQ server on amqps:*****@aws-us-east-1-portal.8.dblayer.com:10705/dazzling-rabbitmq-72
CDS Labs Simple Logging Service started on port 8081 : Fri Nov 27 2015 15:08:53 GMT+0000 (GMT)
Connected to RabbitMQ pubsub channel 'mcpubsub'

After generating some data in your web application, you should be able to use Compose.io's RabbitMQ Admin page to see the data coming in:

(Screenshot: RabbitMQ queue drilldown)

Running with IBM Message Hub (Apache Kafka)

Create a Message Hub instance in Bluemix. Bluemix will create the necessary environment variables.

Define your environment variable and run the process.

> export QUEUE_TYPE=kafka
> node server.js
Queue mode: kafka
Connecting to Kafka MQ server
CDS Labs Simple Logging Service started on port 8081 : Fri Nov 27 2015 15:57:31 GMT+0000 (GMT)
Created topic 'mcqueue'
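
To read those events back out, here is a minimal consumer sketch (not part of this repository) using the kafkajs npm client. The broker list is a placeholder; for Message Hub you would use the brokers and SASL credentials from your service credentials:

```js
// Minimal Kafka topic consumer sketch using the "kafkajs" npm package.
const { Kafka } = require('kafkajs');

async function main() {
  const kafka = new Kafka({
    clientId: 'simple-logging-consumer',
    brokers: ['localhost:9092'] // placeholder: use your Message Hub/Kafka brokers
    // For Message Hub you would also supply ssl and sasl options here.
  });

  const consumer = kafka.consumer({ groupId: 'simple-logging-group' });
  await consumer.connect();
  await consumer.subscribe({ topic: 'mcqueue', fromBeginning: false });

  await consumer.run({
    eachMessage: async ({ message }) => {
      const event = JSON.parse(message.value.toString());
      console.log('tracked event:', event.type, event.url);
    }
  });
}

main().catch(console.error);
```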

Web output

There is a real-time web view of each logged item at /output. Load this page and wait for events to be displayed in the front end. This page does not load any historical data.

Conclusion

The Simple Logging Service is a Bluemix app that collects web metrics. Instead of storing the metrics directly in a database, it writes the data to a choice of queues (Redis, RabbitMQ and Apache Kafka). You can run this app on many instances to share the data collection load and couple it with other microservices that consume and analyse the data. It could serve as the basis of a high-volume metrics collection service.


© "Apache", "CouchDB", "Apache CouchDB" and the CouchDB logo are trademarks or registered trademarks of The Apache Software Foundation. All other brands and trademarks are the property of their respective owners.
