Jeffrey Heer edited this page Apr 19, 2021 · 20 revisions

datalib

NOTE: Datalib is no longer being actively maintained. The Arquero library provides similar functionality plus much more. In addition, Vega now includes its own data utilities in the vega-util and vega-statistics packages.

Datalib is a JavaScript data utility library. It provides facilities for data loading, type inference, common statistics, and string formatting. While created to power Vega and related projects, datalib is a standalone library useful for data-driven JavaScript applications on both the client (web browser) and server (e.g., node.js).
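To make "type inference" concrete, here is a minimal, dependency-free JavaScript sketch of how a loader might pick a type for each column of string values (as a CSV parser would produce them) and then convert the values. It is not datalib's implementation. The helper names inferType, parseValue, and typeRows are hypothetical, as are the candidate types, the order in which they are tried, and the restriction to ISO 8601-style dates (chosen to keep Date.parse behavior predictable).

```javascript
// Illustrative sketch only: hypothetical helpers, not datalib's code.
// Assumes every raw value is a string (or null/undefined) from a CSV parser.

// Infer one type name for a column by testing candidates in a fixed order.
function inferType(values) {
  var present = values.filter(function (v) { return v != null && v !== ''; });
  if (present.length === 0) return 'string'; // nothing to infer from (assumption)
  function all(test) { return present.every(test); }
  if (all(function (v) { return v === 'true' || v === 'false'; })) return 'boolean';
  if (all(function (v) { return /^-?\d+$/.test(v); })) return 'integer';
  if (all(function (v) { return v.trim() !== '' && !isNaN(+v); })) return 'number';
  // Only ISO 8601-style dates are recognized here (assumption).
  if (all(function (v) { return /^\d{4}-\d{2}-\d{2}/.test(v) && !isNaN(Date.parse(v)); })) {
    return 'date';
  }
  return 'string';
}

// Convert one raw string to the inferred type; empty values become null.
// Dates become millisecond timestamps via Date.parse.
function parseValue(v, type) {
  if (v == null || v === '') return null;
  switch (type) {
    case 'boolean': return v === 'true';
    case 'integer':
    case 'number': return +v;
    case 'date': return Date.parse(v);
    default: return v;
  }
}

// Infer a type per field (using the first row's keys), then build typed rows.
function typeRows(rows) {
  var fields = rows.length ? Object.keys(rows[0]) : [];
  var types = {};
  fields.forEach(function (f) {
    types[f] = inferType(rows.map(function (r) { return r[f]; }));
  });
  var typed = rows.map(function (r) {
    var out = {};
    fields.forEach(function (f) { out[f] = parseValue(r[f], types[f]); });
    return out;
  });
  return { types: types, rows: typed };
}

// Usage with hypothetical raw rows:
var result = typeRows([
  { symbol: 'MSFT', date: '2000-01-01', price: '39.81' },
  { symbol: 'MSFT', date: '2000-02-01', price: '36.35' }
]);
console.log(result.types); // { symbol: 'string', date: 'date', price: 'number' }
```

Trying the narrowest types first means a column of whole numbers is reported as integer rather than number; a real loader may order or name its types differently.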

For documentation, see the datalib API Reference.

Use

Datalib provides a set of utilities for working with data. These include:

  • Loading and parsing data files (JSON, TopoJSON, CSV, TSV).
  • Summary statistics (mean, standard deviation, median, correlation, histograms, etc.).
  • Group-by aggregation queries, including streaming data support.
  • Data-driven string templates with expressive formatting filters.
  • Utilities for working with JavaScript functions, objects and arrays.
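Streaming group-by aggregation comes down to keeping running statistics per group and updating them as each record arrives, without storing the records. The sketch below does this for count, mean, and standard deviation using Welford's online algorithm. It is an illustration only: createGroupBy and its insert/result methods are hypothetical names, not datalib's API, and the sample (n - 1) variance denominator and the stdev of 0 for single-record groups are assumptions.

```javascript
// Illustrative sketch only: a hypothetical streaming group-by, not datalib's API.
// Tracks count, mean, and sample standard deviation per key (Welford's algorithm).
function createGroupBy(keyField, valueField) {
  var groups = new Map(); // key -> { count, mean, m2 }

  // Fold one record into its group's running statistics.
  function insert(row) {
    var key = row[keyField];
    var x = row[valueField];
    if (x == null || isNaN(x)) return; // skip missing values
    var g = groups.get(key);
    if (!g) {
      g = { count: 0, mean: 0, m2: 0 };
      groups.set(key, g);
    }
    g.count += 1;
    var delta = x - g.mean;
    g.mean += delta / g.count;
    g.m2 += delta * (x - g.mean); // sum of squared deviations so far
  }

  // Emit one summary row per group (n - 1 denominator; 0 for a single record).
  function result() {
    var out = [];
    groups.forEach(function (g, key) {
      var row = {};
      row[keyField] = key;
      row.count = g.count;
      row.mean = g.mean;
      row.stdev = g.count > 1 ? Math.sqrt(g.m2 / (g.count - 1)) : 0;
      out.push(row);
    });
    return out;
  }

  return { insert: insert, result: result };
}

// Usage with hypothetical records arriving one at a time:
var agg = createGroupBy('symbol', 'price');
[
  { symbol: 'A', price: 2 },
  { symbol: 'B', price: 10 },
  { symbol: 'A', price: 4 },
  { symbol: 'A', price: 6 }
].forEach(agg.insert);
console.log(agg.result());
// Group A: count 3, mean 4, stdev 2; group B: count 1, mean 10, stdev 0.
```

Welford's update avoids the precision loss of accumulating a raw sum of squares, which matters when values are large relative to their spread.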

Datalib can be used both server-side and client-side. For use in Node.js, run npm install datalib or add datalib as a dependency in your package.json file. For use on the client, install via bower install datalib or include datalib.min.js in your page.

Example

// Load datalib.
var dl = require('datalib');

// Load and parse a CSV file. Datalib does type inference for you.
// The result is an array of JavaScript objects with named values.
// Parsed dates are stored as UNIX timestamp values.
var data = dl.csv('https://vega.github.io/datalib/data/stocks.csv');

// Show summary statistics for each column of the data table.
console.log(dl.format.summary(data));

// Compute mean and standard deviation by ticker symbol.
var rollup = dl.groupby('symbol')
  .summarize({'price': ['mean', 'stdev']})
  .execute(data);
console.log(dl.format.table(rollup));

// Compute correlation measures between price and date.
console.log(
  dl.cor(data, 'price', 'date'),      // Pearson product-moment correlation
  dl.cor.rank(data, 'price', 'date'), // Spearman rank correlation
  dl.cor.dist(data, 'price', 'date')  // Distance correlation
);

// Compute mutual information distance between years and binned price.
var bin_price = dl.$bin(data, 'price'); // returns binned price values
var year_date = dl.$year('date');       // returns year from date field
var counts = dl.groupby(year_date, bin_price).count().execute(data);
console.log(dl.mutual.dist(counts, 'bin_price', 'year_date', 'count'));
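The first two correlation measures in the example can be illustrated without datalib. The hypothetical helpers below compute Pearson correlation directly and Spearman correlation as the Pearson correlation of ranks, giving tied values the average of their ranks. They are sketches for intuition, not datalib's dl.cor or dl.cor.rank code, and they assume every record has numeric values in both fields and that neither field is constant.

```javascript
// Illustrative sketch only: hypothetical helpers, not datalib's dl.cor code.

// Pearson product-moment correlation of numeric fields a and b.
function pearson(data, a, b) {
  var n = data.length, sa = 0, sb = 0;
  data.forEach(function (d) { sa += d[a]; sb += d[b]; });
  var ma = sa / n, mb = sb / n;
  var cov = 0, va = 0, vb = 0;
  data.forEach(function (d) {
    var da = d[a] - ma, db = d[b] - mb;
    cov += da * db;
    va += da * da;
    vb += db * db;
  });
  return cov / Math.sqrt(va * vb);
}

// Assign 1-based ranks to numeric values, averaging ranks across ties.
function ranks(values) {
  var order = values.map(function (v, i) { return i; })
    .sort(function (i, j) { return values[i] - values[j]; });
  var r = new Array(values.length);
  for (var k = 0; k < order.length; ) {
    var end = k;
    while (end + 1 < order.length && values[order[end + 1]] === values[order[k]]) end++;
    var avg = (k + end) / 2 + 1; // mean of ranks k+1 .. end+1
    for (var m = k; m <= end; m++) r[order[m]] = avg;
    k = end + 1;
  }
  return r;
}

// Spearman rank correlation: Pearson correlation computed on the ranks.
function spearman(data, a, b) {
  var ra = ranks(data.map(function (d) { return d[a]; }));
  var rb = ranks(data.map(function (d) { return d[b]; }));
  var ranked = ra.map(function (x, i) { return { a: x, b: rb[i] }; });
  return pearson(ranked, 'a', 'b');
}

// Usage with hypothetical points on y = x^2:
var pts = [{ x: 1, y: 1 }, { x: 2, y: 4 }, { x: 3, y: 9 }, { x: 4, y: 16 }];
console.log(pearson(pts, 'x', 'y'));  // slightly below 1: the relation is not linear
console.log(spearman(pts, 'x', 'y')); // 1: y increases monotonically with x
```

Because y = x² grows monotonically but not linearly, Spearman reports a perfect 1 while Pearson comes out slightly below 1 (about 0.984).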