Skip to content

jadrefke/HyperLogLog.net

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 

Repository files navigation

HyperLogLog.net

HyperLogLog.net is a C# implementation of HyperLogLog - a near-optimal cardinality estimation algorithm.

HLL was conceived of by Flajolet et. al. in the phenomenal paper HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm.

A visual demonstration of HyperLogLog is available here along with a basic explanation about it.

Contact

You can contact me on twitter @saguiitay.

NuGet

HyperLogLog.net is available as a NuGet package

Features

  • Support custom HashAlgorithms. (Default is MD5, but I strongly recommend using Murmur3 from murmurhash
  • Support custom byte[] converters. (Default handles value types, and calls ToString() on all other types)
  • Support serialization into/from byte[] for network transfer
  • Allows merging multiple HyperLogLog objects into a single HyperLogLog. Useful if you are handling a high volume of data on multiple machines, and needs to aggregate the data for a "final" number.

Release Notes

  • 1.0.0 Initial release.

Usage

var hashAlgorithm = Murmur.MurmurHash.Create128(managed: false); // Usign a custom HashAlgorithm. Default is MD5
var hyperLogLogSerializer = new HyperLogLogSerializer(); // Easy serialization/deserialization of the counter
const int count = 1000000;

// Check %error at different accuracy level (more accuracy requires more memory)
for (int b = 4; b <= 16; b++)
{
  var hll = new HyperLogLog.net.HyperLogLog(hashAlgorithm, b);

  for (int i = 1; i < count + 1; i++)
  {
    // Log appearances of "random" data stream 
    hll.LogData(Guid.NewGuid());
  }

  // Serialize the HyperLogLog to a byte[] (useful for sending over the network)
  var outputStream = new MemoryStream();
  hyperLogLogSerializer.SerializeTo(hll, outputStream);

  // Deserialize back to a HyperLogLog
  var inputStream = new MemoryStream(outputStream.ToArray());
  var hllDeserialized = hyperLogLogSerializer.DeserializeTo(inputStream, hashAlgorithm);

  // Ensure serialization/deserialization didn't ruin our data
  var cardinality = hll.GetCount();
  var cardinalityDeserialized = hllDeserialized.GetCount();

  if (cardinality != cardinalityDeserialized)
    throw new Exception("Serialization/Deserialization ruined the data!");

  // Display Actual data seeing, Recorded (Estimated) data, and error %
  var error = 1 - (cardinality/(double) (3 * count));
  Console.WriteLine("B: {0}\tActual: {1}\tEstimated: {2}\t%Error: {3}", b, count, cardinality, error);
}

About

HyperLogLog.net is a C# implementation of HyperLogLog - a near-optimal cardinality estimation algorithm.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • C# 50.5%
  • PowerShell 49.5%