Skip to content

barrel-db/uid

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

50 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

k-ordered unique identity

The library develops an identity schema suitable for rough, lexicographical objects ordering in distributed systems.

Build Status Coverage Status Hex.pm

Inspiration

The library provide interface to generate k-ordered unique identifiers in lock-free and decentralized manner for Erlang application.

An array A[1..n] is said to be k-ordered if A[i − k] ≤ A[i] ≤ A[i + k] for all i such that k < i ≤ n−k.

The library aims following objectives:

  • use 64-bit for storage friendly identity schema. It allows to reduce indexes footprints and optimize lookup latency.

  • generated values are suitable for partial event ordering in distributed environment and helps on detection of causality violation.

  • values are roughly sortable by time of allocation.

  • the concept is similar to Twitter's snowflake identity.

Background

The library has developed a dual schema: 64-bit unique identity to be used within Erlang virtual machine and 96-bit unique identity for distributed environments. Client applications benefits from short 64-bit identity using it locally and dynamically switches schema while communicating with remote peers.

The k-ordered value consists of time-stamp with millisecond resolution (50-bit) that is used to roughly sort events. The time-stamp ensures distinct sorting within virtual machine where system clock is controlled by operation system. A locally monotonic padding (14-bits) prevents the collisions (Note: 14-bits allows to have about 16K allocations per millisecond).

The time-stamp based sorting fails on distributed systems unless it implements precises time synchronization protocol. Therefore, it was decided to uses location aspect (node fingerprint) as component of k-ordered value. This component allows to keep ordering consistent even if clocks on other node is skewed. The developed schema allows to control the quality of the ordering (precision) depending on the length of neighborhood interval. The neighborhood interval is a time span where the location has higher priority then time. The usage of this time interval relaxes a clock synchronization requirements while preserving an ordering.

The library represents k-ordered values as tuples but implements binary and url friendly encoding similar to base64. The serialization protocol has been changed after library version 1.2.0. New serialization improves allocation performance of k-order values. It uses Erlang OTP/18 feature erlang:unique_integer(...) to generate locally monotonic value. The library allocates about 13M k-ordered values per second on reference hardware.

64-bit, local

        20 bit         20 bit       10 bit        14 bit
   |--------------|--------------|----------|-----------------|
          A              B             C           Seq
   
   A, B, C - time stamp in micro seconds (casted to 50 bits)
   Seq     - sequence value generated by erlang:unique_integer([monotonic])

   -type l() :: {uid, t(), seq()}.
  • 50-bit time-stamp with millisecond resolution. The library uses Erlang native representation of time-stamp: A mega seconds 10⁶, B seconds, C micro seconds 10⁻⁶.

  • Seq is 14-bit monotonic sequence identifier obtained from erlang:unique_integer(...).

96-bit, global

        20 bit        20 - I bit      32 bit       I bit      10 bit       14 bit
   |--------------|--------------|--------------|----------|----------|-----------------|
          A              B0            Node          B1         C           Seq

   A, B, C - time stamp in microseconds (casted to 50 bits)
   Seq     - sequence value generated by erlang:unique_integer([monotonic])
   Node    - node fingerprint erlang:phash(erlang:node(), 1 bsl 32).
   I       - time neighborhood interval 

   -type g() :: {uid, id(), t(), seq()}.
  • 50-bit time-stamp with millisecond resolution. The library uses Erlang native representation of time-stamp: A mega seconds 10⁶, B seconds, C micro seconds 10⁻⁶.

  • Seq is 14-bit monotonic sequence identifier obtained from erlang:unique_integer(...).

  • Node 32-bit (optionally 64-bit) represent unique node identifier allocated randomly to each node (derived as hash from long node name). Node id has higher sorting priority then seconds fraction of time stamp. This allows to build reliable sorting without time synchronization.

  • I is length of neighborhood interval that is configurable by application according to its needs. Its maximum value 10 bits that roughly corresponds to 15 minutes (default one 8 bits - 5 minutes).

Getting Started

The latest version of the library is available at its master branch. All development, including new features and bug fixes, take place on the master branch using forking and pull requests as described in contribution guidelines.

The stable library release is available via hex packages, add the library as dependency to rebar.config

{deps, [{uid}]}.

Libraries implements a simple api.

%% 
%% generate local identity
%%   {uid,{1539,625275,432128},1}
A = uid:l().

%%
%% convert local identity to global one
%%   {uid,'uid@127.0.0.1',{1539,625275,432128},1}
B = uid:g(A).

%%
%% encode value to binary
%%   <<0,96,57,138,3,235,39,50,123,105,128,1>>
uid:encode(B).

%%
%% encode value to base64
%%   <<".5.tXVEf8n8vPN.0">>
uid:encode64(B).

How To Contribute

The library is Apache 2.0 licensed and accepts contributions via GitHub pull requests:

  1. Fork it
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Added some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create new Pull Request

The build process requires Erlang/OTP version 19.0 or later and essential build tools.

Build and run service in your development console. The following command boots Erlang virtual machine and opens Erlang shell.

git clone https://github.com/fogfish/uid
cd uid
make
make run

commit message

The commit message helps us to write a good release note, speed-up review process. The message should address two question what changed and why. The project follows the template defined by chapter Contributing to a Project of Git book.

Short (50 chars or less) summary of changes

More detailed explanatory text, if necessary. Wrap it to about 72 characters or so. In some contexts, the first line is treated as the subject of an email and the rest of the text as the body. The blank line separating the summary from the body is critical (unless you omit the body entirely); tools like rebase can get confused if you run the two together.

Further paragraphs come after blank lines.

Bullet points are okay, too

Typically a hyphen or asterisk is used for the bullet, preceded by a single space, with blank lines in between, but conventions vary here

bugs

If you experience any issues with the library, please let us know via GitHub issues. We appreciate detailed and accurate reports that help us to identity and replicate the issue.

  • Specify the configuration of your environment. Include which operating system you use and the versions of runtime environments.

  • Attach logs, screenshots and exceptions, in possible.

  • Reveal the steps you took to reproduce the problem, include code snippet or links to your project.

License

See LICENSE

About

erlang fault tolerant service to generate unique identities

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Erlang 68.2%
  • Makefile 31.8%