Updated V7 generator to Draft04. #112

Merged
merged 7 commits into from Jan 26, 2023
Changes from 2 commits
59 changes: 45 additions & 14 deletions generator.go
@@ -80,9 +80,9 @@ func NewV6() (UUID, error) {
}

// NewV7 returns a k-sortable UUID based on the current millisecond precision
// UNIX epoch and 74 bits of pseudorandom data.
// UNIX epoch and 74 bits of pseudorandom data. It supports single-node batch generation (multiple UUIDs in the same timestamp) with a Monotonic Random counter.
//
// This is implemented based on revision 03 of the Peabody UUID draft, and may
// This is implemented based on revision 04 of the Peabody UUID draft, and may
// be subject to change pending further revisions. Until the final specification
// revision is finished, changes required to implement updates to the spec will
// not be considered a breaking change. They will happen as a minor version
@@ -158,7 +158,7 @@ func NewGenWithHWAF(hwaf HWAddrFunc) *Gen {
func (g *Gen) NewV1() (UUID, error) {
u := UUID{}

timeNow, clockSeq, err := g.getClockSequence()
timeNow, clockSeq, err := g.getClockSequence(false)
if err != nil {
return Nil, err
}
@@ -225,7 +225,7 @@ func (g *Gen) NewV6() (UUID, error) {
return Nil, err
}

timeNow, clockSeq, err := g.getClockSequence()
timeNow, clockSeq, err := g.getClockSequence(false)
if err != nil {
return Nil, err
}
@@ -241,8 +241,12 @@ func (g *Gen) NewV6() (UUID, error) {
return u, nil
}

// getClockSequence returns the epoch and clock sequence for V1 and V6 UUIDs.
func (g *Gen) getClockSequence() (uint64, uint16, error) {
// getClockSequence returns the epoch and clock sequence for V1, V6 and V7 UUIDs.
//
// When useUnixTSMs is false, it uses the Coordinated Universal Time (UTC) as a count of
// 100-nanosecond intervals since 00:00:00.00, 15 October 1582 (the date of Gregorian reform
// to the Christian calendar). When useUnixTSMs is true, it uses Unix epoch milliseconds.
func (g *Gen) getClockSequence(useUnixTSMs bool) (uint64, uint16, error) {
var err error
g.clockSequenceOnce.Do(func() {
buf := make([]byte, 2)
@@ -258,7 +262,12 @@ func (g *Gen) getClockSequence() (uint64, uint16, error) {
g.storageMutex.Lock()
defer g.storageMutex.Unlock()

timeNow := g.getEpoch()
var timeNow uint64
if useUnixTSMs {
timeNow = uint64(g.epochFunc().UnixMilli())
} else {
timeNow = g.getEpoch()
}
// Clock didn't change since last UUID generation.
// Should increase clock sequence.
if timeNow <= g.lastTime {
@@ -272,28 +281,50 @@ func (g *Gen) getClockSequence() (uint64, uint16, error) {
// NewV7 returns a k-sortable UUID based on the current millisecond precision
// UNIX epoch and 74 bits of pseudorandom data.
//
// This is implemented based on revision 03 of the Peabody UUID draft, and may
// This is implemented based on revision 04 of the Peabody UUID draft, and may
// be subject to change pending further revisions. Until the final specification
// revision is finished, changes required to implement updates to the spec will
// not be considered a breaking change. They will happen as a minor version
// releases until the spec is final.
Collaborator

Given that this draft focuses on being more tentative about how strongly the implementations need to respect monotonicity of the increments vs unguessability, do we owe it to users to be explicit about which behavior we're leaning towards in the implementation?
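One constraint that bears on this question can be sketched in a few lines. It assumes, as the code comment next to `binary.BigEndian.PutUint16` in this diff states, that the version nibble replaces the top 4 bits of the 16-bit clock sequence. `randA` is a hypothetical helper written for illustration, not part of the package.

```go
package main

import "fmt"

// randA mimics what is left of a 16-bit clock sequence once the version
// nibble overwrites the top 4 bits of byte 6: only the low 12 bits survive.
// Hypothetical helper for illustration only.
func randA(clockSeq uint16) uint16 {
	return clockSeq & 0x0FFF
}

func main() {
	// Within a single millisecond the sequence grows by one per UUID.
	// Crossing from 0x0FFF to 0x1000 wraps the surviving 12 bits back to
	// zero, so ordering inside that millisecond is no longer increasing.
	for _, seq := range []uint16{0x0FFE, 0x0FFF, 0x1000} {
		fmt.Printf("clockSeq=%#06x rand_a=%#05x\n", seq, randA(seq))
	}
}
```

Under that assumption, monotonic ordering within one millisecond holds only while the low 12 bits do not wrap, which is worth stating if the documentation commits to a behavior.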

func (g *Gen) NewV7() (UUID, error) {
var u UUID

if _, err := io.ReadFull(g.rand, u[6:]); err != nil {
/* https://www.ietf.org/archive/id/draft-peabody-dispatch-new-uuid-format-04.html#name-uuid-version-7
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| unix_ts_ms |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| unix_ts_ms | ver | rand_a |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|var| rand_b |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| rand_b |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ */

ms, clockSeq, err := g.getClockSequence(true)
Collaborator

Ok, and then just to make sure I understand: this isn't strictly needed for the PR; it looks like a refactor that moves this calculation into the clock sequence rather than doing it here to meet the ms requirement for this specific UUID. Is that the case? I don't have a strong preference here, but the boolean flag parameter on getClockSequence is slightly more mysterious if we're trying to understand "why" that flag exists. It's private, so it's not a big deal, and I won't ask for a reshuffle if other maintainers are ok with it.

Contributor Author

Yes, I wanted to reuse the clock sequencer and the mutex, but with a different timestamp, hence the flag.
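For comparison, here is a minimal sketch of one alternative to the boolean flag: the caller passes the timestamp in, so the locking and increment logic stays shared without a mode switch. The `sequencer` type and its `next` method are hypothetical and simplified (the sequence starts at zero here, whereas the generator in this PR seeds it from random data); this is not the PR's code.

```go
package main

import (
	"fmt"
	"sync"
)

// sequencer is a hypothetical stand-in for the generator's clock-sequence
// state. It is not the library's type.
type sequencer struct {
	mu       sync.Mutex
	lastTime uint64
	seq      uint16
}

// next receives the timestamp from the caller, so the same mutex and
// increment logic can serve Gregorian 100ns ticks and Unix milliseconds.
func (s *sequencer) next(now uint64) (uint64, uint16) {
	s.mu.Lock()
	defer s.mu.Unlock()
	// The clock did not advance since the last call: bump the sequence.
	if now <= s.lastTime {
		s.seq++
	}
	s.lastTime = now
	return now, s.seq
}

func main() {
	s := &sequencer{}
	_, a := s.next(1645557742000)
	_, b := s.next(1645557742000) // same millisecond, sequence increases
	fmt.Println(a, b)             // prints "0 1"
}
```

The trade-off is that each caller must compute its own timestamp before taking the lock, which the flag-based version avoids.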

if err != nil {
return Nil, err
}

tn := g.epochFunc()
ms := uint64(tn.Unix())*1e3 + uint64(tn.Nanosecond())/1e6
u[0] = byte(ms >> 40)
// UUIDv7 features a 48-bit timestamp: the number of milliseconds since the Unix epoch,
// stored as a big-endian unsigned integer in bytes 0-5.
u[0] = byte(ms >> 40)
u[1] = byte(ms >> 32)
u[2] = byte(ms >> 24)
u[3] = byte(ms >> 16)
u[4] = byte(ms >> 8)
u[5] = byte(ms)

// Byte 6 (u[6]) holds the version in its high 4 bits and the start of rand_a in its low 4 bits.
// SetVersion overwrites the 4 most significant bits of clockSeq, which is acceptable: the 12 least significant bits carry the counter that provides the monotonic property.
binary.BigEndian.PutUint16(u[6:8], clockSeq) // set rand_a with clock seq which is random and monotonic
Contributor @convto, Jan 10, 2023

It may be better to let the API user choose whether batch generation is taken into account.
getClockSequence takes a mutex lock, so calling it costs performance and reduces generation throughput.
For non-batch use cases it is probably undesirable to run getClockSequence at all, so a user-selectable API might be better.

(For example, since the draft-related implementation is allowed to make breaking changes, an isBatch argument could be added to NewV7().)
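A rough sketch of what such a user-selectable path could look like follows. Everything here is hypothetical: `v7Gen`, `newV7`, `fixedClock`, and the `batch` parameter are illustration names, the counter starts at zero rather than at a random value, and none of this is the package's API.

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"fmt"
	"sync"
	"time"
)

// v7Gen is a hypothetical generator used only for this sketch.
type v7Gen struct {
	mu      sync.Mutex
	lastMs  uint64
	counter uint16
	now     func() time.Time
}

// fixedClock returns a clock that always reports the same millisecond.
func fixedClock(ms int64) func() time.Time {
	return func() time.Time { return time.UnixMilli(ms) }
}

// newV7 builds a version 7 UUID. With batch=false the mutex is never taken
// and rand_a stays purely random.
func (g *v7Gen) newV7(batch bool) ([16]byte, error) {
	var u [16]byte
	// Random data for rand_a and rand_b (bytes 6-15).
	if _, err := rand.Read(u[6:]); err != nil {
		return u, err
	}
	ms := uint64(g.now().UnixMilli())
	if batch {
		g.mu.Lock()
		if ms <= g.lastMs {
			g.counter++ // same millisecond: keep UUIDs ordered
		}
		g.lastMs = ms
		binary.BigEndian.PutUint16(u[6:8], g.counter)
		g.mu.Unlock()
	}
	// 48-bit big-endian millisecond timestamp in bytes 0-5.
	for i := 0; i < 6; i++ {
		u[i] = byte(ms >> (40 - 8*i))
	}
	u[6] = (u[6] & 0x0F) | 0x70 // version 7
	u[8] = (u[8] & 0x3F) | 0x80 // RFC 4122 variant
	return u, nil
}

func main() {
	g := &v7Gen{now: fixedClock(1645557742000)}
	for _, batch := range []bool{false, true, true} {
		u, err := g.newV7(batch)
		if err != nil {
			panic(err)
		}
		fmt.Printf("batch=%v %x\n", batch, u[:])
	}
}
```

Whether the extra parameter is worth the API change depends on how much the lock actually costs under contention, which this sketch does not measure.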

Collaborator

So we moved this line from the top so that we can batch generate the UUID better, yes?

Can we call out in a comment here that this is done here specifically to support batching? I can see someone moving it around and unintentionally breaking that behavior.

Contributor Author

@cameracker I moved that line to improve readability. I found it confusing to fill bytes 8+ first and then go back to fill bytes 1-8. Moving that particular line before or after the first bytes does not affect the result, but all the lines after it need to stay in order because of the overrides.
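The ordering constraint can be shown with a standalone sketch that writes each field and then applies the version and variant overrides. `layoutV7` is a hypothetical helper that takes fixed inputs instead of reading a clock or a random source; the bit masks stand in for `SetVersion` and `SetVariant`. The inputs are the values used by the Draft04 test vector in this PR's tests.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// layoutV7 assembles a version 7 UUID from fixed inputs. Hypothetical
// helper for illustration; it is not part of the package.
func layoutV7(ms uint64, seq uint16, randB uint64) [16]byte {
	var u [16]byte

	// Bytes 0-5: 48-bit big-endian Unix timestamp in milliseconds.
	u[0] = byte(ms >> 40)
	u[1] = byte(ms >> 32)
	u[2] = byte(ms >> 24)
	u[3] = byte(ms >> 16)
	u[4] = byte(ms >> 8)
	u[5] = byte(ms)

	// Bytes 6-7: write the sequence first, then override the top 4 bits
	// with the version. Reversing these two steps would erase the version.
	binary.BigEndian.PutUint16(u[6:8], seq)
	u[6] = (u[6] & 0x0F) | 0x70

	// Bytes 8-15: write rand_b first, then override the top 2 bits with
	// the variant, for the same reason.
	binary.BigEndian.PutUint64(u[8:16], randB)
	u[8] = (u[8] & 0x3F) | 0x80

	return u
}

func main() {
	u := layoutV7(1645557742000, 0x0CC3, 0x18C4DC0C0C07398F)
	fmt.Printf("%x-%x-%x-%x-%x\n", u[0:4], u[4:6], u[6:8], u[8:10], u[10:16])
	// prints 017f22e2-79b0-7cc3-98c4-dc0c0c07398f
}
```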


//override first 4bits of u[6].
u.SetVersion(V7)

//set rand_b 64bits of pseudo-random bits (first 2 will be overridden)
if _, err = io.ReadFull(g.rand, u[8:16]); err != nil {
return Nil, err
}
//override first 2 bits of byte[8] for the variant
u.SetVariant(VariantRFC4122)

return u, nil
110 changes: 57 additions & 53 deletions generator_test.go
@@ -449,12 +449,14 @@ func testNewV6KSortable(t *testing.T) {

func testNewV7(t *testing.T) {
t.Run("Basic", makeTestNewV7Basic())
t.Run("TestVector", makeTestNewV7TestVector())
t.Run("Basic10000000", makeTestNewV7Basic10000000())
t.Run("DifferentAcrossCalls", makeTestNewV7DifferentAcrossCalls())
t.Run("StaleEpoch", makeTestNewV7StaleEpoch())
t.Run("FaultyRand", makeTestNewV7FaultyRand())
t.Run("ShortRandomRead", makeTestNewV7ShortRandomRead())
t.Run("KSortable", makeTestNewV7KSortable())
t.Run("ClockSequence", makeTestNewV7ClockSequence())
}

func makeTestNewV7Basic() func(t *testing.T) {
@@ -472,6 +474,37 @@ func makeTestNewV7Basic() func(t *testing.T) {
}
}

// makeTestNewV7TestVector as defined in Draft04
func makeTestNewV7TestVector() func(t *testing.T) {
return func(t *testing.T) {
pRand := make([]byte, 10)
//first 2 bytes will be read by clockSeq. First 4 bits will be overridden by Version. The next bits should be 0xCC3(3267)
binary.LittleEndian.PutUint16(pRand[:2], uint16(0xCC3))
//8bytes will be read for rand_b. First 2 bits will be overridden by Variant
binary.LittleEndian.PutUint64(pRand[2:], uint64(0x18C4DC0C0C07398F))

g := &Gen{
epochFunc: func() time.Time {
return time.UnixMilli(1645557742000)
},
rand: bytes.NewReader(pRand),
}
u, err := g.NewV7()
if err != nil {
t.Fatal(err)
}
if got, want := u.Version(), V7; got != want {
t.Errorf("got version %d, want %d", got, want)
}
if got, want := u.Variant(), VariantRFC4122; got != want {
t.Errorf("got variant %d, want %d", got, want)
}
if got, want := u.String()[:15], "017f22e2-79b0-7"; got != want {
t.Errorf("got prefix %q, want %q", got, want)
}
}
}

func makeTestNewV7Basic10000000() func(t *testing.T) {
return func(t *testing.T) {
if testing.Short() {
@@ -584,61 +617,32 @@ func makeTestNewV7KSortable() func(t *testing.T) {
}
}

func testNewV7ClockSequence(t *testing.T) {
if testing.Short() {
t.Skip("skipping test in short mode.")
}

g := NewGen()

// hack to try and reduce race conditions based on when the test starts
nsec := time.Now().Nanosecond()
sleepDur := int(time.Second) - nsec
time.Sleep(time.Duration(sleepDur))

u1, err := g.NewV7()
if err != nil {
t.Fatalf("failed to generate V7 UUID #1: %v", err)
}

u2, err := g.NewV7()
if err != nil {
t.Fatalf("failed to generate V7 UUID #2: %v", err)
}

time.Sleep(time.Millisecond)

u3, err := g.NewV7()
if err != nil {
t.Fatalf("failed to generate V7 UUID #3: %v", err)
}

time.Sleep(time.Second)

u4, err := g.NewV7()
if err != nil {
t.Fatalf("failed to generate V7 UUID #3: %v", err)
}

s1 := binary.BigEndian.Uint16(u1[6:8]) & 0xfff
s2 := binary.BigEndian.Uint16(u2[6:8]) & 0xfff
s3 := binary.BigEndian.Uint16(u3[6:8]) & 0xfff
s4 := binary.BigEndian.Uint16(u4[6:8]) & 0xfff

if s1 != 0 {
t.Errorf("sequence 1 should be zero, was %d", s1)
}

if s2 != s1+1 {
t.Errorf("sequence 2 expected to be one above sequence 1; seq 1: %d, seq 2: %d", s1, s2)
}
func makeTestNewV7ClockSequence() func(t *testing.T) {
return func(t *testing.T) {
if testing.Short() {
t.Skip("skipping test in short mode.")
}

if s3 != 0 {
t.Errorf("sequence 3 should be zero, was %d", s3)
}
g := NewGen()
//always return the same TS
g.epochFunc = func() time.Time {
return time.UnixMilli(1645557742000)
}
//by being KSortable with the same timestamp, it means the sequence is Not empty, and it is monotonic
uuids := make([]UUID, 10)
for i := range uuids {
u, err := g.NewV7()
testErrCheck(t, "NewV7()", "", err)
uuids[i] = u
}

if s4 != 0 {
t.Errorf("sequence 4 should be zero, was %d", s4)
for i := 1; i < len(uuids); i++ {
p, n := uuids[i-1], uuids[i]
isLess := p.String() < n.String()
if !isLess {
t.Errorf("uuids[%d] (%s) not less than uuids[%d] (%s)", i-1, p, i, n)
}
}
}
}
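The test above compares the `String()` forms. Because the canonical form is fixed-width lowercase hex, string order matches byte order, so the same check can be sketched directly on the raw bytes. `strictlyAscending` is a hypothetical helper operating on plain `[16]byte` values rather than the package's `UUID` type.

```go
package main

import (
	"bytes"
	"fmt"
)

// strictlyAscending reports whether every ID sorts strictly after the one
// before it when compared byte by byte. Hypothetical helper for illustration.
func strictlyAscending(ids [][16]byte) bool {
	for i := 1; i < len(ids); i++ {
		if bytes.Compare(ids[i-1][:], ids[i][:]) >= 0 {
			return false
		}
	}
	return true
}

func main() {
	// Same timestamp bytes, increasing counter in byte 7.
	ids := [][16]byte{
		{0x01, 0x7F, 0x22, 0xE2, 0x79, 0xB0, 0x70, 0x01},
		{0x01, 0x7F, 0x22, 0xE2, 0x79, 0xB0, 0x70, 0x02},
		{0x01, 0x7F, 0x22, 0xE2, 0x79, 0xB0, 0x70, 0x03},
	}
	fmt.Println(strictlyAscending(ids)) // prints "true"
}
```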
