V7 and project status #106

Closed
bgadrian opened this issue Dec 21, 2022 · 9 comments

@bgadrian
Contributor

Hello,

What is the status of this repo? I see some unmerged PRs that are a year old.
I am interested in updating or writing a new implementation for V6/V7 UUIDs (draft v4, including the distributed requirements), and I'm wondering whether I can contribute to this project or should create another one.

@cameracker
Contributor

Hi @bgadrian, the PRs that are open are either controversial or are waiting for the correct reviewers. We've been merging the V6 and V7 updates we've received from contributors.

@cameracker
Contributor

Closing this issue because it's more of a question.

@dylan-bourque
Member

The repo is definitely active, but the functionality is stable so there's not a lot of activity. There have been two PRs with implementations of the drafts for V6 and V7 UUIDs, #93 and #99. If there's been an update since those, then a PR with the relevant changes would be welcome.

@bgadrian
Contributor Author

Thanks for the quick answer.

I think the V7 implementation, especially from #99, does not fully implement even v3 of the draft. More specifically:

Additionally, care MUST be taken to ensure UUIDs generated in batches are also monotonic
https://www.ietf.org/archive/id/draft-peabody-dispatch-new-uuid-format-03.html#monotonicity_counters

Meaning that relying only on the nanoseconds from the time may not be enough. Especially on commodity cloud hardware, nanosecond increments are not guaranteed, and generating multiple UUIDs in the same millisecond could lead to collisions.
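
(A quick toy check of this on any given machine, not part of any implementation proposal: count how often consecutive time.Now() calls return the same nanosecond value.)

```go
// Toy check: how often do consecutive time.Now() calls report the same
// nanosecond timestamp? Results vary a lot by OS and hardware.
package main

import (
	"fmt"
	"time"
)

func main() {
	repeats := 0
	prev := time.Now().UnixNano()
	for i := 0; i < 1_000_000; i++ {
		now := time.Now().UnixNano()
		if now == prev {
			repeats++
		}
		prev = now
	}
	fmt.Println("repeated nanosecond timestamps out of 1,000,000:", repeats)
}
```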

Draft v4 defines more options for counteracting this: https://www.ietf.org/archive/id/draft-peabody-dispatch-new-uuid-format-04.html#monotonicity_counters and/or https://www.ietf.org/archive/id/draft-peabody-dispatch-new-uuid-format-04.html#distributed_shared_knowledge

I am afraid that trying to implement these features will break the simplicity of the current UUID structure, as it would require some sort of internal in-memory state (a counter) and/or new parameters (a node ID), so I would like to gather some opinions before opening a PR on one approach and settle on a design agreed with the current project maintainers.

One valid way would be to keep it simple and say that this implementation will not support "Distributed Nodes" and only add support for a single node (Monotonicity and Counters), or not ...

@dylan-bourque
Member

Meaning that relying only on the nanoseconds from the time may not be enough. Especially on commodity cloud hardware, nanosecond increments are not guaranteed, and generating multiple UUIDs in the same millisecond could lead to collisions.

Those are fair concerns @bgadrian. I'd counter, though, with the fact that there is no API in this library for generating batches of UUIDs. Each call to NewVN() generates and returns a single value. And nanoseconds is the lowest granularity we can get in Go so I'm not sure what other options there are.

this implementation will not support "Distributed Nodes"

I'd definitely argue for this path. There's definitely nothing in this library today related to synchronized generation across multiple machines.

I am afraid that trying to implement these features will break the simplicity of the current UUID structure

IMO the simplicity is one of the advantages of this library so I would be very hesitant to start down a path that would lead to breaking that.

For what it's worth, I've used this library to generate sometimes millions of UUIDs per second and have never run into a collision.

@bgadrian
Contributor Author

I don't think batches are supported per se in any implementation; rather, generating a large number of UUIDs in a short amount of time can be considered a batch (for example, an insert-batching algorithm that flushes the inserts every X seconds).

One simple implementation for monotonic UUIDs would be an internal counter that resets each millisecond (the lowest V7 granularity) and increments for each generated UUID. From the client's point of view, the interface would remain the same (simple).
I think this is explained here: https://www.ietf.org/archive/id/draft-peabody-dispatch-new-uuid-format-04.html#section-6.2
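
A rough, untested sketch of the kind of internal state I mean (the type and field names are made up, not this library's API):

```go
// Untested sketch, not this library's API: a v7-style generator that keeps a
// 12-bit counter which resets whenever the millisecond timestamp advances.
package sketch

import (
	"crypto/rand"
	"encoding/binary"
	"sync"
	"time"
)

type v7Gen struct {
	mu      sync.Mutex
	lastMs  int64
	counter uint16 // fits in the 12 "rand_a" bits of the draft layout
}

func (g *v7Gen) New() ([16]byte, error) {
	var u [16]byte

	g.mu.Lock()
	ms := time.Now().UnixMilli()
	if ms == g.lastMs {
		g.counter++ // same millisecond: bump the counter so ordering stays monotonic
	} else {
		g.lastMs, g.counter = ms, 0
	}
	ctr := g.counter
	g.mu.Unlock()

	// 48-bit big-endian Unix millisecond timestamp in bytes 0-5.
	binary.BigEndian.PutUint64(u[0:8], uint64(ms)<<16)
	// 12-bit counter where the draft's rand_a field sits (a real implementation
	// would also need to handle the counter overflowing 4096 within one ms).
	binary.BigEndian.PutUint16(u[6:8], ctr&0x0fff)
	// The remaining bits stay random.
	if _, err := rand.Read(u[8:]); err != nil {
		return u, err
	}
	u[6] = (u[6] & 0x0f) | 0x70 // version 7
	u[8] = (u[8] & 0x3f) | 0x80 // RFC 4122 variant
	return u, nil
}
```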

I'd definitely argue for this path. There's definitely nothing in this library today related to synchronized generation across multiple machines.

Actually there is: the V1 hardware-address generator, NewGenWithHWAF(...), which can be injected at construction, also serves this purpose of avoiding collisions across multiple nodes. This scenario is also described for V7:

Node IDs:
With this method, a pseudo-random Node ID value is placed within the UUID layout. This identifier helps ensure the bit-space for a given node is unique, resulting in UUIDs that do not conflict with any other UUID created by another node with a different node id.

I think we can easily implement this by generating a random node value at construction (or as a per-process singleton) and injecting it into each generated V7 UUID.
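
Something like this, just as a sketch (the 4-byte split and the helper name are arbitrary, not a concrete proposal):

```go
// Sketch only: reserve part of the v7 random section for a per-process node ID
// drawn once at startup, so two processes cannot emit identical UUIDs even if
// their clocks and counters happen to line up.
package sketch

import "crypto/rand"

var nodeID [4]byte // hypothetical size; the draft leaves the exact split to the implementation

func init() {
	if _, err := rand.Read(nodeID[:]); err != nil {
		panic(err) // a real implementation would surface this error differently
	}
}

// fillTail fills the last 8 bytes of a v7 UUID: random bytes, with 4 of them
// overwritten by the per-process node ID (byte 8 is left alone because it
// carries the variant bits).
func fillTail(u *[16]byte) error {
	if _, err := rand.Read(u[8:]); err != nil {
		return err
	}
	copy(u[9:13], nodeID[:])
	return nil
}
```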

These two simple additions can guarantee the monotonicity and local-uniqueness properties (and avoid collisions) while keeping the interface simple.

@dylan-bourque
Member

dylan-bourque commented Dec 22, 2022

Actually there is: the V1 hardware-address generator, NewGenWithHWAF(...), which can be injected at construction, also serves this purpose

While that's theoretically possible, injecting a function that does RPCs to other machines to do synchronization definitely goes against the spirit of that API.

Again, though, we'll definitely review a PR to close any gaps in V6 or V7 UUIDs.

@bgadrian
Contributor Author

I am not aware of any RPC or cross-process communication and I was not suggesting one. I was only referring to the existing NewGenWithHWAF code: https://github.com/gofrs/uuid/blob/master/generator.go#L149. The MAC address exists in those UUIDs to eliminate collisions between processes. The same principle is applied in the new UUIDs, but instead of the MAC address (which is a security issue) the draft suggests generating a random set of bytes for each process.
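
To make it concrete, here is roughly how that existing hook can already be used (untested sketch; feeding it a random "address" is my idea, not something the library does for you):

```go
// Untested sketch: feed NewGenWithHWAF a random 6-byte value instead of a real
// MAC so each process gets its own node bits without exposing hardware info.
package main

import (
	"crypto/rand"
	"fmt"
	"net"

	"github.com/gofrs/uuid"
)

func main() {
	fake := make(net.HardwareAddr, 6)
	if _, err := rand.Read(fake); err != nil {
		panic(err)
	}
	fake[0] |= 0x01 // set the multicast bit, as RFC 4122 recommends for random node IDs

	gen := uuid.NewGenWithHWAF(func() (net.HardwareAddr, error) {
		return fake, nil
	})

	u, err := gen.NewV1() // V1 places the node bits at the end of the UUID
	if err != nil {
		panic(err)
	}
	fmt.Println(u)
}
```

The V7 node-ID idea is the same pattern, just with random bytes instead of a MAC-shaped value.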
Sure, thanks! I'll try and play around.

@convto
Contributor

convto commented Dec 27, 2022

I'm so glad to see this discussion!

This was also a point that I found difficult to decide an implementation policy for when I submitted PR #99.
(I intentionally did not implement or discuss anything related to batch generation at that time, in order to narrow the scope of the PR.)
I think there is some value in sharing here what I considered at the time.

First, a UUIDv7 implementation that is not aware of a monotonic counter has 74 random bits.

Next, when the number of random bits is N and the collision probability is P=0.5, the number of trials required can be calculated as 2^(N/2). I have often seen this value used in discussions of the per-unit-time generation capacity of UUIDs.
(For background on this calculation, the following article is helpful: https://auth0.com/blog/birthday-attacks-collisions-and-password-strength/ )
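
Spelling out the approximation (my restating of the usual birthday-bound formula, not something from the draft):

$$
k(p) \approx \sqrt{2 \cdot 2^{N}\,\ln\frac{1}{1-p}}, \qquad
k(0.5) \approx 1.177 \cdot 2^{N/2} \approx 2^{N/2}
$$

Dropping the constant factor gives the 2^(N/2) figure used below.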

From this, UUIDv7 without a monotonic counter has 74 random bits, so its generation capacity at P=0.5 can be calculated as 2^37, that is about 137.4 billion (incidentally, the generation capacity of UUIDv4 at P=0.5 is 2^61, about 2.3 quintillion).

I think that a generation capacity of 137.4 billion per millisecond makes a collision extremely unlikely in almost all use cases (even for batch applications).

In the Rev04 draft, the specification for monotonic counters has been changed to SHOULD:
https://www.ietf.org/archive/id/draft-peabody-dispatch-new-uuid-format-04.html#name-monotonicity-and-counters

I didn't implement it at the time for these reasons, but it's important for library users to have a lot of options, so if you could implement it, that would be great!
