Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

uuid v7 #681

Open
wants to merge 19 commits into
base: rfc4122bis
Choose a base branch
from
Open

uuid v7 #681

wants to merge 19 commits into from

Conversation

pmccarren
Copy link

@pmccarren pmccarren commented Jan 22, 2023

UUID V7

This PR implements UUID V7, closing ref #580

V7 is implemented, tested and documented and ready for review! No changes to the other versions were made.

Relevant RFC documents:

Benchmarks

V7 appears as fast as V4 when not using native crypto.randomUUID generation.

Results:

Starting. Tests take ~1 minute to run ...
uuid.stringify() x 1,450,855 ops/sec ±2.12% (88 runs sampled)
uuid.parse() x 2,350,677 ops/sec ±1.19% (88 runs sampled)
---

uuid.v1() x 3,585,680 ops/sec ±0.86% (92 runs sampled)
uuid.v1() fill existing array x 9,229,031 ops/sec ±0.50% (93 runs sampled)
uuid.v4() x 13,834,376 ops/sec ±3.34% (84 runs sampled)
uuid.v4() fill existing array x 3,833,619 ops/sec ±1.51% (84 runs sampled)
uuid.v4() without native generation x 2,395,572 ops/sec ±1.27% (91 runs sampled)
uuid.v3() x 268,797 ops/sec ±1.32% (86 runs sampled)
uuid.v5() x 287,899 ops/sec ±1.12% (92 runs sampled)
uuid.v7() x 2,764,344 ops/sec ±0.95% (93 runs sampled)
uuid.v7() fill existing array x 2,801,443 ops/sec ±5.48% (84 runs sampled)
uuid.v7() with defined time x 3,223,107 ops/sec ±1.14% (89 runs sampled)
Fastest is uuid.v4()

TODO:

Copy link
Member

@broofa broofa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution @pmccarren ccarren. This looks promising!

In addition to the inline comments, can you add a unit test alongside https://github.com/uuidjs/uuid/blob/main/test/unit/v1.test.js, with similar assertions. Specifically, I'd like to see similar tests for sort order, uniqueness, and options handling. (And anything else you think may be relevant.)

The sorting will be of particular interest, since producing UUIDS that sort the same whether sorted by timestamp or lexicographically was pretty much the whole point of version 7. 😄

Note: This should not close #580, as the new RFC adds v6 and v8 formats, too, as well as the MAX_UUID constant. v7 is definitely the majority of what's needed, but we'll want to keep 580 open until we fill in the remaining items.

Lastly, the new RFC is still making it's way through the review process. I don't expect substantive changes at this point, so this is an appropriate time to be adding this. But as I noted in 580, I think it's prudent to go with an experimental version release tag of some sort.

@ctavan: Thoughts on how best to accept this? Should we create a v9-experimental branch that we merge this into rather than master? Then publish experimental releases from that?

README.md Outdated Show resolved Hide resolved
README.md Show resolved Hide resolved
src/regex.js Outdated Show resolved Hide resolved
src/uuid-bin.js Outdated Show resolved Hide resolved
src/v7.js Outdated Show resolved Hide resolved
@pmccarren
Copy link
Author

@broofa - glad to lend a hand! I appreciate your time reviewing the implementation.

This PR has been updated with your suggested changes, most notably:

  • reworked buffer management so rnds is not mutated
  • added v7 unit test coverage
  • fixed README generation (via npm run docs)

In regards to sorting, I've been considering a few different approaches and while not presently implemented in this PR, I absolutely agree it should be :) I'll work on implementing this as well the associated unit tests.

src/regex.js Outdated Show resolved Hide resolved
src/v7.js Outdated Show resolved Hide resolved
Copy link
Member

@LinusU LinusU left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good 👍

Lastly, the new RFC is still making it's way through the review process.

Thoughts on how best to accept this? Should we create a v9-experimental branch that we merge this into rather than master? Then publish experimental releases from that?

Personally think that we should keep this PR open until the spec has been ratified. If we merge into a v9 branch that would stop us from making a v9 release until that has happened, and that might potentially be a long time?

I think that it would be nice to backport the "version 2 is valid" bugfix right away to the current release though?

@ctavan
Copy link
Member

ctavan commented Jan 25, 2023

Thanks an lot for the contribution!

I initially thought this new feature could land in the v9 release, but given we’re broadening the validation regex it’s a breaking change and we’ll have to bump the major version.

I think it would be nice to somehow release it as a prerelease so that people have a chance to try it and provide feedback, but I think our current release and branching process is not yet ready for multiple active major releases. If anyone’s willing to take a look at this I’m happy to support (we were considering better automation for this for a while… #636).

I agree however, that we should not release this in a public major release until the draft has been accepted.

src/regex.js Outdated Show resolved Hide resolved
src/v7.js Outdated Show resolved Hide resolved
@broofa broofa changed the base branch from main to rfc4122bis January 25, 2023 23:46
@broofa
Copy link
Member

broofa commented Jan 25, 2023

Note: I've just created the rfc4122bis branch off main as a place to land this (and set it as the target for this PR). As @ctavan says, we haven't worked through the deploy & publish pipeline for non-main branches yet, so we'll have to figure that out.

Wish I could give you an ETA for that, but it's not been a priority for either of us. As he says... "Help Wanted".

Co-authored-by: Linus Unnebäck <linus@folkdatorn.se>
@broofa broofa added the bis Issues related to RFC4122bis specification label Feb 4, 2023
Copy link
Member

@broofa broofa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ctavan @LinusU Any other comments? If not, I'd like to get this merged (pending @pmccarren commiting his suggested change for the regex to allow 1-8).

Regarding the plan for the rfc4122bis branch, I've created a project to capture the remaining work, with brief notes on possible implementation details.

Honestly, there's not all that much left... if anyone is feeling inspired. ;-)

Lastly, thoughts on inviting Patrick to join the UUIDJS org? He's done some solid work here that's much appreciated, and seappears to have good street cred. If people could weigh in on that idea (including you, Patrick), we can move forward or not as appropriate. [cc'ing @TrySound here]

@pmccarren
Copy link
Author

@broofa just merged the 1-8 regex update. I'm close to done with implementing monotonic generation - I'd prefer to merge once that's wrapped. Will be buttoned up in a day or two.

As for UUIDJS Org, I'll let my work speak for itself but will add that if you'll have me I'll be an active participant :)

@pmccarren
Copy link
Author

@broofa I just finished adding monotonic support, and it's ready for review! Took a bit of finesse but I'm rather happy with how it turned out. In addition to the monotonicity related unit tests, I manually tested sorting preservation during generation of 100M uuids.

couple of sequence counter implementation notes:

  • when additional generations occur inside the same millisecond, we use a dedicated 31 bits as an incrementing sequence counter. Referred to as Fixed-Length Dedicated Counter Bits (Method 1) in the draft RFC.
  • seq is 31 bits and (re)initialized from the random data pool whenever the clock advances/changes. as the seq is initialized randomly, for nominal usage there is a significant amount (74 bits) of randomness in a given v7 uuid.
  • when the sequence counter rolls over we increment the internal date by 1 millisecond and continue. this accepts the tradeoff of minor clock drift for lexicographical sorting in substantial batch id generation workloads.
  • if the internal date ever exceeds 10 seconds beyond system time, both the date and seq are hard reset.
  • lexicographical sorting is preserved up to (2^31)*10000 generations for a provided millisecond.
  • the seq is stored as 31 bits. the 12 high bits and 19 low bits are stored separately in the uuid to preserve sorting while maintaining a large enough counter size

Pending review, I believe this PR is ready to go now!

@pmccarren pmccarren requested a review from broofa February 9, 2023 23:19
@pmccarren
Copy link
Author

CI failures tracked in #688 (actions/setup-node npm 9)

@robinpokorny
Copy link

Hey, @pmccarren, I had a quick look and have some questions. If you have time, could you look at them?

  1. Should there be more information that the code includes monotonic guarantee beyond the timestamp? While I think it's a good default, it seems fair to highlight it.
  2. Should there be an option to opt-out of that? That is, make the seq random for each call even in the same millisecond. (In this case, I was thinking about updating a v4 UUID with timestamp and version, while potentially reusing native v4 generation.)
  3. Does the fact that we always increase the seq by one, even for the first call within a millisecond, have an impact on the randomness?
  4. Why did we choose 10 second? That seems like a lot.
  5. What about mixed use of user providing msecs or not (as this library is used a lot, it can happen quite easily)? It seems it would break the monotonic guarantee. Is that a concern?

@wiperawa
Copy link

Just wondering when approx. this PR will be merged?

@broofa
Copy link
Member

broofa commented May 18, 2023

@wiperawa No specific ETA. I just need to find the time to review and work through deploy process for an experimental branch.

FWIW, we (CodePen) actually have a need for this as well, so I do have a bit of incentive to make this happen. I just need to find the time to sit down and do a proper review.

(And continued apologies to @pmccarren for dragging our feet on this. This is still important work that we intend to merge.)

@rdrpenguin04
Copy link

One of my teams is also waiting on UUID v7 for their project; we'd also greatly appreciate if this were reviewed, especially since it already has one approval.

@ghost
Copy link

ghost commented Jul 31, 2023

Is this PR ever going to get merged? it's disappointing to see that the author has done their great part since Feb, but this PR is still hanging.

@claytongulick
Copy link

Eagerly awaiting this to be released as well.

Thanks for the great library and work on it!

@DevBrent
Copy link

DevBrent commented Oct 2, 2023

I'll play ChatGPT here and summarize the current status:

  1. UUID org has concerns of the sequential "+ 1" nature impacting randomness of UUIDs generated within the same millisecond. My take on this is that this is a legitimate concern because you can theoretically spam an endpoint to get "seed" UUIDs then offset by +1, +2 to dig up other generations on the same millisecond. Despite it being a legitimate concern, many other libraries implement similar logic including https://github.com/mongodb/js-bson in my experience. uuid v7 #681 (comment) suggests a possible opt-out to enable truly random UUIDs within a given millisecond. I would personally make use of that option despite the lack of guarantee about sequential sort within the same millisecond.
  2. @pmccarren has implemented logic to push a UUID into the next millisecond up to an arbitrary 10 seconds before resetting the date (and I imagine the random seed) in order to guarantee the ability to generate (2^31)*10000 of sequential UUIDs in a given millisecond. This was one of the more significant things holding back this merge I gather because it seems there should be a better way to handle this.
  3. @robinpokorny mentioned a potential failure case when date inputs are provided from databases or for instance date pickers without milliseconds included which would be fairly common for such a widely used library like uuid. I'm not sure what a good solution for this would be.

My take would be UUIDs become less useful if you're going to alter the date input and don't strictly stick to modifying the sequence when sequencing. The primary driving factor of this is that a randomly initiated sequence could initialize 1 count below the max sequence and leaving padding atop the sequence reduces the entropy of the sequence.

This Hacker News thread has more discussion on the ULID spec which UUIDv7 was largely based on: https://news.ycombinator.com/item?id=36447837

The spec as written on that page is confusing on that point, but the incrementing-counter-within-the-same-millisecond-behavior only happens if you explicitly specify a "monotonicFactory", https://github.com/ulid/javascript#monotonic-ulids. The default behavior (just using the ulid() function) doesn't do that, it generates a completely random value regardless of the millisecond value.

If it's random value, then generated UUIDs won't always be ascending which defeats purpose as well.

No, that's not really it either. ULIDs have both a time component (in this case accurate to the millisecond) and a random component. Thus the order is always accurate up to the ms, and for the vast majority of applications anything created in the same millisecond can be considered created "at the same time", so it's usually OK if order is undetermined within the same MS.
Note that the UUID v7 spec is largely modeled after the ULID spec. ULIDs came first, and they traded the "standard" UUID format of 8x-4x-4x-4x-12x hex string for the more compact Crockford base32 format, and there are some other minor differences in number of timestamp bits vs. number of random bits, but they are otherwise functionally equivalent.

This discussion seems to imply other implementations ALWAYS re-seed the random bits and increment the counter.

I'm not really sure if I would prioritize sequential over randomness and perhaps this does need to be a preference. Generally, those seeking UUIDv7 for randomness security through obscurity/incalculability should be using another UUID.

Edit 1: Multi-node behavior should be considered as well as single-node sequence generator node behavior because I believe UUIDv7 in this library could be used in both forms. Each has slightly different preferences. For multi-node, all of this discussion goes out the window and it's best to not modify the msecs at all. For single-node, I think someone who prioritizes sort order over all else may want this unusual behavior.

Edit 2: Within each node of a multi-node cluster, I may wish to prefer sequential sort and it might be acceptable to eat into the MS up to 1-3 milliseconds for my usecases but I still feel uncomfortable. I'd rather OPT IN to sacrifice 10% of my sequence entropy than have this behavior though.

Edit 3: Percona notes UUID_SHORT from MySQL allows the sequence to rollover.

**Upgrading from `uuid@3`?** Your code is probably okay, but check out [Upgrading From `uuid@3`](#upgrading-from-uuid3) for details.
> **Note** Upgrading from `uuid@3`? Your code is probably okay, but check out [Upgrading From `uuid@3`](#upgrading-from-uuid3) for details.

> **Note** Only interested in creating a version 4 UUID? You might be able to use [`cypto.randomUUID()`](https://developer.mozilla.org/en-US/docs/Web/API/Crypto/randomUUID), eliminating the need to install this library.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a small typo here

Suggested change
> **Note** Only interested in creating a version 4 UUID? You might be able to use [`cypto.randomUUID()`](https://developer.mozilla.org/en-US/docs/Web/API/Crypto/randomUUID), eliminating the need to install this library.
> **Note** Only interested in creating a version 4 UUID? You might be able to use [`crypto.randomUUID()`](https://developer.mozilla.org/en-US/docs/Web/API/Crypto/randomUUID), eliminating the need to install this library.

**Upgrading from `uuid@3`?** Your code is probably okay, but check out [Upgrading From `uuid@3`](#upgrading-from-uuid3) for details.
> **Note** Upgrading from `uuid@3`? Your code is probably okay, but check out [Upgrading From `uuid@3`](#upgrading-from-uuid3) for details.

> **Note** Only interested in creating a version 4 UUID? You might be able to use [`cypto.randomUUID()`](https://developer.mozilla.org/en-US/docs/Web/API/Crypto/randomUUID), eliminating the need to install this library.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another small typo here

Suggested change
> **Note** Only interested in creating a version 4 UUID? You might be able to use [`cypto.randomUUID()`](https://developer.mozilla.org/en-US/docs/Web/API/Crypto/randomUUID), eliminating the need to install this library.
> **Note** Only interested in creating a version 4 UUID? You might be able to use [`crypto.randomUUID()`](https://developer.mozilla.org/en-US/docs/Web/API/Crypto/randomUUID), eliminating the need to install this library.

@mbrimmer83
Copy link

Rumor has it uuid v7 is now part of the proposed standard What would it take to get this over the finish line?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bis Issues related to RFC4122bis specification
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet