When parsing a 38-char UUID, require '{' and '}' characters #61

Carrotman42 · 2020-05-19T16:19:18Z

It was not intentional to simply ignore these characters, and
examples/docs/tests all indicate that curly braces were the intended
characters. I've added some test cases to ensure things work.

Fixes #60

It was not intentional to simply ignore these characters, and examples/docs/tests all indicate that curly braces were the intended characters. I've added some test cases to ensure things work. Fixes google#60

Carrotman42 · 2020-05-19T17:49:46Z

Weirdly I can't seem to add reviewers/assignees. Not sure why, maybe an ACL thing. @pborman ptal

pborman · 2020-05-19T18:38:06Z

Thanks for the fix.

This is the right thing to do but is potentially a breaking change. It used to handle a UUID in quotes but will no longer. It wasn't intended for it to be able to do that but I can easily imagine someone is depending on that. Perhaps you can change the code to see if the first and last byte are the same and that the first byte is either a '{' or '" (double quote). Does that sound reasonable to you?

pborman · 2020-05-19T18:41:08Z

Wait, that only works for quotes. You do have to check the first and last independently because last I checked '{' != '}' :-)

Carrotman42 · 2020-05-19T19:33:00Z

I mean we're kinda stuck between a rock and a hard place. Today you can think of " needing to work, tomorrow someone comes along and says they need ' or [ to work. At this point it's a can of worms and we need to decide whether it's better to actually do what the docs have claimed for years, or to maintain a special-case list (both in code and docs).

Carrotman42 · 2020-05-19T19:39:13Z

If it wasn't clear: it's your call, I'm happy to implement a special case list.

pborman · 2020-05-19T20:41:43Z

I think a double quote is probably the most common I was also thinking of an array for fast checks:

var quotes = [256]byte {
'{': '}',
'"': '"',
}

Then it is really easy to add other pairs, does not complicate the code, and does not increase run time. I agree, there are probably other cases as well. What comes to mind is < [ ( { ' ". Probably the least likely to break anything. I suppose ` could be in there too. I think any other character might be harder to justify.

ylz-at · 2021-01-05T05:57:55Z

Any update?

pborman · 2021-01-08T18:19:44Z

I left this with the comment that this will break the parsing of UUIDs in quotes, which I think is a reasonable possibility. Any solution will slow down the parsing of 38 byte UUIDs, though I am more concerned with breaking existing code.

I coded up a more flexible solution:

var match = [256]byte {
        '{': '}',
        '[': ']',
        '(': ')',
        '<': '>',
        '"': '"',
        '\'': '\'',
        '`': '`',
}

                if m := match[s[0]]; m == 0 || m != s[37] {
                        return uuid, fmt.Errorf("invalid prefix/suffix %c %c", s[0], s[37])
                }

This adds three loads (one being a table lookup) and two comparisons to every 38 byte UUID.

beiriannydd · 2021-03-30T23:44:02Z

I think this is probably a slippery slope, since the unicode quotes might legitimately be requested to be added and all of a sudden you're an array of [2^32]rune. Better to stick to what is documented to work which means folk will have to account for anything outside that when they recompile against the new library. Bump the version as it does change the behaviour. Right now the behaviour is buggy and addressing the bug is expected to break non-compliant code which has relied on the bug, no? Note the reason for the new version and you have consistent (buggy) behaviour in the same major version and fixed behaviour in the new version. Or document the current behaviour as correct (which seems relatively benign) and then everyone will at least expect the behaviour.

pborman · 2021-03-31T01:35:37Z

@beiriannydd I think you are right (though unicode won't matter because those will be more than 1 byte long). Probably the better answer for those that want to use this function to validate is to create a new function that validates and you can pass in the starting and ending string. It would strip the delimiters and then expect 36 bytes left over.

When parsing a 38-char UUID, require '{' and '}' characters

6d1eb31

It was not intentional to simply ignore these characters, and examples/docs/tests all indicate that curly braces were the intended characters. I've added some test cases to ensure things work. Fixes google#60

This was referenced Dec 1, 2021

inconsistent result for built in function uuid_to_bin pingcap/tidb#30324

Closed

inconsistent result for built in function uuid_to_bin pingcap/tidb#30325

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

When parsing a 38-char UUID, require '{' and '}' characters #61

When parsing a 38-char UUID, require '{' and '}' characters #61

Carrotman42 commented May 19, 2020

Carrotman42 commented May 19, 2020

pborman commented May 19, 2020

pborman commented May 19, 2020

Carrotman42 commented May 19, 2020

Carrotman42 commented May 19, 2020

pborman commented May 19, 2020

ylz-at commented Jan 5, 2021

pborman commented Jan 8, 2021

beiriannydd commented Mar 30, 2021

pborman commented Mar 31, 2021

When parsing a 38-char UUID, require '{' and '}' characters #61

Are you sure you want to change the base?

When parsing a 38-char UUID, require '{' and '}' characters #61

Conversation

Carrotman42 commented May 19, 2020

Carrotman42 commented May 19, 2020

pborman commented May 19, 2020

pborman commented May 19, 2020

Carrotman42 commented May 19, 2020

Carrotman42 commented May 19, 2020

pborman commented May 19, 2020

ylz-at commented Jan 5, 2021

pborman commented Jan 8, 2021

beiriannydd commented Mar 30, 2021

pborman commented Mar 31, 2021