Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When parsing a 38-char UUID, require '{' and '}' characters #61

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Carrotman42
Copy link

It was not intentional to simply ignore these characters, and
examples/docs/tests all indicate that curly braces were the intended
characters. I've added some test cases to ensure things work.

Fixes #60

It was not intentional to simply ignore these characters, and
examples/docs/tests all indicate that curly braces were the intended
characters. I've added some test cases to ensure things work.

Fixes google#60
@Carrotman42
Copy link
Author

Weirdly I can't seem to add reviewers/assignees. Not sure why, maybe an ACL thing. @pborman ptal

@pborman
Copy link
Collaborator

pborman commented May 19, 2020

Thanks for the fix.

This is the right thing to do but is potentially a breaking change. It used to handle a UUID in quotes but will no longer. It wasn't intended for it to be able to do that but I can easily imagine someone is depending on that. Perhaps you can change the code to see if the first and last byte are the same and that the first byte is either a '{' or '" (double quote). Does that sound reasonable to you?

@pborman
Copy link
Collaborator

pborman commented May 19, 2020

Wait, that only works for quotes. You do have to check the first and last independently because last I checked '{' != '}' :-)

@Carrotman42
Copy link
Author

I mean we're kinda stuck between a rock and a hard place. Today you can think of " needing to work, tomorrow someone comes along and says they need ' or [ to work. At this point it's a can of worms and we need to decide whether it's better to actually do what the docs have claimed for years, or to maintain a special-case list (both in code and docs).

@Carrotman42
Copy link
Author

If it wasn't clear: it's your call, I'm happy to implement a special case list.

@pborman
Copy link
Collaborator

pborman commented May 19, 2020

I think a double quote is probably the most common I was also thinking of an array for fast checks:

var quotes = [256]byte {
'{': '}',
'"': '"',
}

Then it is really easy to add other pairs, does not complicate the code, and does not increase run time. I agree, there are probably other cases as well. What comes to mind is < [ ( { ' ". Probably the least likely to break anything. I suppose ` could be in there too. I think any other character might be harder to justify.

@ylz-at
Copy link

ylz-at commented Jan 5, 2021

Any update?

@pborman
Copy link
Collaborator

pborman commented Jan 8, 2021

I left this with the comment that this will break the parsing of UUIDs in quotes, which I think is a reasonable possibility. Any solution will slow down the parsing of 38 byte UUIDs, though I am more concerned with breaking existing code.

I coded up a more flexible solution:

var match = [256]byte {
        '{': '}',
        '[': ']',
        '(': ')',
        '<': '>',
        '"': '"',
        '\'': '\'',
        '`': '`',
}

                if m := match[s[0]]; m == 0 || m != s[37] {
                        return uuid, fmt.Errorf("invalid prefix/suffix %c %c", s[0], s[37])
                }

This adds three loads (one being a table lookup) and two comparisons to every 38 byte UUID.

@beiriannydd
Copy link

I think this is probably a slippery slope, since the unicode quotes might legitimately be requested to be added and all of a sudden you're an array of [2^32]rune. Better to stick to what is documented to work which means folk will have to account for anything outside that when they recompile against the new library. Bump the version as it does change the behaviour. Right now the behaviour is buggy and addressing the bug is expected to break non-compliant code which has relied on the bug, no? Note the reason for the new version and you have consistent (buggy) behaviour in the same major version and fixed behaviour in the new version. Or document the current behaviour as correct (which seems relatively benign) and then everyone will at least expect the behaviour.

@pborman
Copy link
Collaborator

pborman commented Mar 31, 2021

@beiriannydd I think you are right (though unicode won't matter because those will be more than 1 byte long). Probably the better answer for those that want to use this function to validate is to create a new function that validates and you can pass in the starting and ending string. It would strip the delimiters and then expect 36 bytes left over.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Parsing "Microsoft encoding" is very lenient
4 participants