Parsing "Microsoft encoding" is very lenient #60

Carrotman42 · 2020-05-15T20:14:22Z

I was reading through the code for Parse and I noticed that there is no validation on the first and last characters of the uuid-to-parse when the length of the input is 38 characters:

uuid/uuid.go

Lines 38 to 61 in 16ca3ea

    
    // Parse decodes s into a UUID or returns an error.  Both the standard UUID
    // forms of xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx and
    // urn:uuid:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx are decoded as well as the
    // Microsoft encoding {xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx} and the raw hex
    // encoding: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.
    func Parse(s string) (UUID, error) {
    	var uuid UUID
    	switch len(s) {
    	// xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
    	case 36:

    	// urn:uuid:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
    	case 36 + 9:
    		if strings.ToLower(s[:9]) != "urn:uuid:" {
    			return uuid, fmt.Errorf("invalid urn prefix: %q", s[:9])
    		}
    		s = s[9:]

    	// {xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}
    	case 36 + 2:
    		s = s[1:]

    	// xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    	case 32:

I would expect there to be a check that s[0] == '{' && s[37] == '}' rather than simply ignoring those two characters. I am happy to send a PR if requested; it's a simple change. I just want to verify that this loose behavior is not actually desired.

(FWIW, it means that something like a01234567-abcd-cdef-abcd-012345678901a would be parsed without an error, even though it has extra characters at the beginning and end. That's quite unexpected IMO.)
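For illustration, here is a minimal sketch of the kind of check described above. It is not the actual fix: stripBraces is a hypothetical helper name that does not exist in the uuid package, and the error messages are made up for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// stripBraces is a hypothetical helper (not part of the uuid package).
// It accepts only a 38-character input that is wrapped in '{' and '}'
// and returns the 36 characters between the braces.
func stripBraces(s string) (string, error) {
	if len(s) != 36+2 {
		return "", fmt.Errorf("invalid length: %d", len(s))
	}
	// Require the braces instead of silently dropping whatever
	// characters happen to sit at the first and last positions.
	if s[0] != '{' || s[len(s)-1] != '}' {
		return "", errors.New("38-character input must be wrapped in { and }")
	}
	return s[1 : len(s)-1], nil
}

func main() {
	inputs := []string{
		"{01234567-abcd-cdef-abcd-012345678901}", // braces present
		"a01234567-abcd-cdef-abcd-012345678901a", // extra characters instead of braces
	}
	for _, in := range inputs {
		out, err := stripBraces(in)
		fmt.Printf("%q -> %q, err=%v\n", in, out, err)
	}
}
```

With a check like this, the first input yields the inner 36 characters and the second is rejected, rather than both being handed on to the hex decoding step.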


pborman · 2020-05-19T14:16:08Z

Thank you for pointing this out. Yes, this is a bug and should be fixed. Would be happy to have you send in a PR for this.

It was not intentional to simply ignore these characters, and examples/docs/tests all indicate that curly braces were the intended characters. See google#60

It was not intentional to simply ignore these characters, and examples/docs/tests all indicate that curly braces were the intended characters. I've added some test cases to ensure things work. Fixes google#60

sazzer · 2020-11-13T09:25:20Z

I've just come across this by virtue of the fact that:

"921ef402-cd4c-4be6-8483-a00566a43f60" -> Correctly succeeds
" 921ef402-cd4c-4be6-8483-a00566a43f60" -> Correctly fails to parse
"921ef402-cd4c-4be6-8483-a00566a43f60 " -> Correctly fails to parse
" 921ef402-cd4c-4be6-8483-a00566a43f60 " -> Unexpectedly succeeds!

Turns out it's not that it ignores whitespace if present at both ends, but it's just another version of this bug.
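To make the connection explicit, the sketch below mirrors only the length switch from the snippet quoted in the original report. The branch function and its labels are hypothetical, and it assumes that any length not listed in the switch is rejected, which matches the two single-space failures above.

```go
package main

import "fmt"

// branch is a hypothetical helper that reports which case of a
// length-based switch (like the one quoted in the report) an input
// would fall into. It does not parse anything.
func branch(s string) string {
	switch len(s) {
	case 36:
		return "standard"
	case 36 + 9:
		return "urn"
	case 36 + 2:
		return "braced"
	case 32:
		return "raw hex"
	default:
		// Assumption: every other length is treated as invalid.
		return "invalid length"
	}
}

func main() {
	const id = "921ef402-cd4c-4be6-8483-a00566a43f60"
	for _, in := range []string{id, " " + id, id + " ", " " + id + " "} {
		fmt.Printf("%q: len=%d, branch=%s\n", in, len(in), branch(in))
	}
}
```

A single leading or trailing space gives a length of 37, which matches no case. A space on both ends gives 38, so the input lands in the braced case, where the first and last characters are dropped without being checked.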

Carrotman42 linked a pull request May 19, 2020 that will close this issue

When parsing a 38-char UUID, require '{' and '}' characters #61

Open

kennytm mentioned this issue Dec 1, 2021

expression, parser: add built-in func is_uuid pingcap/tidb#30318

Merged

12 tasks

This was referenced Dec 1, 2021

inconsistent result for built in function uuid_to_bin pingcap/tidb#30324

Closed

inconsistent result for built in function uuid_to_bin pingcap/tidb#30325

Closed

bormanp mentioned this issue Oct 3, 2023

uuid.Parse allows invalid UUID's #131

Closed
