Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect result returned by is_valid_uri() #30

Open
win911 opened this issue Feb 13, 2018 · 10 comments
Open

Incorrect result returned by is_valid_uri() #30

win911 opened this issue Feb 13, 2018 · 10 comments

Comments

@win911
Copy link

win911 commented Feb 13, 2018

According to RFC 3986, there are some ABNF rules:

URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / ".")

And I tried:

>>> from rfc3986 import is_valid_uri
>>> is_valid_uri("1")
True
>>> is_valid_uri("//")
True
>>> is_valid_uri("http#:/")
True

I think these results are not correct.

@sethmlarson
Copy link
Member

Really this method should be is_valid_uri_reference(), but each of these are valid URI references.

# A no-scheme path with zero extra segments
>>> rfc3986.uri_reference('1')
URIReference(scheme=None, authority=None, path='1', query=None, fragment=None)

# A relative-ref with an empty path and no scheme.
>>> rfc3986.uri_reference('//')
URIReference(scheme=None, authority=None, path=None, query=None, fragment=None)

# A relative-ref with a non-empty path and a fragment.
>>> rfc3986.uri_reference('http#:/')
URIReference(scheme=None, authority=None, path='http', query=None, fragment=':/')

@sigmavirus24
Copy link
Collaborator

Right. There's a far better and more comprehensive way of doing validations

@sigmavirus24
Copy link
Collaborator

@bugeats
Copy link

bugeats commented Feb 7, 2019

rfc3986.is_valid_uri("Hi Super Nintendo Chalmers") # -> True
rfc3986.is_valid_uri("My cat's breath smells like cat food.") # -> True
rfc3986.is_valid_uri("I'm learnding!") # -> True

Something is deeply wrong here.

@sethmlarson
Copy link
Member

sethmlarson commented Feb 8, 2019

@bugeats All of those examples pass the validation check because URIReference.from_string() does normalization which converts characters within components that can be percent-encoded into their percent-encoded counterparts before calling is_valid() (Meaning all of that text ends up being a percent encoded path). And because there aren't any kwargs to "require" any component in particular we get a result of True.

Using the Validator interface makes this all more clear.
You can achieve what I believe you want via:

def is_valid_uri_with_scheme_and_host(uri):
    try:
        uri = rfc3986.URIReference.from_string(uri)
        rfc3986.validators.Validator().require_presence_of("scheme", "host").check_validity_of("scheme", "host").validate(uri)
        return True
    except MissingComponentError, InvalidComponentsError:
        return False

Will return False for all your examples but True for https://www.google.com/ and the like.

@bugeats
Copy link

bugeats commented Feb 8, 2019

Yes I eventually figured that out, thanks for the detailed response. I mostly wanted to point out the absurdity of the code as it reads. I humbly suggest that you reconsider the name of that method.

@sethmlarson
Copy link
Member

Or just remove it, it's been deprecated for a few releases.

@hartwork
Copy link

I want to use this library to detect foo bar (with a space) as an invalid URI with regard to RFC 3986. All parts of the API of this library that I have seen so far auto-correct foo bar to foo%20bar for me. How can I disable that behavior and get an exception, instead? What class and functions can I use for strict parsing behavior if is_valid_uri and .is_valid are not strict? Thanks!

@hartwork
Copy link

Anyone?

@kenballus
Copy link
Contributor

Almost a year late to the party here.

If you don't mind using something home-grown, you could try using it.
I went through RFC3986 and converted all the ABNF to regexes.
I left the ABNF in comments above the regexes so you can verify the code for yourself.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants