Implement RFC 3986 URL parsing #1487
I think there are some open bugs that should be fixed before we move over to this wholesale.
I can work on some of those bugs now. Is there a list of bugs you'd like squashed before you're comfortable with this merge?
python-hyper/rfc3986#32 is the main one I think
This is probably a nice to have as well: python-hyper/rfc3986#36
I'll take a look at these issues, thanks!
And if we ever use parsed URIs as dictionary keys, we'll likely need: python-hyper/rfc3986#35 (based on the description)
I might pick that one off while I have my environment open anyways. ;)
Thanks for reviewing that change @sigmavirus24. Pending python-hyper/rfc3986#37 being merged, and assuming that python-hyper/rfc3986#32 is a non-issue until @kennethreitz gets back to us, is there anything else that needs to be done before a new release of rfc3986 is tagged?
I think we need release notes and a version bump. Once merged I can run the automation for it.
Once updated to 1.2.0 I'm 👍 on this
Codecov Report
@@            Coverage Diff             @@
##           master    #1487      +/-   ##
==========================================
- Coverage   69.49%   66.57%   -2.93%
==========================================
  Files          22       22
  Lines        2685     2776      +91
==========================================
- Hits         1866     1848      -18
- Misses        819      928     +109
8478e96 to 633a0c2
3463c9c to 923bf6b
abnf_regexp.IPv4_RE,
abnf_regexp.IPv6_RE,
abnf_regexp.IPv6_ADDRZ_RE,
abnf_regexp.IPv_FUTURE_RE,
abnf_regexp.IP_LITERAL_RE
abnf_regexp.IPv_FUTURE_RE
I wonder if rfc3986 could just provide a regular expression for use here. 🤷‍♀️
Probably, something like misc.IP_ADDRESS_MATCHER?
I don't know that rfc3986 will ever need to use the matcher, so I'd prefer not to add more time compiling a regexp we're not going to use. We could just have them all in one regexp in abnf_regexp, though.
That's true, I can create that issue to track the idea.
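For illustration, the "all in one regexp" idea from this thread could look roughly like the sketch below. The patterns are deliberately simplified and the names IP_ADDRESS_RE and is_ip_address are hypothetical; none of this is taken from rfc3986's abnf_regexp module, and the IPv6 pattern in particular is much looser than the RFC 3986 grammar.

```python
import re

# Simplified, hypothetical patterns for illustration only. They are NOT the
# definitions used by rfc3986's abnf_regexp module.
DEC_OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])"
IPV4 = "(?:" + DEC_OCTET + r"\.){3}" + DEC_OCTET
# Loose IPv6 shape check: 2 to 7 colon-terminated groups plus a final group.
IPV6_LOOSE = r"(?:[0-9A-Fa-f]{0,4}:){2,7}[0-9A-Fa-f]{0,4}"
# IPvFuture shape: "v" HEXDIG+ "." followed by unreserved / sub-delims / ":".
IPV_FUTURE = r"v[0-9A-Fa-f]+\.[A-Za-z0-9._~!$&'()*+,;=:-]+"

# One alternation, compiled once, instead of several separate matchers.
IP_ADDRESS_RE = re.compile("(?:" + "|".join([IPV4, IPV6_LOOSE, IPV_FUTURE]) + ")")


def is_ip_address(host):
    """Return True if the whole string matches one of the patterns above."""
    return IP_ADDRESS_RE.fullmatch(host) is not None


print(is_ip_address("192.168.0.1"))  # True
print(is_ip_address("::1"))          # True
print(is_ip_address("example.com"))  # False
```

The trade-off raised in the thread still applies: a single combined pattern is compiled once at import time, whether or not anything ends up using it.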
923bf6b to f245a19
Current issue is specifically Windows Python 2.7 having issues with
It looks like Also according to our documentation IPv6 isn't supported by PySocks in some situations but I might want to revisit that.
Looks like the unicode addresses with
Ready for a rereview @sigmavirus24, thanks for catching those artifacts.
Woo! 🎉
Yay this is great!! :)
For real. Thanks for doing this, @SethMichaelLarson and thank you for reviewing, @sigmavirus24!
I did almost no hard work whatsoever. 👏 @SethMichaelLarson
There appears to be normalization of URLs occurring that removes intended functionality in GET/POST requests. I built a scanner that looks for a directory traversal issue to help troubleshoot missing patches, and the ../ gets stripped out of the request. Even with allow_redirects=False the ../ still gets stripped. Example: $ python3
Notice the request for ../ is stripped/removed.
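The stripping described above is consistent with what RFC 3986 section 5.2.4 calls dot-segment removal. The sketch below follows that algorithm step by step for illustration; the function name is hypothetical, and this is not the code that urllib3 or rfc3986 actually runs.

```python
def remove_dot_segments(path):
    """Sketch of RFC 3986 section 5.2.4; not urllib3's implementation."""
    output = []
    while path:
        if path.startswith("../"):        # step A: drop a leading "../"
            path = path[3:]
        elif path.startswith("./"):       # step A: drop a leading "./"
            path = path[2:]
        elif path.startswith("/./"):      # step B: "/./" becomes "/"
            path = path[2:]
        elif path == "/.":                # step B: "/." becomes "/"
            path = "/"
        elif path.startswith("/../"):     # step C: "/../" becomes "/" and
            path = path[3:]               # the last output segment is removed
            if output:
                output.pop()
        elif path == "/..":               # step C: same, at the end of input
            path = "/"
            if output:
                output.pop()
        elif path in (".", ".."):         # step D: a bare "." or ".." vanishes
            path = ""
        else:                             # step E: move one segment to output
            cut = path.find("/", 1)
            if cut == -1:
                output.append(path)
                path = ""
            else:
                output.append(path[:cut])
                path = path[cut:]
    return "".join(output)


print(remove_dot_segments("/a/b/c/./../../g"))          # /a/g
print(remove_dot_segments("/static/../../etc/passwd"))  # /etc/passwd
```

The second call shows why a traversal probe no longer reaches the server verbatim: once the path is normalized this way, the ../ segments are already gone before the request line is built.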
FYI: This broke an integration with PLC software for us, because it changes the URLs sent out (via This is the changed behaviour:
python3 -m venv venv
venv/bin/pip install urllib3==1.23
venv/bin/python -c 'from urllib3.util import parse_url ; print(parse_url("http://foo.tld/?a=[]"))'
# http://foo.tld/?a=[]
venv/bin/pip install urllib3==1.26.11
venv/bin/python -c 'from urllib3.util import parse_url ; print(parse_url("http://foo.tld/?a=[]"))'
# http://foo.tld/?a=%5B%5D
Currently trying to find a workaround.
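To make the difference between the two parse_url outputs above concrete, here is a small standard-library sketch of the same percent-encoding. It uses urllib.parse rather than urllib3, so it only illustrates the encoding itself and is not a workaround; the choice of safe characters is an assumption made for this example.

```python
from urllib.parse import parse_qs, quote

raw_query = "a=[]"

# Percent-encode everything except unreserved characters and "=".
# "[" and "]" are not in that set, so they become %5B and %5D.
encoded_query = quote(raw_query, safe="=")
print(encoded_query)  # a=%5B%5D

# A server that percent-decodes the query sees the same value either way.
print(parse_qs(raw_query))      # {'a': ['[]']}
print(parse_qs(encoded_query))  # {'a': ['[]']}
```

A receiver that decodes the query string treats both forms as equivalent; one that compares the raw bytes of the request line, as the PLC software here presumably does, will see them as different URLs.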
This PR adds the rfc3986 module into urllib3.packages and replaces our current URL parser with the one from rfc3986. This brings our URL parser in line with Python standard library parsing, which all use RFC 3986.
As far as I can see the only backwards compatibility we lose here is we can no longer parse http://@. I'm hoping breaking this use case is acceptable. :) I've added a few URLs from various sources that other URL parsers had trouble with.
Hoping I can get a review on your thoughts @sigmavirus24?
Reference for one of the example URLs (maybe more here?): https://www.blackhat.com/docs/us-17/thursday/us-17-Tsai-A-New-Era-Of-SSRF-Exploiting-URL-Parser-In-Trending-Programming-Languages.pdf
Closes #466
Closes #859
Closes #952
Closes #1096
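For readers unfamiliar with the generic syntax this PR moves to, RFC 3986 Appendix B gives a regular expression that splits any URI reference into its five components. The sketch below applies that expression with the standard library only; split_uri and UriParts are hypothetical names for this example, and this is neither the rfc3986 package's API nor urllib3's parse_url.

```python
import re
from collections import namedtuple

# Regular expression from RFC 3986, Appendix B.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")

UriParts = namedtuple("UriParts", "scheme authority path query fragment")


def split_uri(uri):
    """Split a URI reference into the five RFC 3986 components.

    Components that are absent come back as None; the path is always a
    string, possibly empty.
    """
    match = URI_RE.match(uri)
    return UriParts(
        scheme=match.group(2),
        authority=match.group(4),
        path=match.group(5),
        query=match.group(7),
        fragment=match.group(9),
    )


parts = split_uri("http://www.ics.uci.edu/pub/ietf/uri/#Related")
print(parts.scheme)     # http
print(parts.authority)  # www.ics.uci.edu
print(parts.path)       # /pub/ietf/uri/
print(parts.query)      # None
print(parts.fragment)   # Related
```

This expression only separates components. Validation and normalization of each component are separate steps on top of it.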