Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve scheme check in repo url #602

Merged
merged 12 commits into from May 11, 2020

Conversation

deveshks
Copy link
Contributor

Fixed and closes #597

Copy link
Member

@sigmavirus24 sigmavirus24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try this:

urlparse('mailto:user@example.com')

And then continue to alter this until we have a real HTTP(S) URL

@deveshks
Copy link
Contributor Author

Try this:

urlparse('mailto:user@example.com')

Yes, this gives the scheme as mailto

>>> from urllib.parse import urlparse
>>> urlparse('mailto:user@example.com')
ParseResult(scheme='mailto', netloc='', path='user@example.com', params='', query='', fragment='')

And then continue to alter this until we have a real HTTP(S) URL

I am not sure what you meant by "continue to alter this", are you asking me to alter the example, or the code to accept this as a valid repository URL (I assume it's not), or you want to add additional checks of schema.

If we only allow http(s) schema, I can change the first condition at https://github.com/pypa/twine/pull/602/files#diff-547bab308763f89cacec226151fcbb80R105 to

if repository_url and parsed.scheme in ('http', 'https'):

@sigmavirus24
Copy link
Member

There are many URIs that urlparse will parse that are going to not end up being valid here. My point is that for you to discover those different ways in this PR to get this right, you'll probably spend more time than adding a library that will do that for you while allowing expressive explanation of the restrictions we're placing. (Alternatively, we can spend the next probably long period of time trying to nail this down in a bespoke way that will become confusing and hard to reason about for new contributors)

It's not a matter of checking for the scheme being in that tuple (in fact, we would really rather never upload over plain-text).

@deveshks
Copy link
Contributor Author

My point is that for you to discover those different ways in this PR to get this right, you'll probably spend more time than adding a library that will do that for you while allowing expressive explanation of the restrictions we're placing.

That's a very valid point. Also I think you are pointing towards using rfc3986 which you also pointed out in #597 (comment)

I was actually under the assumption that a solution can be achieved by standard python modules, instead of using an external library will be preferred, but if that is not the case, I can surely update my PR to use this library.

@sigmavirus24
Copy link
Member

@deveshks how could I have been clearer in that message that a third-party library was an acceptable solution to the problem?

@deveshks
Copy link
Contributor Author

deveshks commented Apr 22, 2020

how could I have been clearer in that message that a third-party library was an acceptable solution to the problem?

So are you saying that a third party library is not an acceptable solution, or is it an acceptable solution (Sorry I couldn't understand it from this statement)

Also when you said

My point is that for you to discover those different ways in this PR to get this right, you'll probably spend more time than adding a library that will do that for you while allowing expressive explanation of the restrictions we're placing.

What I understood from it that the check I have applied here using the scheme of the URL might not be enough to cover all the allowed cases, so it's not worth spending time on it. Instead add a library (like rfc3986) which does that for you (via rfc3986.validators) and also allows
expressive explanation of the restrictions we're placing (via methods like allow_schemes , require_presence_of and check_validity_of

@deveshks
Copy link
Contributor Author

Hi,

The last feedback I received in this PR wasn't really clear to me, and I asked a follow up comment above to clarify it. Could someone please look at the same and help me bridge the gap in my understanding and move this PR forward?

@deveshks
Copy link
Contributor Author

deveshks commented May 8, 2020

Hi @sigmavirus24,

Could you please let me know what changes need to be done here so that I can move this PR forward 😊

@bhrutledge , I would appreciate your thoughts as well on this.

@sigmavirus24
Copy link
Member

@deveshks A third-party library is an acceptable solution here.

What I understood from it that the check I have applied here using the scheme of the URL might not be enough to cover all the allowed cases, so it's not worth spending time on it.

My point isn't that it isn't worth spending time on doing it here, but that it won't be as complete as a third-party solution and it makes little sense to keep re-implementing the same things without a really good reason not to include that third-party solution.

@deveshks
Copy link
Contributor Author

deveshks commented May 8, 2020

@deveshks A third-party library is an acceptable solution here.

@sigmavirus24 , thank you for your response. Can I then go ahead and use rfc3986 to check the schema and update the PR?

Also to include this library, do I need to update it on pyproject.toml:require, or are there other places I need to add it?

@sigmavirus24
Copy link
Member

@sigmavirus24 , thank you for your response. Can I then go ahead and use rfc3986 to check the schema and update the PR?

Yes

I think you need to update

install_requires=
... I believe that require in pyproject.toml is purely for packaging twine

@deveshks
Copy link
Contributor Author

deveshks commented May 8, 2020

It's not a matter of checking for the scheme being in that tuple (in fact, we would really rather never upload over plain-text).

@sigmavirus24 ,

I also observed that some places in the tests module, especially where we create the URLs for devpi-server we use http:// .

How do we handle these cases, given that we should only allow https schema in the rfc3986 validator as per your above comment? Do we patch the tests for such cases, or do we update the url's to use https:// in all those cases, along with updating the tests to work with https:// ?

Or do we simply allow both http:// and https:// schemas (I assume this is the least desirable approach)

@deveshks deveshks force-pushed the improve-scheme-check branch 3 times, most recently from ce3e0e3 to ba72c0c Compare May 8, 2020 19:01
@deveshks
Copy link
Contributor Author

deveshks commented May 8, 2020

I have made the necessary changes to use rfc3986, and also patched validate_url to mock return https when we use http:// URL's for devpi-server`` and pypi-server` for integration tests. Please take a look.

twine/utils.py Outdated Show resolved Hide resolved
@deveshks deveshks requested a review from sigmavirus24 May 9, 2020 12:52
@deveshks deveshks force-pushed the improve-scheme-check branch 2 times, most recently from a555561 to 96eab35 Compare May 10, 2020 08:33
@deveshks deveshks force-pushed the improve-scheme-check branch 2 times, most recently from ee76b6e to 9dbbd27 Compare May 10, 2020 08:52
twine/utils.py Outdated Show resolved Hide resolved
twine/utils.py Outdated Show resolved Hide resolved
@sigmavirus24
Copy link
Member

Or do we simply allow both http:// and https:// schemas (I assume this is the least desirable approach)

I do not find it desirable for URLs we know support TLS, e.g., pypi.org. That said, some folks could be using twine against internal instances (e.g., DevPI) and they may not want to manage TLS certificates for that. In that case, http would be a valid option, even if I think it's a poor practice.

I think we also have code that looks for PyPI and replaces http with https as appropriate. If we can verify that, we can allow http in the scheme.

@deveshks
Copy link
Contributor Author

I think we also have code that looks for PyPI and replaces http with https as appropriate. If we can verify that, we can allow http in the scheme.

You are correct. utils.normalize_repository does that for us.

twine/twine/utils.py

Lines 140 to 144 in d5c6a98

def normalize_repository_url(url: str) -> str:
parsed = urlparse(url)
if parsed.netloc in _HOSTNAMES:
return urlunparse(("https",) + parsed[1:])
return urlunparse(parsed)

Given this fact, I will go ahead and allow both http and https in the schemes, removed the patched test_integration.py and refactor validate_url to handle exceptions as per review comments

Copy link
Contributor

@bhrutledge bhrutledge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good. Thanks again for working through the feedback; mine is mostly around code style.

twine/utils.py Outdated Show resolved Hide resolved
twine/utils.py Outdated Show resolved Hide resolved
twine/utils.py Outdated Show resolved Hide resolved
tests/test_utils.py Outdated Show resolved Hide resolved
mypy.ini Outdated Show resolved Hide resolved
@deveshks deveshks requested a review from bhrutledge May 10, 2020 22:24
Copy link
Contributor

@bhrutledge bhrutledge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes. I thought of a few more. 😅

twine/utils.py Outdated Show resolved Hide resolved
twine/utils.py Outdated Show resolved Hide resolved
twine/utils.py Outdated Show resolved Hide resolved
@deveshks deveshks requested a review from bhrutledge May 11, 2020 10:26
Copy link
Contributor

@bhrutledge bhrutledge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I'm good with this, though I'm going to defer to @sigmavirus24 for final approval.

@sigmavirus24 sigmavirus24 merged commit e55869a into pypa:master May 11, 2020
@deveshks deveshks deleted the improve-scheme-check branch May 11, 2020 11:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Potential improvement for checking repository URL scheme/protocol
3 participants