Adds a type check to ensure the description type provided is valid #473

Bachmann1234 · 2019-07-07T18:32:27Z

Howdy!

Thanks for this check tool! While using it I made an error that I would have liked to see this tool catch so I thought I would take a stab at adding it myself. Let me know if you have any thoughts!

Thank you!

Bachmann1234 · 2019-07-07T18:34:11Z

twine/commands/check.py

@@ -88,6 +89,12 @@ def check(dists, output_stream=sys.stdout):
            )
            description_content_type = 'text/x-rst'

+        if description_content_type not in _supported_readme_types:
+            output_stream.write(
+                'warning; `long_description_content_type` invalid.\n'


Depending on the team's thoughts I could make a change to make this an outright failure.

The problem is if this type is wrong pypi will outright reject the package with a 400 error.

I know that technically this tool is checking if the readme will render so perhaps the type is a separate concern?

IN the back and forth on this PR I ended up just making this an error

Bachmann1234 · 2019-07-07T19:38:18Z

Looks like I have some lint errors to fix

twine/commands/check.py:95:80: E501 line too long (107 > 79 characters)
tests/test_check.py:128:80: E501 line too long (100 > 79 characters)

codecov · 2019-07-07T20:58:40Z

Codecov Report

Merging #473 into master will increase coverage by 0.06%.
The diff coverage is 100%.

@@           Coverage Diff            @@
##           master   #473      +/-   ##
========================================
+ Coverage   82.93%    83%   +0.06%     
========================================
  Files          14     14              
  Lines         750    753       +3     
  Branches      108    110       +2     
========================================
+ Hits          622    625       +3     
  Misses         92     92              
  Partials       36     36

Impacted Files	Coverage Δ
twine/commands/check.py	`100% <100%> (ø)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update add0aea...76efb49. Read the comment docs.

codecov · 2019-07-07T20:58:40Z

Codecov Report

Merging #473 into master will increase coverage by 1.57%.
The diff coverage is 100%.

@@            Coverage Diff            @@
##           master    #473      +/-   ##
=========================================
+ Coverage   82.93%   84.5%   +1.57%     
=========================================
  Files          14      14              
  Lines         750     781      +31     
  Branches      108     116       +8     
=========================================
+ Hits          622     660      +38     
+ Misses         92      84       -8     
- Partials       36      37       +1

Impacted Files	Coverage Δ
twine/commands/check.py	`100% <100%> (ø)`	⬆️
twine/commands/upload.py	`92.72% <0%> (+1.06%)`	⬆️
twine/package.py	`88.49% <0%> (+1.76%)`	⬆️
twine/repository.py	`75.43% <0%> (+2.1%)`	⬆️
twine/utils.py	`87.23% <0%> (+3.54%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update add0aea...3440847. Read the comment docs.

Bachmann1234 · 2019-07-07T21:02:01Z

Ok, all fixed now :-)

bhrutledge

This looks like a useful addition to check. I'm not a maintainer, just a new contributor, so I can't speak to the UX questions. However, I do have some style suggestions.

bhrutledge · 2019-07-07T20:49:27Z

tests/test_check.py

+    })
+
+    monkeypatch.setattr(check, "_find_dists", lambda a: ["dist/dist.tar.gz"])
+    monkeypatch.setattr(


I can see how this patching makes sense with the structure of check.check(), but it feels a bit awkward. What do you think about extracting a function from check()? Something like:

def check(dists, output_stream=sys.stdout): # ... for filename in uploads: output_stream.write("Checking distribution %s: " % filename) package = PackageFile.from_filename(filename, comment=None) failure |= check_package(package, output_stream)

That way, you could unit-test check_package with a PackageFile stub directly, and maybe not have to monkeypatch anything.

Let me take a look. I was mostly just following the existing structure trying to keep the changes pretty minimal.

But I guess I can give it a wider look and just revert back if the maintainer prefers the more conservative approach

tests/test_check.py

twine/commands/check.py

bhrutledge · 2019-07-07T20:58:05Z

tests/test_check.py

+
+    output_stream = check.StringIO()
+    check.check("dist/*", output_stream=output_stream)
+    assert output_stream.getvalue() == (


Could this assert be a little narrower? I.E., just check for the expected error content, rather than the whole message? I received similar feedback on a recent PR.

I mean maybe? I could basically check for the existence of warning; long_description_content_type invalid to ensure I hit the line. But it would be behaving pretty differently then the other tests of the file

I've gotten the impression that there's leeway to deviate from the existing test structures, with this sort of thing being an example. I think your proposed check would be sufficient.

Co-Authored-By: Brian Rutledge <bhrutledge@gmail.com>

Bachmann1234 · 2019-07-07T22:06:43Z

Restructured the check logic a bunch. I think it's cleaner now and potentially prepared to take on other functionality if there are thoughts. Though I do worry I have greatly expanded the initial scope of the PR.

That being said im game to keep working with it. If people like the bigger change and have further comments ill keep going. Or if they like the idea but would prefer a smaller change I could revert back. Either way. (the original PR was just this initial commit 76efb49)

Thanks @bhrutledge for the feedback so far

Bachmann1234 · 2019-07-07T22:12:58Z

twine/commands/check.py

+        output_stream
+    )
+    description = _get_description(metadata, output_stream)
+    failure = _description_will_not_render(


Right now a render failure is the only thing that triggers a a failure. I personally think the description type being provided but invalid should be a failure as well but I wanted to wait for feedback before I did that. See #473 (comment)

I tend to agree, but maybe there are some other considerations that we don't know about.

With that in mind, I think it will be easier to adjust the failure modes if the utility functions above were inlined here, as they were in check(). It would also be less of a structural change.

Related to that, the _get_* functions feel like they're doing two things: retrieving a value, and checking its validity. If they don't get inlined, I think it'd be more flexible if they were just validators (possibly with the warning output), similar to _description_will_not_render.

What do you think?

Im not sure. They warn you if something is potentially strange but its basically just a log line. The check_package function does not really depend on any of that that other than just getting the value. That's why I see them as getters.

If the validation was actually relevant to the check_package function I could see either separating it out or inlining that validation to check_package

actually saying that makes me wanna go ahead and make the type check a proper validation. Since unlike the other two warnings. It actually does matter.

I went ahead and inlined the getters. What pushed me over the edge was realizing the order which output stream gets written to does matter. Given this I did not want to be passing around output_stream and having functions return both their values and any warnings they wanted to emit felt like a bridge too far.

twine/commands/check.py

bhrutledge · 2019-07-08T10:05:01Z

tests/test_check.py

+
+    output_stream = check.StringIO()
+    check.check("dist/*", output_stream=output_stream)
+    assert output_stream.getvalue() == (


I've gotten the impression that there's leeway to deviate from the existing test structures, with this sort of thing being an example. I think your proposed check would be sufficient.

bhrutledge · 2019-07-08T10:05:10Z

tests/test_check.py

+    failed = check_package(package, output_stream)
+    assert not failed  # Invalid Description type is currently just a warning
+    assert output_stream.getvalue() == (
+        'warning; `long_description_content_type` invalid.\n'


Suggested change

'warning; `long_description_content_type` invalid.\n'

'warning: `long_description_content_type` invalid.\n'

bhrutledge · 2019-07-08T10:15:43Z

twine/commands/check.py

+        output_stream
+    )
+    description = _get_description(metadata, output_stream)
+    failure = _description_will_not_render(


I tend to agree, but maybe there are some other considerations that we don't know about.

With that in mind, I think it will be easier to adjust the failure modes if the utility functions above were inlined here, as they were in check(). It would also be less of a structural change.

Related to that, the _get_* functions feel like they're doing two things: retrieving a value, and checking its validity. If they don't get inlined, I think it'd be more flexible if they were just validators (possibly with the warning output), similar to _description_will_not_render.

What do you think?

Co-Authored-By: Brian Rutledge <bhrutledge@gmail.com>

sigmavirus24 · 2019-07-08T13:54:00Z

So years ago, PyPI didn't accept markdown and only in the last 2-3 years did they start. There's nothing to prevent PyPI potentially adding support for other doc formats (asciidoc, orgmode, etc.) except for someone implementing it and there being a sufficient use-case to support it. Provided that, I don't think a strict error here (without some sort of escape hatch) is necessary. I fully trust users to know what they're doing and if a user is stuck to some older version of twine because that's what their operating system provides and they can't install a newer version (or don't want to), should we really prevent them from uploading their package to that future PyPI?

Bachmann1234 · 2019-07-08T13:59:50Z

That's a good point. Would you be interested in downgrading this to a warning? Or should I close this?

sigmavirus24 · 2019-07-08T15:53:42Z

Oh wait, am I remembering correctly that this is automagically ran on upload? If not, ignore my concerns about things breaking for folks in the future.

Bachmann1234 · 2019-07-08T21:25:40Z

I dont think upload runs check. I only found this command after googling around trying to find out why my package was not uploading.

Bachmann1234 · 2019-07-08T23:58:47Z

Yeah, here is a test I ran (using my branch. I can re run on master if you would like)

I introduced a syntax error into a readme on a project of mine. Verified the error with twine check

Then I tried to upload a version (the version already existed so I expected it to be rejected but it was rejected due to the description instead of the expected reason)

If check was running as part of upload I would not have expected a request to have been made at all

❯ ~/code/twine/venv/bin/twine check dist/marshmallow-polyfield-5.7.tar.gz
Checking distribution dist/marshmallow-polyfield-5.7.tar.gz: Failed
The project's long_description has invalid markup which will not be rendered on PyPI. The following syntax errors were detected:
line 1: Severe: Title overline & underline mismatch.

=====================
Marshmallow-Polyfield
======================

❯ ~/code/twine/venv/bin/twine upload dist/marshmallow-polyfield-5.7.tar.gz --verbose
Uploading distributions to https://upload.pypi.org/legacy/
Uploading marshmallow-polyfield-5.7.tar.gz
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 19.1k/19.1k [00:00<00:00, 25.4kB/s]
Content received from server:
<html>
 <head>
  <title>400 The description failed to render for 'text/x-rst'. See https://pypi.org/help/#description-content-type for more information.</title>
 </head>
 <body>
  <h1>400 The description failed to render for 'text/x-rst'. See https://pypi.org/help/#description-content-type for more information.</h1>
  The server could not comply with the request since it is either malformed or otherwise incorrect.<br/><br/>
The description failed to render for &#x27;text/x-rst&#x27;. See https://pypi.org/help/#description-content-type for more information.


 </body>
</html>
HTTPError: 400 Client Error: The description failed to render for 'text/x-rst'. See https://pypi.org/help/#description-content-type for more information. for url: https://upload.pypi.org/legacy/

di · 2019-07-09T09:53:44Z

FWIW, I think the "right" way to go about this would be to do what I suggested in #430: add the ability to validate metadata in pypa/packaging, and then validate all the metadata for a given distribution with twine check (not just Long-Description-Content-Type).

I don't think twine should be embedding any logic about what is and isn't valid metadata.

Bachmann1234 · 2019-07-09T12:43:04Z

I think that makes a lot of sense. I am not super informed in the warehouse/packaging/pip development structures but from what I can tell adding full metadata validation seems to be a larger problem that may take some time. Though it looks like its being worked on based on pypa/packaging#147 and pypa/packaging-problems#264 . Reading though all that its probably something best left to the core team given their experience.

Given that, I could close this, I think the only potential reason to keep it is that right now check depends on this field being correctly specified to do its checking properly.

If you put in an invalid type the tool will just go use rst which seems like a strange choice to me. I get defaulting to rst if type is missing since that used to be the only rendering method. However, if I put in something unsupported having the check give back rst rendering errors seems counter intuitive.

bhrutledge · 2019-07-09T12:56:12Z

twine/commands/check.py

+    if description in {None, 'UNKNOWN\n\n\n'}:
+        output_stream.write('warning: `long_description` missing.\n')
+        description = None
+    failures = (


I realize the future of this PR is in question, but I'm curious about the decision to inline some validation, and put others in helper functions. FWIW, I'd find this easier to follow if they were all in check_package, and given the uncertainty around what should be a warning or a failure, I think it'd be easier to change in the future.

My thinking here is that warning being emitted is not relevant to the final result of check_package. It does not feel like a validation so much as a log line since a package cannot fail due to this

Bachmann1234 commented Jul 7, 2019

View reviewed changes

Adds a type check to ensure the description type provided is valid

76efb49

Bachmann1234 force-pushed the add_format_type_check branch from 64c401c to 76efb49 Compare July 7, 2019 20:58

bhrutledge reviewed Jul 7, 2019

View reviewed changes

Bachmann1234 and others added 7 commits July 7, 2019 17:04

Update tests/test_check.py

b370080

Co-Authored-By: Brian Rutledge <bhrutledge@gmail.com>

Break up the check functionality into its individual parts

c8bcab3

un-monkeypatch my test

7ff3772

Clarify this comment

99e4ebb

Un-monkeypatch the other description like check

05b0260

Update twine/commands/check.py

ee6ef84

Co-Authored-By: Brian Rutledge <bhrutledge@gmail.com>

Update usages of renamed variable

356b4ec

Fix some long lines

bfd9f7a

Bachmann1234 commented Jul 7, 2019

View reviewed changes

bhrutledge reviewed Jul 8, 2019

View reviewed changes

Bachmann1234 and others added 5 commits July 8, 2019 08:05

Update twine/commands/check.py

bdc8e00

Co-Authored-By: Brian Rutledge <bhrutledge@gmail.com>

get other colon

dc7fa21

Update assertion to just be checking keywords

cbb0106

Change to a proper error

44216f7

Inline the getters

3440847

bhrutledge reviewed Jul 9, 2019

View reviewed changes

Bachmann1234 closed this Jul 18, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adds a type check to ensure the description type provided is valid #473

Adds a type check to ensure the description type provided is valid #473

Bachmann1234 commented Jul 7, 2019

Bachmann1234 Jul 7, 2019

Bachmann1234 Jul 8, 2019

Bachmann1234 commented Jul 7, 2019

codecov bot commented Jul 7, 2019

codecov bot commented Jul 7, 2019 •

edited

Bachmann1234 commented Jul 7, 2019

bhrutledge left a comment

bhrutledge Jul 7, 2019

Bachmann1234 Jul 7, 2019

bhrutledge Jul 7, 2019

Bachmann1234 Jul 7, 2019

bhrutledge Jul 8, 2019

Bachmann1234 commented Jul 7, 2019 •

edited

Bachmann1234 Jul 7, 2019

bhrutledge Jul 8, 2019

Bachmann1234 Jul 8, 2019

Bachmann1234 Jul 8, 2019

bhrutledge Jul 8, 2019

bhrutledge Jul 8, 2019

bhrutledge Jul 8, 2019

sigmavirus24 commented Jul 8, 2019

Bachmann1234 commented Jul 8, 2019

sigmavirus24 commented Jul 8, 2019

Bachmann1234 commented Jul 8, 2019

Bachmann1234 commented Jul 8, 2019 •

edited

di commented Jul 9, 2019

Bachmann1234 commented Jul 9, 2019

bhrutledge Jul 9, 2019

Bachmann1234 Jul 9, 2019

	'warning; `long_description_content_type` invalid.\n'
	'warning: `long_description_content_type` invalid.\n'

Adds a type check to ensure the description type provided is valid #473

Adds a type check to ensure the description type provided is valid #473

Conversation

Bachmann1234 commented Jul 7, 2019

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Bachmann1234 commented Jul 7, 2019

codecov bot commented Jul 7, 2019

Codecov Report

codecov bot commented Jul 7, 2019 • edited

Codecov Report

Bachmann1234 commented Jul 7, 2019

bhrutledge left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Bachmann1234 commented Jul 7, 2019 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sigmavirus24 commented Jul 8, 2019

Bachmann1234 commented Jul 8, 2019

sigmavirus24 commented Jul 8, 2019

Bachmann1234 commented Jul 8, 2019

Bachmann1234 commented Jul 8, 2019 • edited

di commented Jul 9, 2019

Bachmann1234 commented Jul 9, 2019

Choose a reason for hiding this comment

Choose a reason for hiding this comment

codecov bot commented Jul 7, 2019 •

edited

Bachmann1234 commented Jul 7, 2019 •

edited

Bachmann1234 commented Jul 8, 2019 •

edited