Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add custom checkers to twine check #848

Open
bhrutledge opened this issue Dec 30, 2021 · 28 comments
Open

Add custom checkers to twine check #848

bhrutledge opened this issue Dec 30, 2021 · 28 comments
Labels
feature request question Discussion/decision needed from maintainers

Comments

@bhrutledge
Copy link
Contributor

Caveat: I have very little experience with proposing and implementing substantial new features/APIs in open-source projects.

There have been more than a few requests to add functionality to twine check, but those have been mostly flagged as duplicates of #430, which is currently blocked on pypa/packaging#147, which hasn't seen much movement.

#843 contains a proposal to add custom "checkers" via setuptools entry_points, which would allow functionality to be added by independently-maintained packages. Twine already allows for plugin sub-commands via entry_points, but those would have to implement argument parsing and reading packages from disk, and be run as a separate command from twine check.

The purpose of this issue is to gather requirements and propose an API for custom checkers. For an initial (but insufficient) overview of the current API, see the autodoc for twine.commands.check.

Some initial ideas/questions:

  • Currently, the only check is for long_description, implemented in a private method. That should eventually be refactored to match the new API, but it could be refactored earlier to help establish the new API.
  • What are some examples of custom checks that people would like to write? Can we/they share example implementations to drive the API design?
  • What attributes of a package do people want to check? Presumably metadata, which is available via PackageFile, along with the original filename.
  • What lessons can we learn from other tools that offer plugin functionality, e.g. flake8?
  • How should/must the new API be documented before it's implemented?
  • I'm assuming we want/need to maintain the API of check() and main() in twine.commands.check.
@bhrutledge bhrutledge changed the title Add plugin functionality to twine check Add custom checkers to twine check Dec 30, 2021
@bhrutledge
Copy link
Contributor Author

@sigmavirus24 What do you think of this initial writeup? Any edits before I publicize it (e.g. to folks who submitted check feature requests)?

@bhrutledge
Copy link
Contributor Author

@sigmavirus24 Just noticed the 👍 (can't decide if the absence of notifications for those are good or bad). Is there anything you'd like to add, e.g. lessons learned from flake8?

@webknjaz
Copy link
Member

Looks reasonable. Although, I don't know if the checkers would need custom args (if they did, we'd also be talking about a config file, I suppose).

@bhrutledge
Copy link
Contributor Author

@webknjaz Can you give an example of a checker that would need custom args? And/or can you propose how some of twine check features would be implemented in a checker?

@webknjaz
Copy link
Member

Not really, I was just commenting on the original mentions of having to implement arg parsing which doesn't seem necessary right now.

@bhrutledge
Copy link
Contributor Author

cc @di @ssbarnea @gaborbernat @henryiii

Y'all have shown some interest in improving twine check. Do you have any thoughts on this proposal?

@ssbarnea
Copy link

I am not very keen on having plugins for these checks as it would mean that we would need to modify each project running 'twine check --strict' in order to enable to newly added checks. I do not want to have to add them to benefit from newer checks. If we can avoid that downside the better. If we add new check only to strict, we should be fine.

Regarding where these checks are implemented, i do not care much, but if they are in another project, it should be a direct dependency of twine, not an optional one.

@di
Copy link
Sponsor Member

di commented Jan 12, 2022

I think the proposal is functionally sound, but I'm not sure I see the demand for third-party plugins for twine check. I think the main goal of the command is determining "will PyPI accept/reject this distribution & why" and to that end, we probably want to be authoritative about the results and not leave that for others to implement.

Assuming pypa/packaging#147 eventually happens (I see no objections there, just and lack of people to work on it) I'm having a hard time seeing what else users would want twine check to do, and thus a justification for this feature.

@henryiii
Copy link

I think I'll likely be a little repetitive of @di and @ssbarnea if I write a full response. I don't think there are an unlimited number of checks for metadata that need to be done. There are a set of things that are needed (either for PyPI, or for consistency), but there isn't a need for an arbitrary number of things, like flake8. And, honestly, I think pre-commit has been a better solution than flake8 - instead of providing an extension system for running checks, it's an infrastructure for running anything, so there are lots of great (and modifying) checks that are not in flake8. I would see the same thing here - if you really want to add a custom check, you can write a custom checker and then run it on your wheel files, adding it to pre-commit or CI or wherever. Extending something based on what is installed in the current environment has caused issues for pytest (flaky is registered by multiple extensions, which causes issues for pypa/build).

I'm very much in favor of twine check becoming more powerful, but I don't think it needs to be come externally extensible. A good design for internal use is nice, but it doesn't seem to be that helpful to be able to add twine-<stuff> extensions to add checks. If there's really a need for a package-metadata version of flake8, it doesn't have to be part of twine, IMO.

For completeness, here's a few things I can think of that would be nice to check:

  • Extra quotes. Sometimes package names and such get an extra set of quotes (at least if placed in setup.cfg, this can be common) - could be checked when making packages too
  • Python-Requires missing (would need to be twine check only)
  • Missaligned Python-Requires and wheel tags (py2 + 3+ for Python requires) (twine check only)

For "extendable" checks, the only think I would think of would be validating md or rst markup, but that can be done in the original file with a normal checker, usually.

@CyrilRoelandteNovance
Copy link

So I proposed #843 and tried to introduce the plugin system because I felt like it could be useful. To be honest, I'm fine with dropping the "plugin" architecture and having all checkers live in twine.

What are some examples of custom checks that people would like to write? Can we/they share example implementations to drive the API design?

  • Make sure a license is specified
  • Make sure the "license" field is not used when a classifier is enough
  • Make sure "python-requires" is specified
  • Check for the presence of "Programming Language :: Python ::" classifiers
  • Make sure a homepage/author is specified, as it's often useful for users/packagers to be able to contact authors to report bugs/submit patches

How should/must the new API be documented before it's implemented?

I think it should be kept simple:

def checker(filename: str) -> Tuple[List[str], List[str]]:

The checker would take the filename of a tarball or wheel and return a list of warnings/errors.

@ssbarnea
Copy link

Lets put all checks in twine and look into moving them only if that becomes a problem, avoiding over-engineering. At some point we may want to add a noqa mode for bypassing some of them deliberately, but again, later. The more checks we add to twine strict the better.

@sigmavirus24
Copy link
Member

I disagree these should all just get thrown into twine. Specifically, everything @CyrilRoelandteNovance listed is non-essential for uploading to PyPI. Those are things other organizations what, which is exactly the reason for making twine check extensible. Many companies publish things and want an easy way to lint things. Other languages provide this, Python makes it manual unless that company writes their own specific tooling.

It makes no sense to build checkers for every optional thing some mega-corp might want. I also don't see how we'd build a # noqa given that twine looks at the generated metadata so this concept of "We can figure out a way to ignore things when the tool turns to crap" is moot.

Finally, @CyrilRoelandteNovance's idea of making every plugin figure out how to unpack a distribution, etc seems incredibly naive.

If there's really a need for a package-metadata version of flake8, it doesn't have to be part of twine, IMO.

This makes total sense to me

Missaligned Python-Requires and wheel tags (py2 + 3+ for Python requires) (twine check only)

There are other tools that do this already namely https://github.com/asottile/setup-cfg-fmt

@bhrutledge I think we can close this

@henryiii
Copy link

I think we are talking about twine check, not twine upload. I think it's okay if check does a little more that just the upload, as long as it's clearly things that all packages should do. But I'd understand if not.

It would be nice of wheel had a api, it would make implementing this elsewhere easier.

Setup-cfg-fmt only works for one (of soon to be three) ways to configure setuptools, not for any PEP 517 build system, and it absolutely does not check this.

@sigmavirus24
Copy link
Member

it absolutely does not check this.

It auto-fixes requires_python and the classifiers. So it's not going to warn you, but why do you want to have to manually fix things when the computer will do that for you?

@bhrutledge
Copy link
Contributor Author

bhrutledge commented Jan 16, 2022

I think I missed this distinction:

non-essential for uploading to PyPI.

It seems like there's some disagreement about whether twine check should only check essentials, vs. check "best practices" that are subject to opinion. I tend to agree with @sigmavirus24 that "best practice" checks should not be implemented in this repo. However, it seems like twine check could be a convenient hook for opinionated folks to integrate such checks into their CI, and that implementing the architecture to support that will make it easier to add essential checks to Twine.

That said, it's not clear to me what is essential for uploading a package to PyPI, and does (or should) that include being able to pip install the package? Could someone (maybe @di) share a list? Similarly, of all the outstanding requests to improve twine check, what of those would be considered essential?

@di
Copy link
Sponsor Member

di commented Jan 16, 2022

IMO, twine check should check whether an upload to PyPI will succeed or fail -- that's about it.

That said, it's not clear to me what is essential for uploading a package to PyPI, and does (or should) that include being able to pip install the package? Could someone (maybe @di) share a list?

Essentially it needs to pass a lot of metadata validation and validation of things like the filename/extension, etc, which lives at https://github.com/pypa/warehouse/blob/bf389e4acd129697b28900c962dc9d363da3b00f/warehouse/forklift/legacy.py

Similarly, of all the outstanding requests to improve twine check, what of those would be considered essential?

I think that's probably pypa/packaging#147 and #430 (which might itself be fixed by pypa/packaging#147). Everything else seems to be a 'best practice' that PyPI doesn't actually enforce, something that hasn't been fully agreed upon as a standard, or some kind of meta-improvement.

@ssbarnea
Copy link

@di Your statement is in contradiction with --strict mode. Basically if twine would only check if an upload would succeed it would not need a strict mode. The idea of strict mode was that it would find problems that pypi is not able to catch or prevent.

@bhrutledge
Copy link
Contributor Author

bhrutledge commented Jan 17, 2022

So that we're on the same page, here's the current behavior of twine check.

Without --strict:

  • Warn if long_description_content_type is missing
  • Warn if long_description is missing
  • Error if long_description has syntax errors in markup and would not be rendered on PyPI

With --strict: warnings are treated as errors.

It sounds like most people on this discussion (all of whom aren't Twine maintainers 😉) are suggesting that "best practice" checks be added as warnings, similar to a missing long_description. I'm open to that in theory, although I'm guessing that not all projects that use --strict in CI would agree on what the set of warnings should be, which would result in folks opening Twine issues asking to revert a "best practice" check, or make it optional. Such checks could be made optional through configuration, but I'm a little reluctant to add that complexity.

@ssbarnea
Copy link

I agree with @bhrutledge that adding extra configuration complexity should be avoided, if possible. I wonder if we could make use of python warnings module for these warnings and collected and display them, effectively action on them.

I mention this because if we use these warnings, it means that power-users could also make use of PYTHONWARNINGS to override default behavior, basically giving them a bypass configuration mechanism without us having to implement support for it.

@bhrutledge
Copy link
Contributor Author

The current behavior is to write warnings to sys.stdout. There's related discussion in #818 about using logger.warning, instead of warnings.warn, to normalize Twine's output re: verbosity.

@sigmavirus24
Copy link
Member

@di Your statement is in contradiction with --strict mode. Basically if twine would only check if an upload would succeed it would not need a strict mode. The idea of strict mode was that it would find problems that pypi is not able to catch or prevent.

That may have been the initial idea, but the initial ideas of many things sometimes don't have sufficient experience to be correct. Look at the discussion of universal wheels. They were always intended originally to be python version independent and now have transmogrified to being python 2 and Python 3 compatible because of a choice made by a popular implementation.

@sigmavirus24
Copy link
Member

So that we're on the same page, here's the current behavior of twine check.

Without --strict:

  • Warn if long_description_content_type is missing
  • Warn if long_description is missing
  • Error if long_description has syntax errors in markup and would not be rendered on PyPI

With --strict: warnings are treated as errors.

It sounds like most people on this discussion (all of whom aren't Twine maintainers 😉) are suggesting that "best practice" checks be added as warnings, similar to a missing long_description. I'm open to that in theory, although I'm guessing that not all projects that use --strict in CI would agree on what the set of warnings should be, which would result in folks opening Twine issues asking to revert a "best practice" check, or make it optional. Such checks could be made optional through configuration, but I'm a little reluctant to add that complexity.

I am forcefully against "best practice" checks especially when there's no source of truth just disparate groups of the opinion that the things they want are best practices. The only sane option in this case is a huge amount of contributed checks we few would have to maintain and eventually probably make configurable or pluggable architecture to let people install their best practices plugins as they see fit. The latter is seen as wholly unappealing at this point and the former us appealing to those who won't be doing the work and repulsive to me.

I think it's best to close this as "twine will ensure the minimum necessary medatata has been generated and included in an artifact for use with PyPI and only PyPI". Existing feature requests that don't fit into that should be closed.

@bhrutledge
Copy link
Contributor Author

bhrutledge commented Jan 17, 2022

Breaking down @sigmavirus24's observations.

The only sane options:

  1. A huge amount of contributed checks we few would have to maintain and eventually probably make configurable. Appealing to those who won't be doing the work and repulsive to me.
  2. A pluggable architecture to let people install their best practices plugins as they see fit. Seen as wholly unappealing at this point.

@sigmavirus24 Re: option 2, do you see that as unappealing? I still find it appealing, for its potential utility, a cleaner factoring of twine.commands.check, and because it sounds like a fun project.

I don't find option 1 repulsive, but I am a -1.

I think it's best to close this as "twine will ensure the minimum necessary metadata has been generated and included in an artifact for use with PyPI and only PyPI". Existing feature requests that don't fit into that should be closed.

Before doing that, I'd like to dig into @di's thoughts in #848 (comment), and get a sense of what it would take to fulfill the promise of ensuring the minimum necessary metadata.

@sigmavirus24
Copy link
Member

Before doing that, I'd like to dig into @di's thoughts on #848 (comment), and get a sense of what it would take to fulfill the promise of ensuring the minimum necessary metadata.

I think that should be a separate discussion. This one has too many competing threads of discussion.

@henryiii
Copy link

Sorry I took a while, intended to post this earlier, and I think it's been indirectly mentioned:

As I see it, there are three levels of checks:

  1. Checks for things that will not allow uploads to PyPI. These should be errors when running twine check. (Error in markup on PyPI that would be accepted is also bundled here currently)
  2. Checks for things that are plainly incorrect / wrong. These are currently warnings, and can be made errors with --strict.
  3. Things that are possibly stylistic, or might not be completely universal for some reason.

Category 1 was mentioned by @di - and that's currently still lacking, and should be the highest priority. That's clearly a job for twine check.

Category 2 already does exist, but only a few checks on long-description are implemented, and of those, missing fields is actually one I'd probably categorize in category 3 if I were starting from scratch. This has limited enough scope I think it could be "warnings" in twine check.

I think we all agree category 1 should be part of twine, and category 3 can be implemented elsewhere. I think the best solution for allowing others to write checks is not to have a plugin system for twine, but to have some sort of API that others can use - maybe starting with wheel eventually, or even if twine provided an import twine that could be used to make checks. Someone else could setup plugins, or make their own checks easily. Plugin systems are fragile - look at pytest's flaky for example, or the recent IPython 8 usage of "black" as a plugin. I'm not saying they are bad, but would take work and make installing twine as an "app" harder (I always use pipx run twine, the GitHub action builds it in, etc).

I think clear invalid data should be warning - this is very much category 2. For example, if requires-python=">=3.7" is specified and py2 is in the tags, this is incorrect and confusing; a warning should be produced. Also, I hadn't thought about broken classifiers, but that could be a warning too - if a python :: 3.6 classifier was given this would also be a miss-match. I do not think missing classifiers would at all count in category 2 - that's category 3, and a stylistic choice - classifiers are optional. But any clear inconsistency == bad metadata should be a warning, IMO.

If we call missing metadata a "warning", I think we could have a minimum set of minimum metadata (long-description, requires-python, etc) that "should" be included, and those could be warnings (errors on --strict). This is based on the existing warnings. As I said, I'd really consider this category 3 if it wasn't for the fact it's already there for long-description.

I'd like to see some sort of checker that does category 3 - I'd like something that makes sure I don't forget various classifiers, some types of URLs, etc. But I think just providing a bit of API to help others provide this is likely enough. A tool for this would need to support configuration to tell it what to allow / skip, etc. Similar to check-manifest, for example.

@henryiii
Copy link

Look at the discussion of universal wheels. They were always intended originally to be python version independent and now have transmogrified to being python 2 and Python 3 compatible because of a choice made by a popular implementation.

I need to briefly reply to this. Python version independent wheels predate universal. The original method to support Python 2 and 3 was to pre-process the Python 3 code with 2to3 - so this was the default assumption by bdist_wheel. universal was developed as a command directly to bdist_wheel to tell it to produce Python 2 and Python 3 tags for a wheel in wheel 0.10 (2012). At the time, there was no way to tell a Python version independent package what range of Pythons it supported; the assumption was it could not span 2-3, and always matched the major version. universal has no standardized form in any PEP and has no meaning outside of setuptools + wheel. There literally is only one implementation because it's not a standard - it is the implementation. Here's the original implementation note: "universal=1 will apply the py2.py3-none-any tag for pure python wheels." PEP 427 does not even mention "universal".

What was developed as a standard was Requires-Python, this doesn't require custom commands to bdist_wheel. Wheel has never been updated to take this field into account, and since there's no need to do this except to support Python 2, I don't see it happening - pure Python and extensions are handled correctly in most cases, and the cases they are not are also not fixed by a single setting (for example, if a pure Python library collects unicode escapes per Python version as data files).

The proposal in a different issue was to simply check the Python tags, and produce a warning if a Python 3 only package (via Requires-Python) has a py2 tag; this is clearly wrong and confusing. It doesn't matter how it was produced or how it is fixed, it's simply inconsistent metadata which is a good thing to check, IMO.

@di
Copy link
Sponsor Member

di commented Jan 18, 2022

@di Your statement is in contradiction with --strict mode. Basically if twine would only check if an upload would succeed it would not need a strict mode. The idea of strict mode was that it would find problems that pypi is not able to catch or prevent.

Well, that's probably because I can't say I really agree with the addition of the --strict flag in #715. The initial idea in adding these warnings in #412 was to warn the user that running twine check without these fields was meaningless, not to imply that adding them was a "best practice" (even if some folks might consider it to be).

Before doing that, I'd like to dig into @di's thoughts in #848 (comment), and get a sense of what it would take to fulfill the promise of ensuring the minimum necessary metadata.

This is covered in detail in pypa/packaging#147. The TL;DR is: move PyPI's metadata validation out of PyPI and into packaging, where it can be reused by both PyPI and twine.

@sigmavirus24
Copy link
Member

Well, that's probably because I can't say I really agree with the addition of the --strict

Yeah, I think a better name would have been akin to clang or sphinx where you have --error-on-warning instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request question Discussion/decision needed from maintainers
Projects
Better `check`
  
To do
Development

No branches or pull requests

7 participants