Add metadata validation #147

Closed
bittner opened this issue Nov 1, 2018 · 30 comments · Fixed by pypi/warehouse#7582

Comments

@bittner

bittner commented Nov 1, 2018

I'm trying to help fix the issue of broken package descriptions on PyPI (e.g. pypa/setuptools#1390).

Following a suggestion of @di in pypa/setuptools#1390 (comment), I would do the following:

  1. Add a validation module to packaging that provides a class with a validate() method and an error list attribute (an interface that can be easily used in Web frameworks like Pyramid and Django).
  2. Use this module in setuptools.dist.write_pkg_file to validate all incoming data against their specification. Setuptools, specifically, would then abort with an InvalidMetadataError if the specs are not met (instead of silently continuing to generate a broken PKG-INFO file, as it does today).
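
A minimal sketch of what the interface proposed in the two steps above might look like. Everything here is hypothetical: the `MetadataValidator` class, its two checks, and the `render_pkg_info` helper are illustrations only, not the API of packaging or setuptools. The name pattern is the one given in the core metadata specification; the version check is deliberately simplistic.

```python
import re
from dataclasses import dataclass, field


class InvalidMetadataError(ValueError):
    """Raised when metadata does not meet its specification."""


# Project name pattern from the core metadata specification (case-insensitive).
_NAME_RE = re.compile(r"^([A-Z0-9]|[A-Z0-9][A-Z0-9._-]*[A-Z0-9])$", re.IGNORECASE)


@dataclass
class MetadataValidator:
    """Hypothetical validator: call validate(), then inspect errors."""

    data: dict
    errors: list = field(default_factory=list)

    def validate(self) -> bool:
        # Start from a clean error list so validate() can be called repeatedly.
        self.errors = []
        name = self.data.get("name") or ""
        if not _NAME_RE.match(name):
            self.errors.append(f"name: invalid project name {name!r}")
        if not self.data.get("version"):
            self.errors.append("version: this field is required")
        return not self.errors


def render_pkg_info(data: dict) -> str:
    """Hypothetical stand-in for a writer that refuses invalid metadata."""
    validator = MetadataValidator(data)
    if not validator.validate():
        # Abort instead of silently producing a broken file.
        raise InvalidMetadataError("; ".join(validator.errors))
    return f"Name: {data['name']}\nVersion: {data['version']}\n"
```

Collecting all errors in a list, rather than raising on the first one, is what would let a web form report every problem at once, while a build tool can still turn a non-empty list into a single exception.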

Does that sound like something that makes sense?

Should I proceed and open a PR for a validation module?

@di
Sponsor Member

di commented Nov 2, 2018

Hi @bittner thanks for raising this issue. As I mentioned in pypa/setuptools#1390 (comment), this is something I'm currently working on, and planning to have a PR up for discussion soon.

@bittner
Author

bittner commented Nov 6, 2018

I'm not sure if you have started already as there is no PR yet, but if it helps you can take over parts of my code in PR #732 (e.g. the Metadata class) -- if that looks somewhat the direction you want to go.

I'm curious about what the interface and usage pattern will look like, to know whether the implementation in write_pkg_file will have to look a lot different than in PR #732.

@bittner
Author

bittner commented Nov 13, 2018

Any progress with this issue? Setuptools' issue #1390 is waiting for this to be implemented first.

@bittner
Author

bittner commented Dec 5, 2018

Any progress on this? I had hoped I could help fix this easily and quickly. Anything I can do to help get up to speed?

Reference to my original PR: pypa/setuptools#1562

@bittner
Author

bittner commented Jan 22, 2019

@di Any progress with metadata validation?

@brainwane

We discussed this at the packaging mini-summit in May and I've created pypa/packaging-problems#264 as a tracking issue covering the various TODOs necessary to plumb this through the parts of the toolchain.

@bhrutledge

Here's @di's work thus far, in case other folks are interested in picking it up:

https://github.com/di/packaging/tree/metadata-validation
Diff: master...di:metadata-validation

@brainwane

master...crwilcox:metadata-validation @crwilcox also took a stab at this, I believe?

At this point I am guessing that both @di and @crwilcox have stepped away from this task; it would be very helpful for the pip resolver work (see pypa/packaging-problems#264 ) over the next few months.

@bittner
Author

bittner commented Jan 25, 2020

As feedback to our community: It pays off to encourage people to modify their contribution instead of bluntly dismissing it as "broken" or stating "my idea is the better one", in a certain sense. It's all about people. We Pythonistas should know better. We should do better.

</offtopic>

@merwok

merwok commented Jan 25, 2020

It’s not clear what you’re referring to: I don’t find any message saying broken or better in this thread.

@di
Sponsor Member

di commented Mar 24, 2020

Wanted to post a bit of an update on where I'm at with this since it's been a while, and answer some questions. TL;DR this issue is now unblocked and I'm going to continue working on my branch.

Where should metadata validation live?
This project is definitely the right place for metadata validation. Metadata is defined in multiple PEPs, and this project is the implementation and standardization of these PEPs, for reuse across all packaging projects.

Where should the implementation come from?
While other projects might currently offer some degree of metadata validation, right now the only place where the "canonical" validation is happening is on PyPI. Ideally, we should take as much as possible from the implementation there, since a) it is already completely defined and b) we want to be able to swap in any replacement there without a ton of work.

What was the delay about?
Besides just life in general slowing me down, there's one issue in particular that has been blocking me from finishing this. The branch that I was working on that was posted above is nearly complete, except for one metadata field: Classifiers.

What's the problem with the Classifiers field?
Right now, classifiers live in PyPI's database, and the only way to get the list of valid classifiers is by making an HTTP request to PyPI... every time, since the valid list could possibly be changed from one moment to the next. This would definitely be unacceptable.

Why don't we just do it later?
Slightly more acceptable would be shipping this without support for classifiers, which would work, but is really just pushing the work onto our future selves. Additionally, this flaw would be pretty annoying for users: it would be hard to explain to them why validation works for everything except one field, and it probably wouldn't get fixed quickly.

What's the ideal solution?
Ideally, we would pull the storage of classifiers out of PyPI and into an external package. This would require a bit of a re-architecting of how PyPI handles classifiers, but I think it's worth it: in addition to unblocking this issue, it also puts the classifiers in a place where they can be reused, and makes it much easier for folks to request new trove classifiers -- it would just be a PR.

What's the update?
Today, I've done just that, releasing the trove-classifiers package and making the necessary changes to PyPI to use it instead in pypi/warehouse#7582. As a result, this issue is now unblocked and I'm going to continue working on my branch. 🎉 (edit: I'll actually be working off @crwilcox's branch since he's made some additions there)
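
As a rough illustration of why an installable classifier list helps: with the trove-classifiers package, validation can be a local set lookup instead of an HTTP request to PyPI. The sketch below assumes the package exposes a `classifiers` set (as its README describes) and falls back to a tiny hard-coded sample when it is not installed; `invalid_classifiers` is a hypothetical helper, not part of any library.

```python
try:
    # Provided by the trove-classifiers package when it is installed.
    from trove_classifiers import classifiers as VALID_CLASSIFIERS
except ImportError:
    # Assumption: a small sample so the sketch still runs without the package.
    VALID_CLASSIFIERS = {
        "Programming Language :: Python :: 3",
        "License :: OSI Approved :: MIT License",
    }


def invalid_classifiers(declared, valid=VALID_CLASSIFIERS):
    """Return the declared classifiers that are not in the valid set, in order."""
    return [c for c in declared if c not in valid]


problems = invalid_classifiers(
    ["Programming Language :: Python :: 3", "Not :: A :: Real :: Classifier"]
)
print(problems)
```

Because the list ships as a versioned package, a validator sees whatever set of classifiers was current when that version was released, which is the trade-off for not querying PyPI on every run.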

I'd appreciate thoughts & insights from folks that are interested in contributing. I'd especially appreciate a review on the upcoming PR to this repo, since I'm not familiar with the needs of all the tools that may want to use this feature, and want to be able to provide an API that can ideally be used by all of them.

@brainwane

Thank you for this @di!

@di
Sponsor Member

di commented Apr 3, 2020

Oops, this was not meant to be closed by pypi/warehouse#7582. I'm still working on this as described above. Externalizing the classifiers was just one step.

@westurner

Are there other package servers that don't currently validate metadata? E.g. devpi, a directory served over HTTP

Is there a valid use case for custom local metadata; such as local trove classifiers?

Is there a PEP that says that metadata MUST validate?

Should there be an environment variable and/or a commandline switch for skipping / customizing metadata validation? E.g. _SKIP_METADATA_VALIDATION="classifiers,version"?
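
To make the last question concrete, here is one way such a switch could be read. This is purely a sketch: no tool is known to honor `_SKIP_METADATA_VALIDATION`, and the helper names `skipped_fields` and `fields_to_validate` are made up for illustration.

```python
import os


def skipped_fields(environ=os.environ, var="_SKIP_METADATA_VALIDATION"):
    """Parse a comma-separated list of field names to skip (hypothetical switch)."""
    raw = environ.get(var, "")
    # Ignore empty entries and surrounding whitespace; compare case-insensitively.
    return {part.strip().lower() for part in raw.split(",") if part.strip()}


def fields_to_validate(all_fields, environ=os.environ):
    """Return the fields that should still be validated, preserving order."""
    skip = skipped_fields(environ)
    return [name for name in all_fields if name.lower() not in skip]


env = {"_SKIP_METADATA_VALIDATION": "classifiers,version"}
print(fields_to_validate(["name", "version", "classifiers"], env))
```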

@pradyunsg
Member

pradyunsg commented Apr 15, 2020

@di Would you have any ETA for the PR, in terms of an approximate quantum of time (like, a few days, around 5 weeks, a couple of months, or a few hours 🙂)?

For self-reference later: PyPI's current validation logic is contained in MetadataForm.

@di
Sponsor Member

di commented Apr 20, 2020

@pradyunsg My only blocker at this point is finding enough hours in the day to finish this. I'll try to make another push on this this week and give an update at the end of the week.

@pradyunsg
Member

Ah, awesome, thanks!

@brainwane

@di is this still on your personal roadmap or should it be available for someone else to take over?

@di
Sponsor Member

di commented Dec 3, 2020

Both, really. I think @brettcannon was talking about working on this soon. There is also a draft of the API in #332.

@brettcannon
Member

Yeah, I've started thinking about it as PEP 621 and PEP 643 make it more than just a validation problem but also managing various input and output formats as well as maintaining data consistency for the metadata overall (i.e. dynamic).

@bhrutledge

This came up in pypa/twine#833 (comment). Are there any updates?

That also reminded me of pypa/twine#739, which drifted out of the scope of Twine, but remains open because of some of the suggestions starting at pypa/twine#739 (comment). I wonder if any of that is relevant for this issue.

@uranusjr
Member

I think this mainly needs someone to actually do the bulk of the work. That’s how most recent additions happen for this repository.

@brettcannon
Member

I didn't push it any farther due to perceived apathy about the idea and https://pypi.org/project/pep621/ coming into existence.

It's one of those things where it's a question of whether this is low-level enough to be in this project or in a separate one due to needing to keep dependencies low (i.e. attrs or pydantic could help with that but I don't see either getting pulled in as a dependency of this project due to vendoring concerns).

@brettcannon
Member

And to be clear, the perceived apathy came from #383 where I was trying to come up with an API for metadata objects where the validation would occur.

If people start to show interest again then we can restart both conversations and start driving towards implementing all of this.

@abravalheri
Contributor

Hi guys, I don't know if this might interest you in this context, but I have created a JSON Schema centric validation library focusing on PEP 621:

https://github.com/abravalheri/validate-pyproject

This work was inspired by pypa/setuptools#2671.

If this can be useful somehow, I am more than happy to engage.

@ssbarnea

Using JSON Schema validation is a very good idea as it makes it portable across different file formats and editors. Do you happen to know a vscode extension that can make use of your schema? I would like to test it.

@abravalheri
Contributor

abravalheri commented Nov 30, 2021

Sorry, I am afraid not 😓.

I suppose such an extension would need to be very specific to the pyproject.toml use case... First it needs to parse TOML and then apply the different JSON Schemas provided for different parts of the data structure. Not sure if this workflow can be easily integrated into existing extensions...

@di
Sponsor Member

di commented Jun 19, 2022

For anyone following this issue, #518 was recently merged which adds a Metadata API, which this can now build upon.

@di
Sponsor Member

di commented Jan 23, 2023

Aaaaand removed in #603. Larger meta issue here: #570.

@brettcannon
Member

Closed by #686
