
[PoC] Add Core Metadata API (as a dataclass) #498

Closed
wants to merge 26 commits

Conversation

abravalheri
Contributor

This work is a proof-of-concept implementation of the Core Metadata API based on the discussion in #383. Please feel free to ignore/close it or select just the interesting parts of this PR; my main intention is just to have something concrete that can be used to advance the discussion.

I started with the API proposed in #383 (comment), but changed it to use a frozen dataclass (which simplifies the design: we don't have to worry about questions such as "should dynamic be automatically updated when the target field is edited?"). Please note that users can still use dataclasses.replace to modify the data structure freely.

Another divergence from the proposed API is the usage of Collection instead of Iterable (which also provides the handy __contains__ and __len__ methods...).
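
For readers skimming the thread, here is a minimal, self-contained sketch (not the class from this PR; the field names are only illustrative) of the frozen-dataclass idea and of how dataclasses.replace lets you derive modified copies:

import dataclasses
from typing import Collection

@dataclasses.dataclass(frozen=True)
class ExampleMetadata:
    # Illustrative fields only, not the actual attributes proposed in this PR.
    name: str
    version: str
    dynamic: Collection[str] = ()  # Collection also gives __contains__ and __len__

meta = ExampleMetadata(name="example", version="0.1", dynamic=("version",))
assert "version" in meta.dynamic                    # handy __contains__
updated = dataclasses.replace(meta, version="0.2")  # "edit" by deriving a new copy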

@abravalheri
Contributor Author

The tests on Python 3.6 fail as expected (due to the lack of support for dataclasses).

@abravalheri marked this pull request as ready for review January 16, 2022 19:07
@abravalheri mentioned this pull request Jan 16, 2022
abravalheri added a commit to abravalheri/packaging that referenced this pull request Jan 18, 2022
As mentioned in the review for pypa#498, conditions in tests should not be
based on the behaviour of the code being tested, otherwise they might end up
hiding other problems.

The solution is to pass flags in the test parameters themselves.
@abravalheri
Contributor Author

Thank you very much for the kind review and improvement suggestions @brettcannon. I tried to fix a lot of the points you mentioned, but there are still some parts that seem fuzzy to me, so I left some questions in my comments; I hope you don't mind.

The PR definitely has rough edges so any feedback is more than welcome.

# 2.2
dynamic: Collection[NormalizedDynamicFields] = ()

def __post_init__(self) -> None:
Member

I don't know if this method is necessary. If you mess up on what you assign then that's on you; language of consenting adults and all. Otherwise you should be running a type checker to make sure you are keeping your types consistent with what people expect.

Contributor Author

Thank you very much for the comment @brettcannon. Here I tried to follow the principle of being loose with the input but strict with the output. But I completely understand your point.

Something that I am also thinking now is: since packaging aims to be a collection of "reusable core utilities for various Python Packaging interoperability specifications", it would be nice if it could handle some of the tedious and repetitive processing that would be required to use this class (e.g. converting strings to Requirement or SpecifierSet objects) and are currently implemented in __post_init__.

So what if I remove this method as you suggested, but add another method that would handle this use case? For example a from_unstructured_dict class method (or any other name really, please feel free to bikeshed) with proper and explicit type annotations?

(Please note that my intention here is not to force the types to be correct, but instead absorb the repetitive tasks that would have to be implemented by any backend using such metadata API)
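
As a rough illustration of the idea (the class, the field names, and the from_unstructured_dict name are hypothetical placeholders from this discussion, not the API of this PR):

from dataclasses import dataclass, field
from typing import Any, Dict, List

from packaging.requirements import Requirement
from packaging.specifiers import SpecifierSet

@dataclass
class ExampleMetadata:
    name: str
    requires_dist: List[Requirement] = field(default_factory=list)
    requires_python: SpecifierSet = field(default_factory=SpecifierSet)

    @classmethod
    def from_unstructured_dict(cls, data: Dict[str, Any]) -> "ExampleMetadata":
        # Absorb the repetitive string -> object conversions for the caller.
        return cls(
            name=data["name"],
            requires_dist=[Requirement(r) for r in data.get("requires_dist", [])],
            requires_python=SpecifierSet(data.get("requires_python", "")),
        )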

Member

While "accept loose, output strict" is a good idea in most cases, I don't really see what situations we'd have a user of this API throw an unstructured dictionary at this API?

The only case in which I think we'd want to accept a dictionary would be if we added PEP 621 ingestion (which I'll keep for the discussion in #383, since that's a "scope" question rather than an implementation question).

Every other user should be able to just pass the right arguments to construct this object.

Contributor Author

Thank you very much for the comment @pradyunsg! I will bring this discussion to #383 then 😄

Member

I agree with @pradyunsg: the structure of the data that's going to be coming in is very much going to be from known formats. The only one that might be different is someone trying to convert setup.cfg/setup.py, but that doesn't need to live here (i.e. we don't have to be all things to all people).

Contributor Author

I think such a method would be useful to implement from_pyproject, so I thought that making it public would allow people to make more use of the class even before from_pyproject gets implemented.

But I have no problems with the proposed approach. I will trust in your intuition here.

Member

@abravalheri it's more paranoia. 😅 It's much harder to take something away than to introduce something to an API. As such, we prefer to keep the APIs small and for scenarios we know exist rather than guess at future needs. "Premature optimization" applies to API design as well. 😉

Comment on lines 354 to 357
first_line = lines[0].lstrip()
text = textwrap.dedent("\n".join(lines[1:]))
other_lines = (line.lstrip("|") for line in text.splitlines())
return "\n".join(chain([first_line], other_lines))
Member

Use inspect.cleandoc here.

Contributor Author

Nice tip! I didn't know this function. Thank you very much, I will probably add a commit either tonight or during the weekend!
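
For reference, inspect.cleandoc applies the same cleanup used for docstrings: it dedents every line after the first and trims leading/trailing blank lines (the "|" stripping done above would still be a separate step). A tiny, standalone illustration:

import inspect

folded = """First line of the description.
        This continuation line is indented in the email-header style.
        Another continuation line."""

print(inspect.cleandoc(folded))
# First line of the description.
# This continuation line is indented in the email-header style.
# Another continuation line.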

@pradyunsg
Member

See also #332, which was an earlier half-complete attempt at adding metadata validation.

@abravalheri
Contributor Author

Thank you very much @brettcannon and @pradyunsg for the latest reviews.

In my latest commit I have removed the __post_init__ method (redistributing some of its code, since it is still used by from_pkg_info) as recommended in the reviews.

Since __post_init__ no longer exists, the attributes are validated only when parsing from or serialising to files. (This implies that any intermediate representation can be inconsistent.) Does it still make sense to keep the dataclass immutable? (The main point of the immutability, if I remember well, was to ensure dynamic stays consistent.)

For the time being I am leaving frozen=True, but please let me know if you want me to remove that.
As I previously mentioned in the threads of this PR, there are still some nice side effects of having an immutable data structure (especially if we also manage to have Requirements being comparable/hashable), and replace alleviates most of the need for directly mutating data.

Please let me know if you want to collapse {from/to}_pkg_info and {from/to}_dist_info_metadata into a single method with the allow_unfilled_dynamic keyword argument.
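
For concreteness, a sketch of what the collapsed pair could look like (the names and signature merely restate the proposal in this thread, not a final API):

from typing import Type, TypeVar

T = TypeVar("T", bound="ExampleMetadata")

class ExampleMetadata:
    @classmethod
    def from_pkg_info(
        cls: Type[T], content: bytes, allow_unfilled_dynamic: bool = True
    ) -> T:
        # allow_unfilled_dynamic=True suits sdist PKG-INFO (fields listed in
        # Dynamic may still be missing); False suits wheel METADATA.
        ...

    def to_pkg_info(self, allow_unfilled_dynamic: bool = False) -> bytes:
        # Serialise back to the email-header format, validating Dynamic first.
        ...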


@pradyunsg I had a look at #332, and I think most of the things implemented there are also addressed here. Sorry for failing to see this open PR.

There are a few elements though that I am currently omitting on purpose:

  • This implementation doesn't handle old versions of the spec individually. Instead, it is very permissive with the input, accepting any version, but when serialising, the output always respects the latest version of the spec.
  • There might be some validations that are not implemented. My main focus was to address Add a metadata API #383, instead of Add metadata validation #147. Hopefully, just by ingesting metadata files according to the spec and converting the attributes to the correct classes (e.g. Requirement, SpecifierSet, Version, ...) we get most of these validations as a side effect (see the sketch after this list). The only validations I am explicitly adding regard dynamic.
    • If you have any suggestion of a missing validation, I can give it a try (but maybe it would be better to do it in a separate PR?)
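
To illustrate the "validation as a side effect" point mentioned above (sketch only; it just shows existing packaging types rejecting bad values):

from packaging.requirements import InvalidRequirement, Requirement
from packaging.version import InvalidVersion, Version

Version("1.0.post1")           # accepted
Requirement("requests>=2.0")   # accepted

try:
    Version("not a version")
except InvalidVersion as exc:
    print(f"rejected version: {exc}")

try:
    Requirement("requests >=")  # operator with no version
except InvalidRequirement as exc:
    print(f"rejected requirement: {exc}")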

@pradyunsg
Member

Sorry for failing to see this open PR.

No need to apologise. I mentioned that PR just to make sure that you're aware that it exists, and so that you could check that there was something there that you could reuse. I apologise for not being clear enough about that. :)

There might be some validations that are not implemented.

That's perfectly fine, we can defer that for a follow up. I'll take a look at this closer to the weekend -- if I don't, @-mention me sometime next week! :)

@brettcannon
Member

Does it still make sense to keep the dataclass immutable? (The main aspect of the immutability if I remember well was to ensure dynamic was consistent).

It actually might not. The worry about dynamic that led to your immutability suggestion was about the transition pyproject.toml ➡️ PKG-INFO ➡️ METADATA. But since the common case for fields getting filled in is the PKG-INFO ➡️ METADATA step, and the spec says dynamic is fine as a record of what got filled in METADATA, it might not be important enough to care about and thus not worth making the dataclass immutable.

@abravalheri
Contributor Author

the spec says dynamic is fine as a record of what got filled in METADATA

Oh right! I should have read the spec more carefully. In this PR the dynamic validation I implemented ended up reflecting the one specified in PEP 621, which is clearly different from the one in PEP 643 (i.e. the Dynamic header can be left behind after the value is backfilled).

I hope to fix this validation later this week.

@brettcannon
Member

Oh right! I should have read the spec more carefully.

I didn't even know that detail and I helped introduce dynamic. 😅

As discussed in pypa#383, instead of having 2 separate sets of methods
(one for `PKG-INFO` files in sdists and one for `METADATA` files in wheels)
we can have a single pair of to/from functions with an
`allow_unfilled_dynamic` keyword argument.
@abravalheri
Contributor Author

I think I managed to fix the dynamic validation now.

I also went ahead and made 2 changes in the last 2 commits based on the previous discussions:

  • 1ed4123: Removed 'frozen' dataclass behaviour (I also converted any usage of frozenset to set)
  • 1ca3573: Implemented the suggestion of using an allow_unfilled_dynamic keyword argument and unified {to/from}_pkg_info and {to/from}_dist_info_metadata into a single pair of methods. (For the time being the {to/from}_pkg_info name is used, but that can be subject to bikeshedding).

If allow_unfilled_dynamic is not the direction we want to go, please let me know and I will "revert" the latest commit.

B = TypeVar("B")

if sys.version_info[:2] >= (3, 8) and TYPE_CHECKING: # pragma: no cover
from typing import Literal
Contributor

Rather than have things type check differently on different Python versions, you can just use:

if TYPE_CHECKING:
    from typing_extensions import Literal

You don't need to actually install typing_extensions for this to work, since a) it's in the TYPE_CHECKING block, b) type checkers treat typing_extensions as part of the stdlib, so they always know about its existence.

Contributor Author

Hi @hauntsaninja, thank you for the tip! That sounds very nice... However when I try to run mypy with Python 3.7, I get a series of errors, including:

packaging/metadata.py:41: error: Module "typing" has no attribute "Literal"; maybe "_Literal"?

Are you sure about type checkers treating typing_extensions as part of the stdlib?

Contributor

Yes, I'm quite sure (I help maintain typeshed and mypy). That error looks like you might not have changed the import to use typing_extensions instead of typing?

The following works for me (where emptyenv does not have typing_extensions installed in it):

python3.7 -m mypy -c 'from typing_extensions import Literal' --python-version 3.7 --python-executable emptyenv/bin/python

To be explicit, the options are:
1)

# does not require runtime dependency on typing_extensions
# works because all type checkers special case typing_extensions
if TYPE_CHECKING:
    from typing_extensions import Literal
    NormalizedDynamicFields = Literal[...]
else:
    NormalizedDynamicFields = str

2)

from typing_extensions import Literal
# does require runtime dependency on typing_extensions
NormalizedDynamicFields = Literal[...]

Anyway, the only effect of the PR as it stands would be slightly looser type checking for users of packaging who use Python 3.7, so this is mostly a nit :-)

Contributor Author

Oh! I completely misinterpreted your comment and then, when reading your example, my confirmation bias took over and I could have sworn you were still using from typing import Literal... Sorry for that 😅

That sounds like a very good approach to me; I will go ahead and implement it.

If the repository owners prefer to keep the previous approach, please let me know and I will "revert" the commit.

As indicated in the [code review](pypa#498 (comment)),
type checkers consider `typing_extensions` as part of the stdlib, so it
does not need to be explicitly installed.
"1.1": {
"has_dynamic_fields": False,
"is_final_metadata": True,
"file_contents": """\
Member

Do we want to embed the file contents, or should we have them as files in the test suite that we load as appropriate?

@brettcannon
Member

Now that c533201 has landed, I'm closing this PR.

@abravalheri thanks for taking a chance on writing something!

@abravalheri
Contributor Author

Thank you very much @brettcannon, looking forward to the email-message ingestion/production (the plan is to migrate setuptools metadata in the mid/long term).

@brettcannon
Member

looking forward to the email-message ingestion/production

So am I. 😉 That's my next planned step (unless someone beats me to it).

the plan is to migrate setuptools metadata in the mid/long term

No pressure. 😅

@abravalheri mentioned this pull request Jul 11, 2022