Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Add metadata validation #1562

Closed
wants to merge 1 commit into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
74 changes: 49 additions & 25 deletions setuptools/dist.py
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,7 @@
from setuptools import windows_support
from setuptools.monkey import get_unpatched
from setuptools.config import parse_configuration
from setuptools.validation import Metadata
import pkg_resources
from .py36compat import Distribution_parse_config_files

Expand Down Expand Up @@ -55,17 +56,41 @@ def get_metadata_version(dist_md):
def write_pkg_file(self, file):
"""Write the PKG-INFO format data to a file object.
"""
metadata = Metadata(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you wrapping DistributionMetadata in a Metadata object? Even if there were not other problems with this approach, I would think that the validator could work just as well on DistributionMetadata.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. You're right.

On the other hand, consider the validation functionality suggested by @di: it requires that you pass a dictionary to the validation class constructor. How would you solve this with the DistributionMetadata constructor?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe @di was referring to more generic functionality for a package other than setuptools, so it depends on the details there. One option is to just generate a non-validated PKG-INFO and pass it to a function that validates PKG-INFO files.

Another is to have an intermediate format that represents PKG-INFO as JSON (thus retaining some type information and avoiding the problem where content and format are indistinguishable) which is convertable by packaging into a PKG-INFO file. packaging could then validate the JSON file before a PKG-INFOfile is generated.

If we're already trying to validate a DistributionMetadata file, then I would just validate the results of the actual getter operations.

name=self.get_name(),
version=self.get_version(),
description=self.get_description(),
url=self.get_url(),
author=self.get_contact(),
author_email=self.get_contact_email(),
license=self.get_license(),
download_url=self.download_url,
project_urls=['%s, %s' % (key, val)
for key, val in self.project_urls.items()],
long_description=rfc822_escape(self.get_long_description()),
long_description_content_type=self.long_description_content_type,
keywords=self.get_keywords(),
platforms=self.get_platforms(),
classifiers=self.get_classifiers(),
requires=self.get_requires(),
provides=self.get_provides(),
obsoletes=self.get_obsoletes(),
python_requires=getattr(self, 'python_requires', None),
provides_extras=self.provides_extras,
)
metadata.validate(throw_exception=True)

version = get_metadata_version(self)

file.write('Metadata-Version: %s\n' % version)
file.write('Name: %s\n' % self.get_name())
file.write('Version: %s\n' % self.get_version())
file.write('Summary: %s\n' % self.get_description())
file.write('Home-page: %s\n' % self.get_url())
file.write('Name: %s\n' % metadata.name)
file.write('Version: %s\n' % metadata.version)
file.write('Summary: %s\n' % metadata.description)
file.write('Home-page: %s\n' % metadata.url)

if version < StrictVersion('1.2'):
file.write('Author: %s\n' % self.get_contact())
file.write('Author-email: %s\n' % self.get_contact_email())
file.write('Author: %s\n' % metadata.author)
file.write('Author-email: %s\n' % metadata.author_email)
else:
optional_fields = (
('Author', 'author'),
Expand All @@ -75,51 +100,50 @@ def write_pkg_file(self, file):
)

for field, attr in optional_fields:
attr_val = getattr(self, attr)
attr_val = getattr(metadata, attr, None)
if six.PY2:
attr_val = self._encode_field(attr_val)

if attr_val is not None:
file.write('%s: %s\n' % (field, attr_val))

file.write('License: %s\n' % self.get_license())
file.write('License: %s\n' % metadata.license)
if self.download_url:
file.write('Download-URL: %s\n' % self.download_url)
for project_url in self.project_urls.items():
file.write('Project-URL: %s, %s\n' % project_url)
file.write('Download-URL: %s\n' % metadata.download_url)
for project_named_url in metadata.project_urls:
file.write('Project-URL: %s\n' % project_named_url)

long_desc = rfc822_escape(self.get_long_description())
file.write('Description: %s\n' % long_desc)
file.write('Description: %s\n' % metadata.long_description)

keywords = ','.join(self.get_keywords())
keywords = ','.join(metadata.keywords)
if keywords:
file.write('Keywords: %s\n' % keywords)

if version >= StrictVersion('1.2'):
for platform in self.get_platforms():
for platform in metadata.platforms:
file.write('Platform: %s\n' % platform)
else:
self._write_list(file, 'Platform', self.get_platforms())
self._write_list(file, 'Platform', metadata.platforms)

self._write_list(file, 'Classifier', self.get_classifiers())
self._write_list(file, 'Classifier', metadata.classifiers)

# PEP 314
self._write_list(file, 'Requires', self.get_requires())
self._write_list(file, 'Provides', self.get_provides())
self._write_list(file, 'Obsoletes', self.get_obsoletes())
self._write_list(file, 'Requires', metadata.requires)
self._write_list(file, 'Provides', metadata.provides)
self._write_list(file, 'Obsoletes', metadata.obsoletes)

# Setuptools specific for PEP 345
if hasattr(self, 'python_requires'):
file.write('Requires-Python: %s\n' % self.python_requires)
if metadata.python_requires:
file.write('Requires-Python: %s\n' % metadata.python_requires)

# PEP 566
if self.long_description_content_type:
if metadata.long_description_content_type:
file.write(
'Description-Content-Type: %s\n' %
self.long_description_content_type
metadata.long_description_content_type
)
if self.provides_extras:
for extra in self.provides_extras:
for extra in metadata.provides_extras:
file.write('Provides-Extra: %s\n' % extra)


Expand Down
106 changes: 106 additions & 0 deletions setuptools/validation.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,106 @@
"""
Validation logic for package metadata, supporting PEP 566.
See: https://www.python.org/dev/peps/pep-0566/
"""


def single_line_str(val):
"""A string that does not contain newline characters"""
if val is None:
return
assert isinstance(val, str), \
"is of type %s but should be of type str" % type(val)
assert '\n' not in val, \
"to contain a newline character (not allowed)"


def multi_line_str(val):
"""A string that can contain newline characters"""
if val is None:
return
assert isinstance(val, str), \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not use assert for control flow. It is disabled when python is run optimized.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would be similarly elegant? I don't want to do an if foo: raise Bar() in all that many places.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert is not intended to be an "elegant" shorthand for raising an exception, it's a way of programmatically making assertions about how your code should work. You use it for things like enforcing a function's contract or documenting assumptions made by the programmer (which are then validated at runtime).

In this specific case, you cannot use it because it will not work. Assertions are to be treated as optional statements and if you run Python in optimized mode, they will simply not run at all, for example:

$ python -Oc "assert False"
$ python -c "assert False"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
AssertionError

In this particular case, I would either raise ValueError or convert validators to functions returning a boolean, depending on the details. I'm not sure it's worth working out the details in this case, though, because as mentioned elsewhere this functionality rightly belongs in another library.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bittner Please update the code and use raise exception. Assert should be used only inside tests and I am sure that some linters are even able to identify accidental use outside tests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ssbarnea I think at this point this PR needs a much bigger overhaul, and most of it may need to live in this repo. Probably best to hold off on cleaning up this specific code until the overhaul is done.

"is of type %s but should be of type str" % type(val)


def list_of_str(val):
"""A list of single-line text strings"""
if val is None:
return
assert isinstance(val, list), \
"is of type %s but should be of type list(str)" % type(val)
for item in val:
single_line_str(item)


def set_of_str(val):
"""A set (i.e. a list of unique entries) of single-line text strings"""
if val is None:
return
assert isinstance(val, set), \
"is of type %s but should be of type set(str)" % type(val)
for item in val:
single_line_str(item)


default_validator = single_line_str
validators = dict(
long_description=multi_line_str,
classifiers=list_of_str,
keywords=list_of_str,
platforms=list_of_str,
project_urls=list_of_str,
requires=list_of_str,
provides=list_of_str,
provides_extras=set_of_str,
obsoletes=list_of_str,
)


class Metadata:
"""
A validation object for PKG-INFO data fields
"""

def __init__(self, *data_dicts, **kwargs):
"""
Takes JSON-compatible metadata, as described in PEP 566, which
can be added as both dictionaries or keyword arguments.
"""
self.errors = []
self.fields = []

for data in data_dicts:
for key in data:
setattr(self, key, data[key])
self.fields += [key]

for key in kwargs:
setattr(self, key, kwargs[key])
self.fields += [key]

def validate(self, throw_exception=False):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no need for throw_exception, if people want to suppress the exceptions from this function, they can do:

try:
    metadata.validate()
    invalid = False
except InvalidMetadataError:
    invalid = True

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please note that @di suggested to add validation functionality that behaves like validate().errors, so that it integrates nicely into Warehouse code.

I felt that throwing an exception by default would make code more clumsy. It's just not beautiful. -- I wouldn't want to miss that possibility in general, though.

Copy link
Member

@pganssle pganssle Nov 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please note that @di suggested to add validation functionality that behaves like validate().errors, so that it integrates nicely into Warehouse code.

He suggested no such thing. He suggested that something like the warehouse functionality go into a different project, pypa/packaging, not this one. I think you may have misunderstood him.

I felt that throwing an exception by default would make code more clumsy. It's just not beautiful. -- I wouldn't want to miss that possibility in general, though.

This is in general not an example of good design, because there are now two different error handling pathways. Additionally, the Pythonic way to signal an error condition is to raise an exception. It is not clumsy and in fact it leads to a lot of clumsy code in downstream consumers.

The reason is that in most cases, the functions calling your function can do nothing about an error except abort early and notify the user, if you raise an exception in error conditions, anyone who isn't going to try to recover from the error can just use your function as if it always works because the exception you raise will bubble up the stack until someone either catches it or the program is aborted and the user is notified of the error. If you return an error code instead, you are relying on your consumers to know that your function can either return an error code of some sort or not and raise an error or a handler.

"""
Runs validation over all data, and sets the ``error`` attribute
in case of validation errors.
"""
self.errors = []

for field in self.fields:
validator = validators.get(field, default_validator)
val = getattr(self, field)
try:
validator(val)
except AssertionError as err:
self.errors += [(field, err)]

if throw_exception and self.errors:
raise InvalidMetadataError(self.errors)

return not self.errors


class InvalidMetadataError(ValueError):
"""An error that can be raised when validation fails"""

def __init__(self, error_list):
self.errors = error_list