-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Perform a more thorough check of package_data structure #1769
Conversation
Any advice on what test I should write and where I should put it would be appreciated. |
As of a86588f, the two messages are:
|
setuptools/dist.py
Outdated
@@ -314,12 +318,14 @@ def check_package_data(dist, attr, value): | |||
iter(v) | |||
except TypeError: | |||
break | |||
if isinstance(v, str): | |||
distutils.log.warn( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we've generally been doing warnings.warn
instead of distutils.log.warn
these days.
Thanks @pganssle for the suggestions. As far as the
I've implemented 1 because 3 & 4 could cause |
f7d9412
to
b8904e2
Compare
As of b8904e2, Python 2 tests fail when a unicode str is passed as a dictionary key. This became apparent because package_data = {
u'hello': ['*.msg'],
} Therefore, I will update check_package_data to be more lenient. But please let me know if Python 2 should only accept |
@dhimmel Python 2 should definitely accept unicode strings for keys. |
Would re-using |
Thanks for the the suggestion. In 93af6a5, I update the docstring for The code for
If we are to stick by those guidelines, we can add |
That's OK, yes.
IMHO, printing a warning instead of erroring out does not make sense if a keyword value is not valid. Passing a string instead of a list of string for |
Okay in 0435d7a I switch to raising an error for any invalid values.
Note that the old behavior may not always result in missing package data, because on Linux the path
I'm not sure that all iterables should be considered lists for the purposes of |
@pganssle @benoit-pierre ready for re-review. Also advice appreciated on whether this PR should address the potential issue with consumable iterables described above. |
@pganssle @benoit-pierre ready for re-review. |
Sorry for the delay. I think it would be more useful if the exception message in case of invalid key showed the key value. Was it you intent to check the exception message in the tests, or to provide a custom message when the check fails (see https://docs.pytest.org/en/latest/deprecations.html#raises-message-deprecated)? IMHO the former is better, and the tests can be parameterized (with added check for the "not a dictionary" case): setuptools/dist.py | 12 ++++-----
setuptools/tests/test_dist.py | 58 ++++++++++++++++++++++++-------------------
2 files changed, 38 insertions(+), 32 deletions(-)
diff --git i/setuptools/dist.py w/setuptools/dist.py
index 4e9190b3..8e3e1f92 100644
--- i/setuptools/dist.py
+++ w/setuptools/dist.py
@@ -306,15 +306,15 @@ def check_test_suite(dist, attr, value):
def check_package_data(dist, attr, value):
"""Verify that value is a dictionary of package names to glob lists"""
- msg = (
- "{!r} must be a dictionary mapping package names to lists of "
- "string wildcard patterns".format(attr)
- )
if not isinstance(value, dict):
- raise DistutilsSetupError(msg)
+ raise DistutilsSetupError(
+ "{!r} must be a dictionary mapping package names to lists of "
+ "string wildcard patterns".format(attr))
for k, v in value.items():
if not isinstance(k, six.string_types):
- raise DistutilsSetupError(msg)
+ raise DistutilsSetupError(
+ "{!r} must be strings (got {!r})".format(
+ 'keys of {!r} dict'.format(attr), k))
assert_string_list(dist, 'values of {!r} dict'.format(attr), v)
diff --git i/setuptools/tests/test_dist.py w/setuptools/tests/test_dist.py
index 091b3687..38cc1039 100644
--- i/setuptools/tests/test_dist.py
+++ w/setuptools/tests/test_dist.py
@@ -3,6 +3,7 @@
from __future__ import unicode_literals
import io
+import re
from distutils.errors import DistutilsSetupError
from setuptools.dist import (
_get_unpatched,
@@ -270,35 +271,40 @@ def test_maintainer_author(name, attrs, tmpdir):
assert line in pkg_lines_set
-@pytest.mark.filterwarnings('error')
-def test_check_package_data_okay():
- package_data = {
+CHECK_PACKAGE_DATA_TESTS = (
+ # Valid.
+ ({
'': ['*.txt', '*.rst'],
'hello': ['*.msg'],
- }
- assert check_package_data(None, 'package_data', package_data) is None
-
-
-def test_check_package_data_invalid_key():
- package_data = {
+ }, None),
+ # Not a dictionary.
+ ((
+ ('', ['*.txt', '*.rst']),
+ ('hello', ['*.msg']),
+ ), (
+ "'package_data' must be a dictionary mapping package"
+ " names to lists of string wildcard patterns"
+ )),
+ # Invalid key type.
+ ({
400: ['*.txt', '*.rst'],
- }
- expected_message = (
- "'package_data' must be a dictionary mapping package names to lists "
- "of string wildcard patterns"
- )
- with pytest.raises(DistutilsSetupError, message=expected_message):
- check_package_data(None, 'package_data', package_data)
-
-
-def test_check_package_data_invalid_value():
- """error if str values: https://github.com/pypa/setuptools/issues/1459"""
- package_data = {
+ }, (
+ "\"keys of 'package_data' dict\" "
+ "must be strings (got 400)"
+ )),
+ # Invalid value type.
+ ({
'hello': '*.msg',
- }
- expected_message = (
+ }, (
"\"values of 'package_data' dict\" "
"must be a list of strings (got '*.msg')"
- )
- with pytest.raises(DistutilsSetupError, message=expected_message):
- check_package_data(None, 'package_data', package_data)
+ )),
+)
+
+@pytest.mark.parametrize('package_data, expected_message', CHECK_PACKAGE_DATA_TESTS)
+def test_check_package_data(package_data, expected_message):
+ if expected_message is None:
+ assert check_package_data(None, 'package_data', package_data) is None
+ else:
+ with pytest.raises(DistutilsSetupError, match=re.escape(expected_message)):
+ check_package_data(None, 'package_data', package_data) |
As for the problem with generators, there's only one option: raise an error if a generator is passed, as we can't change the API of the |
Thanks @benoit-pierre for the suggested patch. I applied it in d6fb012. You were correct that I misunderstood the "message" parameter of
I will now go through everything again and think about whether it makes sense to address the generator issue in this PR. |
"{!r} must be strings (got {!r})".format( | ||
'keys of {!r} dict'.format(attr), k)) | ||
"keys of the {!r} dict must be strings (got {!r})" | ||
.format(attr, k) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking we might as well avoid the awkward nesting quoting when possible, such that this returns keys of 'package_data' dict must be strings (got 400)
rather than "keys of 'package_data' dict" must be strings (got 400)
.
903cfd2
to
993bbd6
Compare
setuptools/dist.py
Outdated
try: | ||
# verify that value is not a single-use iterable | ||
assert list(value) == list(value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@benoit-pierre does this seem like the right implementation to check whether value is consumable?
For my records, this was a helpful reference here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think using isinstance
in this case is better than consuming the list and checking if it's consumed for sure.
I haven't had time to look at the rest of this PR and see where this assert_string_list
is actually used - is there any chance that people would have legitimate reasons for passing a set
to something that calls this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, assert_string_list
is called by dist.check_nsp
, dist.check_package_data
(added by this PR), and directly for some of the following:
Lines 92 to 114 in 3adda96
"distutils.setup_keywords": [ | |
"eager_resources = setuptools.dist:assert_string_list", | |
"namespace_packages = setuptools.dist:check_nsp", | |
"extras_require = setuptools.dist:check_extras", | |
"install_requires = setuptools.dist:check_requirements", | |
"tests_require = setuptools.dist:check_requirements", | |
"setup_requires = setuptools.dist:check_requirements", | |
"python_requires = setuptools.dist:check_specifier", | |
"entry_points = setuptools.dist:check_entry_points", | |
"test_suite = setuptools.dist:check_test_suite", | |
"zip_safe = setuptools.dist:assert_bool", | |
"package_data = setuptools.dist:check_package_data", | |
"exclude_package_data = setuptools.dist:check_package_data", | |
"include_package_data = setuptools.dist:assert_bool", | |
"packages = setuptools.dist:check_packages", | |
"dependency_links = setuptools.dist:assert_string_list", | |
"test_loader = setuptools.dist:check_importable", | |
"test_runner = setuptools.dist:check_importable", | |
"use_2to3 = setuptools.dist:assert_bool", | |
"convert_2to3_doctests = setuptools.dist:assert_string_list", | |
"use_2to3_fixers = setuptools.dist:assert_string_list", | |
"use_2to3_exclude_fixers = setuptools.dist:assert_string_list", | |
], |
Lines 225 to 228 in 49a839f
def check_nsp(dist, attr, value): | |
"""Verify that namespace packages are valid""" | |
ns_packages = value | |
assert_string_list(dist, attr, ns_packages) |
So I think the total list of setup keywords that end up hitting assert_string_list
is:
eager_resources
namespace_packages
package_data
exclude_package_data
dependency_links
convert_2to3_doctests
use_2to3_fixers
use_2to3_exclude_fixers
Is the behavior of any of these setup keywords order dependent? If so, we probably want to not allow sets of strings?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should just simplify the whole thing and just explicitly check for a list/tuple with
isinstance(value, (tuple, list))
Implemented in d9035ef
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pinging @benoit-pierre and @pganssle
As per discussion in pypa#1769 (comment)
value=None raises TypeError DistutilsSetupError: 2 must be a list of strings (got None)
d9035ef
to
8f848bd
Compare
Ensure the dictionary values are lists/tuples of strings. Fix pypa#1459.
While I welcome better checks for package_data, this was a breaking change because packages that were relying on the previous behavior and were installing fine with 3.7.3 are now unable to be installed with python 3.7.4 and I had to run around trying to figure out why all of our CI pipelines were suddenly failing. Fortunately I was able to quickly patch the offending upstream project, but others may not be so fortunate. This change should not have been introduced into a patch release. |
I don't think this is anything to do with 3.7.3 vs 3.7.4, My understanding is that this is backwards compatible because this only raises an error on already broken workflows - i.e. it may have been silently failing and now it's loudly failing. |
I'm trying to understand what broke then, because I had several CI jobs that relied on the 'python:3.7 image' that have been working for months, then suddenly they started failing last week once the python images updated to pip 19.2.2. With the updated docker images, my python repo with the incorrect package_data definition wouldn't install anymore but it would work again if I reverted back to the python 3.7.3 image. What determines the version of setuptools that pip uses. Is it bundled? If so, I can't see a reference anywhere to the v41.1 release. |
@dquist my guess is that the different docker images have different versions of I do agree that this change is not entirely backwards compatible. Before, packages with a broken I wonder if one solution would be an option like |
Summary of changes
package_data
andexclude_package_data
expect to receive a dictionary whose values are lists rather than strings. Previously, passing a str value did not cause any warning or error, which is dangerous because users accidentally input a glob/path that then gets iterated into individual characters.Refs #1459
Pull Request Checklist