Perform a more thorough check of package_data structure #1769

dhimmel · 2019-05-22T21:51:38Z

Summary of changes

package_data and exclude_package_data expect to receive a dictionary whose values are lists rather than strings. Previously, passing a str value did not cause any warning or error, which is dangerous because users accidentally input a glob/path that then gets iterated into individual characters.

Refs #1459

Pull Request Checklist

Changes have tests
News fragment added in changelog.d. See documentation for details

dhimmel · 2019-05-22T22:05:37Z

Any advice on what test I should write and where I should put it would be appreciated.

dhimmel · 2019-05-22T22:18:14Z

As of a86588f, the two messages are:

DistutilsSetupError: 'package_data' must be a dictionary mapping package names to lists of wildcard patterns

WARNING: 'package_data' must be a dictionary mapping package names to lists of wildcard patterns, but the value for 'manubot' is a str not list

pganssle · 2019-05-22T23:50:54Z

setuptools/dist.py

@@ -314,12 +318,14 @@ def check_package_data(dist, attr, value):
                iter(v)
            except TypeError:
                break
+            if isinstance(v, str):
+                distutils.log.warn(


I think we've generally been doing warnings.warn instead of distutils.log.warn these days.

pganssle · 2019-05-22T23:53:36Z

@dhimmel Thank you for for looking into this. I think you'll want to add tests into test_dist. If you switch over to use a warning instead of distutils.log.warn, you can use the pytest.raises and pytest.warns context managers to test both of these scenarios.

dhimmel · 2019-05-22T23:59:29Z

Thanks @pganssle for the suggestions.

As far as the check_package_data implementation goes, I considered a few options:

warn if dictionary value is string
warn if dictionary value is not list
error if dictionary value is string
error if dictionary value is not list

I've implemented 1 because 3 & 4 could cause setup.py files that currently work to begin failing. Does anyone have a preference between 1 or 2. Note that 1 and 2 is also an option, where the warning message would note what type of value was received and that it wasn't a list.

dhimmel · 2019-05-23T17:57:44Z

As of b8904e2, Python 2 tests fail when a unicode str is passed as a dictionary key. This became apparent because test_dist.py invokes from __future__ import unicode_literals. I am guessing we do not want check_package_data data to fail if a dictionary like the following is provided:

    package_data = {
        u'hello': ['*.msg'],
    }

Therefore, I will update check_package_data to be more lenient. But please let me know if Python 2 should only accept str and not unicode keys?

pganssle · 2019-05-23T18:03:14Z

@dhimmel Python 2 should definitely accept unicode strings for keys.

benoit-pierre · 2019-05-23T18:03:24Z

Would re-using assert_string_list make more sense? Have you checked the implementation for check_extras?

dhimmel · 2019-05-23T18:44:19Z

Would re-using assert_string_list make more sense?

Thanks for the the suggestion. In 93af6a5, I update the docstring for assert_string_list because it seems to raise an error when value=None. One issue I have with assert_string_list is that the implementation doesn't actually test for a list of strings. For example, it would pass with a tuple or generator of strings. Is that intended or not... IDK?

The code for check_package_data could be simplified by outsourcing to a function like assert_string_list, However my worry is about backwards compatibility. Currently, setup.py files that use str (rather than list of str) values for package_data will not fail installation and the package may function properly, despite the unintended interpretation of the the package_data configuration. Because of this, I assumed we'd want to take the following approach:

keep all existing checks that raise errors, i.e. don't increase the ability to accept bad input
issue warnings for any input that doesn't conform to the documentation, but that was previously accepted

If we are to stick by those guidelines, we can add assert_string_list to check that the dict values are lists of strings, but we'd have to except the DistutilsSetupError from assert_string_list and convert it to a warning. So yes, we could add an extra check, but this wouldn't shorten/simplify check_package_data. Or @benoit-pierre do you think backwards incompatibility is okay, where specs that used to pass now error?

benoit-pierre · 2019-05-23T22:10:25Z

Would re-using assert_string_list make more sense?

Thanks for the the suggestion. In 93af6a5, I update the docstring for assert_string_list because it seems to raise an error when value=None. One issue I have with assert_string_list is that the implementation doesn't actually test for a list of strings. For example, it would pass with a tuple or generator of strings. Is that intended or not... IDK?

That's OK, yes.

The code for check_package_data could be simplified by outsourcing to a function like assert_string_list, However my worry is about backwards compatibility. Currently, setup.py files that use str (rather than list of str) values for package_data will not fail installation and the package may function properly, despite the unintended interpretation of the the package_data configuration. Because of this, I assumed we'd want to take the following approach:
* keep all existing checks that raise errors, i.e. don't increase the ability to accept bad input

* issue warnings for any input that doesn't conform to the documentation, but that was previously accepted
If we are to stick by those guidelines, we can add assert_string_list to check that the dict values are lists of strings, but we'd have to except the DistutilsSetupError from assert_string_list and convert it to a warning. So yes, we could add an extra check, but this wouldn't shorten/simplify check_package_data. Or @benoit-pierre do you think backwards incompatibility is okay, where specs that used to pass now error?

IMHO, printing a warning instead of erroring out does not make sense if a keyword value is not valid. Passing a string instead of a list of string for package_data as always been invalid, resulting in missing package data, so I don't see it as a backward incompatible change to enforce it.

dhimmel · 2019-05-24T14:31:06Z

IMHO, printing a warning instead of erroring out does not make sense if a keyword value is not valid

Okay in 0435d7a I switch to raising an error for any invalid values.

Passing a string instead of a list of string for package_data as always been invalid, resulting in missing package data, so I don't see it as a backward incompatible change to enforce it.

Note that the old behavior may not always result in missing package data, because on Linux the path / could include everything? Furthermore, missing package data usually makes it so some specific functionality breaks. However, this PR will now make it so the entire installation breaks. Not saying I'm against raising an error, just that I want the risks to be clear.

That's OK, yes.

I'm not sure that all iterables should be considered lists for the purposes of assert_string_list. For example, if the value is a generator that gets consumed during these initial checks, then that is a problem?

dhimmel · 2019-05-29T19:56:48Z

@pganssle @benoit-pierre ready for re-review. Also advice appreciated on whether this PR should address the potential issue with consumable iterables described above.

dhimmel · 2019-06-24T14:31:56Z

@pganssle @benoit-pierre ready for re-review.

benoit-pierre · 2019-06-25T14:52:02Z

Sorry for the delay.

I think it would be more useful if the exception message in case of invalid key showed the key value.

Was it you intent to check the exception message in the tests, or to provide a custom message when the check fails (see https://docs.pytest.org/en/latest/deprecations.html#raises-message-deprecated)? IMHO the former is better, and the tests can be parameterized (with added check for the "not a dictionary" case):

 setuptools/dist.py            | 12 ++++-----
 setuptools/tests/test_dist.py | 58 ++++++++++++++++++++++++-------------------
 2 files changed, 38 insertions(+), 32 deletions(-)

diff --git i/setuptools/dist.py w/setuptools/dist.py
index 4e9190b3..8e3e1f92 100644
--- i/setuptools/dist.py
+++ w/setuptools/dist.py
@@ -306,15 +306,15 @@ def check_test_suite(dist, attr, value):
 
 def check_package_data(dist, attr, value):
     """Verify that value is a dictionary of package names to glob lists"""
-    msg = (
-        "{!r} must be a dictionary mapping package names to lists of "
-        "string wildcard patterns".format(attr)
-    )
     if not isinstance(value, dict):
-        raise DistutilsSetupError(msg)
+        raise DistutilsSetupError(
+            "{!r} must be a dictionary mapping package names to lists of "
+            "string wildcard patterns".format(attr))
     for k, v in value.items():
         if not isinstance(k, six.string_types):
-            raise DistutilsSetupError(msg)
+            raise DistutilsSetupError(
+                "{!r} must be strings (got {!r})".format(
+                    'keys of {!r} dict'.format(attr), k))
         assert_string_list(dist, 'values of {!r} dict'.format(attr), v)
 
 
diff --git i/setuptools/tests/test_dist.py w/setuptools/tests/test_dist.py
index 091b3687..38cc1039 100644
--- i/setuptools/tests/test_dist.py
+++ w/setuptools/tests/test_dist.py
@@ -3,6 +3,7 @@
 from __future__ import unicode_literals
 
 import io
+import re
 from distutils.errors import DistutilsSetupError
 from setuptools.dist import (
     _get_unpatched,
@@ -270,35 +271,40 @@ def test_maintainer_author(name, attrs, tmpdir):
             assert line in pkg_lines_set
 
 
-@pytest.mark.filterwarnings('error')
-def test_check_package_data_okay():
-    package_data = {
+CHECK_PACKAGE_DATA_TESTS = (
+    # Valid.
+    ({
         '': ['*.txt', '*.rst'],
         'hello': ['*.msg'],
-    }
-    assert check_package_data(None, 'package_data', package_data) is None
-
-
-def test_check_package_data_invalid_key():
-    package_data = {
+    }, None),
+    # Not a dictionary.
+    ((
+        ('', ['*.txt', '*.rst']),
+        ('hello', ['*.msg']),
+    ), (
+        "'package_data' must be a dictionary mapping package"
+        " names to lists of string wildcard patterns"
+    )),
+    # Invalid key type.
+    ({
         400: ['*.txt', '*.rst'],
-    }
-    expected_message = (
-        "'package_data' must be a dictionary mapping package names to lists "
-        "of string wildcard patterns"
-    )
-    with pytest.raises(DistutilsSetupError, message=expected_message):
-        check_package_data(None, 'package_data', package_data)
-
-
-def test_check_package_data_invalid_value():
-    """error if str values: https://github.com/pypa/setuptools/issues/1459"""
-    package_data = {
+    }, (
+        "\"keys of 'package_data' dict\" "
+        "must be strings (got 400)"
+    )),
+    # Invalid value type.
+    ({
         'hello': '*.msg',
-    }
-    expected_message = (
+    }, (
         "\"values of 'package_data' dict\" "
         "must be a list of strings (got '*.msg')"
-    )
-    with pytest.raises(DistutilsSetupError, message=expected_message):
-        check_package_data(None, 'package_data', package_data)
+    )),
+)
+
+@pytest.mark.parametrize('package_data, expected_message', CHECK_PACKAGE_DATA_TESTS)
+def test_check_package_data(package_data, expected_message):
+    if expected_message is None:
+        assert check_package_data(None, 'package_data', package_data) is None
+    else:
+        with pytest.raises(DistutilsSetupError, match=re.escape(expected_message)):
+            check_package_data(None, 'package_data', package_data)

benoit-pierre · 2019-06-25T14:57:32Z

As for the problem with generators, there's only one option: raise an error if a generator is passed, as we can't change the API of the check_xxx functions (distutils territory) to validate and return the validated value (returning the result of consuming it if a generator is passed).

pypa#1769 (comment)

dhimmel · 2019-06-26T16:01:13Z

Thanks @benoit-pierre for the suggested patch. I applied it in d6fb012.

You were correct that I misunderstood the "message" parameter of pytest.raises as described in the deprecation notice:

It is a common mistake to think this parameter will match the exception message, while in fact it only serves to provide a custom message in case the pytest.raises check fails.

I will now go through everything again and think about whether it makes sense to address the generator issue in this PR.

dhimmel · 2019-06-26T21:43:53Z

setuptools/dist.py

-                "{!r} must be strings (got {!r})".format(
-                    'keys of {!r} dict'.format(attr), k))
+                "keys of the {!r} dict must be strings (got {!r})"
+                .format(attr, k)


I'm thinking we might as well avoid the awkward nesting quoting when possible, such that this returns keys of 'package_data' dict must be strings (got 400) rather than "keys of 'package_data' dict" must be strings (got 400).

dhimmel · 2019-06-26T22:39:15Z

setuptools/dist.py

    try:
+        # verify that value is not a single-use iterable
+        assert list(value) == list(value)


@benoit-pierre does this seem like the right implementation to check whether value is consumable?

For my records, this was a helpful reference here.

I wonder if we should just simplify the whole thing and just explicitly check for a list/tuple with isinstance(value, (tuple, list)). @jaraco, @pganssle?

I think using isinstance in this case is better than consuming the list and checking if it's consumed for sure.

I haven't had time to look at the rest of this PR and see where this assert_string_list is actually used - is there any chance that people would have legitimate reasons for passing a set to something that calls this?

Currently, assert_string_list is called by dist.check_nsp, dist.check_package_data (added by this PR), and directly for some of the following:

setuptools/setup.py

Lines 92 to 114 in 3adda96

"distutils.setup_keywords": [

"eager_resources = setuptools.dist:assert_string_list",

"namespace_packages = setuptools.dist:check_nsp",

"extras_require = setuptools.dist:check_extras",

"install_requires = setuptools.dist:check_requirements",

"tests_require = setuptools.dist:check_requirements",

"setup_requires = setuptools.dist:check_requirements",

"python_requires = setuptools.dist:check_specifier",

"entry_points = setuptools.dist:check_entry_points",

"test_suite = setuptools.dist:check_test_suite",

"zip_safe = setuptools.dist:assert_bool",

"package_data = setuptools.dist:check_package_data",

"exclude_package_data = setuptools.dist:check_package_data",

"include_package_data = setuptools.dist:assert_bool",

"packages = setuptools.dist:check_packages",

"dependency_links = setuptools.dist:assert_string_list",

"test_loader = setuptools.dist:check_importable",

"test_runner = setuptools.dist:check_importable",

"use_2to3 = setuptools.dist:assert_bool",

"convert_2to3_doctests = setuptools.dist:assert_string_list",

"use_2to3_fixers = setuptools.dist:assert_string_list",

"use_2to3_exclude_fixers = setuptools.dist:assert_string_list",

],

setuptools/setuptools/dist.py

Lines 225 to 228 in 49a839f

def check_nsp(dist, attr, value):

"""Verify that namespace packages are valid"""

ns_packages = value

assert_string_list(dist, attr, ns_packages)

So I think the total list of setup keywords that end up hitting assert_string_list is:

eager_resources

namespace_packages

package_data

exclude_package_data

dependency_links

convert_2to3_doctests

use_2to3_fixers

use_2to3_exclude_fixers

Is the behavior of any of these setup keywords order dependent? If so, we probably want to not allow sets of strings?

I wonder if we should just simplify the whole thing and just explicitly check for a list/tuple with isinstance(value, (tuple, list))

Implemented in d9035ef

Pinging @benoit-pierre and @pganssle

As per discussion in pypa#1769 (comment)

value=None raises TypeError DistutilsSetupError: 2 must be a list of strings (got None)

benoit-pierre · 2019-07-16T11:17:33Z

I have squashed the history and simplified the news fragment. I think this is good to go, but I'll wait a few days before merging in case @pganssle / @jaraco disagree.

Ensure the dictionary values are lists/tuples of strings. Fix pypa#1459.

dquist · 2019-08-14T16:54:11Z

While I welcome better checks for package_data, this was a breaking change because packages that were relying on the previous behavior and were installing fine with 3.7.3 are now unable to be installed with python 3.7.4 and I had to run around trying to figure out why all of our CI pipelines were suddenly failing. Fortunately I was able to quickly patch the offending upstream project, but others may not be so fortunate.

This change should not have been introduced into a patch release.

pganssle · 2019-08-17T07:33:42Z

While I welcome better checks for package_data, this was a breaking change because packages that were relying on the previous behavior and were installing fine with 3.7.3 are now unable to be installed with python 3.7.4 and I had to run around trying to figure out why all of our CI pipelines were suddenly failing.

I don't think this is anything to do with 3.7.3 vs 3.7.4, setuptools is an independent project from CPython. This was released as part of setuptools version 41.1.0, which was a bugfix release, not a patch release.

My understanding is that this is backwards compatible because this only raises an error on already broken workflows - i.e. it may have been silently failing and now it's loudly failing.

dquist · 2019-08-19T16:54:00Z

I'm trying to understand what broke then, because I had several CI jobs that relied on the 'python:3.7 image' that have been working for months, then suddenly they started failing last week once the python images updated to pip 19.2.2.

With the updated docker images, my python repo with the incorrect package_data definition wouldn't install anymore but it would work again if I reverted back to the python 3.7.3 image.

What determines the version of setuptools that pip uses. Is it bundled? If so, I can't see a reference anywhere to the v41.1 release.

dhimmel · 2019-08-20T14:08:44Z

@dquist my guess is that the different docker images have different versions of setuptools included. Unless something is pinning setuptool<41.1.0, it seems to me that the more stringent package_data checking from this PR will eventually get included in most python Docker images when they are rebuilt.

I do agree that this change is not entirely backwards compatible. Before, packages with a broken setup.py could be installed and would mostly work since package_data is often non-essential. Now it seems these package cannot be installed with pip install because an error will be thrown during checks.

I wonder if one solution would be an option like pip install --skip-setup-checks that would skip performing these checks. It would be useful for tying to install packages with broken setup.py files that despite being broken still work for certain applications. I don't think pip has this option, and I don't know much about the relationship between pip and setuptools, so not sure where this could be implemented.

pganssle reviewed May 22, 2019

View reviewed changes

dhimmel force-pushed the issue-1459 branch 2 times, most recently from f7d9412 to b8904e2 Compare May 23, 2019 17:34

dhimmel force-pushed the issue-1459 branch from 482f6c0 to c065031 Compare May 23, 2019 18:47

dhimmel changed the title ~~check_package_data: warn if value is str~~ Perform a more thorough check of package_data structure May 29, 2019

dhimmel pushed a commit to dhimmel/setuptools that referenced this pull request Jun 26, 2019

Apply patch by benoit-pierre

d6fb012

pypa#1769 (comment)

dhimmel commented Jun 26, 2019

View reviewed changes

dhimmel force-pushed the issue-1459 branch 3 times, most recently from 903cfd2 to 993bbd6 Compare June 26, 2019 22:30

dhimmel commented Jun 26, 2019

View reviewed changes

dhimmel force-pushed the issue-1459 branch from 993bbd6 to ef002e7 Compare June 26, 2019 23:39

dhimmel added a commit to dhimmel/setuptools that referenced this pull request Jul 3, 2019

Switch to explicit test for list/tuple

d9035ef

As per discussion in pypa#1769 (comment)

fix assert_string_list docstring

305bb1c

value=None raises TypeError DistutilsSetupError: 2 must be a list of strings (got None)

benoit-pierre force-pushed the issue-1459 branch from d9035ef to 8f848bd Compare July 16, 2019 11:15

benoit-pierre approved these changes Jul 16, 2019

View reviewed changes

improve package_data check

8f848bd

Ensure the dictionary values are lists/tuples of strings. Fix pypa#1459.

benoit-pierre merged commit 38b9010 into pypa:master Jul 23, 2019

SamG97 mentioned this pull request Aug 14, 2019

Fix setup script to use list for package_data nanoporetech/taiyaki#41

Merged

wheatdog mentioned this pull request Nov 6, 2019

"values of 'package_data' dict" must be a list of strings (got 'default_config.toml') infokiller/i3-workspace-groups#23

Closed

samkohn mentioned this pull request Nov 11, 2019

Bug in setup.py when using newer version of setuptools (>=41.1.0) larpix/larpix-control#194

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Perform a more thorough check of package_data structure #1769

Perform a more thorough check of package_data structure #1769

dhimmel commented May 22, 2019 •

edited

dhimmel commented May 22, 2019

dhimmel commented May 22, 2019

pganssle May 22, 2019

pganssle commented May 22, 2019

dhimmel commented May 22, 2019

dhimmel commented May 23, 2019

pganssle commented May 23, 2019

benoit-pierre commented May 23, 2019

dhimmel commented May 23, 2019

benoit-pierre commented May 23, 2019

dhimmel commented May 24, 2019

dhimmel commented May 29, 2019

dhimmel commented Jun 24, 2019

benoit-pierre commented Jun 25, 2019 •

edited

benoit-pierre commented Jun 25, 2019

dhimmel commented Jun 26, 2019

dhimmel Jun 26, 2019

dhimmel Jun 26, 2019

benoit-pierre Jun 27, 2019

pganssle Jul 2, 2019

dhimmel Jul 3, 2019 •

edited

dhimmel Jul 3, 2019

dhimmel Jul 10, 2019

benoit-pierre commented Jul 16, 2019

dquist commented Aug 14, 2019 •

edited

pganssle commented Aug 17, 2019

dquist commented Aug 19, 2019

dhimmel commented Aug 20, 2019

	"distutils.setup_keywords": [
	"eager_resources = setuptools.dist:assert_string_list",
	"namespace_packages = setuptools.dist:check_nsp",
	"extras_require = setuptools.dist:check_extras",
	"install_requires = setuptools.dist:check_requirements",
	"tests_require = setuptools.dist:check_requirements",
	"setup_requires = setuptools.dist:check_requirements",
	"python_requires = setuptools.dist:check_specifier",
	"entry_points = setuptools.dist:check_entry_points",
	"test_suite = setuptools.dist:check_test_suite",
	"zip_safe = setuptools.dist:assert_bool",
	"package_data = setuptools.dist:check_package_data",
	"exclude_package_data = setuptools.dist:check_package_data",
	"include_package_data = setuptools.dist:assert_bool",
	"packages = setuptools.dist:check_packages",
	"dependency_links = setuptools.dist:assert_string_list",
	"test_loader = setuptools.dist:check_importable",
	"test_runner = setuptools.dist:check_importable",
	"use_2to3 = setuptools.dist:assert_bool",
	"convert_2to3_doctests = setuptools.dist:assert_string_list",
	"use_2to3_fixers = setuptools.dist:assert_string_list",
	"use_2to3_exclude_fixers = setuptools.dist:assert_string_list",
	],

	def check_nsp(dist, attr, value):
	"""Verify that namespace packages are valid"""
	ns_packages = value
	assert_string_list(dist, attr, ns_packages)

Perform a more thorough check of package_data structure #1769

Perform a more thorough check of package_data structure #1769

Conversation

dhimmel commented May 22, 2019 • edited

Summary of changes

Pull Request Checklist

dhimmel commented May 22, 2019

dhimmel commented May 22, 2019

pganssle May 22, 2019

Choose a reason for hiding this comment

pganssle commented May 22, 2019

dhimmel commented May 22, 2019

dhimmel commented May 23, 2019

pganssle commented May 23, 2019

benoit-pierre commented May 23, 2019

dhimmel commented May 23, 2019

benoit-pierre commented May 23, 2019

dhimmel commented May 24, 2019

dhimmel commented May 29, 2019

dhimmel commented Jun 24, 2019

benoit-pierre commented Jun 25, 2019 • edited

benoit-pierre commented Jun 25, 2019

dhimmel commented Jun 26, 2019

dhimmel Jun 26, 2019

Choose a reason for hiding this comment

dhimmel Jun 26, 2019

Choose a reason for hiding this comment

benoit-pierre Jun 27, 2019

Choose a reason for hiding this comment

pganssle Jul 2, 2019

Choose a reason for hiding this comment

dhimmel Jul 3, 2019 • edited

Choose a reason for hiding this comment

dhimmel Jul 3, 2019

Choose a reason for hiding this comment

dhimmel Jul 10, 2019

Choose a reason for hiding this comment

benoit-pierre commented Jul 16, 2019

dquist commented Aug 14, 2019 • edited

pganssle commented Aug 17, 2019

dquist commented Aug 19, 2019

dhimmel commented Aug 20, 2019

dhimmel commented May 22, 2019 •

edited

benoit-pierre commented Jun 25, 2019 •

edited

dhimmel Jul 3, 2019 •

edited

dquist commented Aug 14, 2019 •

edited