Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

package_data not doing recursive glob calls #1806

Closed
mvaal opened this issue Jul 24, 2019 · 13 comments · Fixed by #3309
Closed

package_data not doing recursive glob calls #1806

mvaal opened this issue Jul 24, 2019 · 13 comments · Fixed by #3309

Comments

@mvaal
Copy link

mvaal commented Jul 24, 2019

Even though glob supports it, glob support for recursive ** searching is not enabled when specifying package_data. This is not clear in the documentation.

globs_expanded = map(glob, patterns)

This means that only single * searches in a specific directory are supported. By changing the glob function to support recursive, this would allow us to specify ** patterns to a directory instead of having to specify every sub-directory in the structure.

Reference:
https://github.com/python/cpython/blob/master/Lib/glob.py#L18

@mvaal mvaal changed the title package_data not respecting all glob formats package_data not doing recursive glob calls Jul 24, 2019
@jaraco
Copy link
Member

jaraco commented Oct 4, 2019

Yes, I can confirm that without calling glob(pattern, recursive=True), the ** pattern isn't expanded.

What this issue needs is more background. What is a use-case where having recursive globbing support would be helpful?

More importantly, this routine is for find_data_files in a package. Since a Python package is already defined as a directory containing files, it may not even be viable for a package to contain directories of data. Even if you're able to cause subdirectories and their contents to be added as package data, I'm not sure that would be a supported scenario. Before adding facilities to allow a packager to more easily add this data to a package, we should be sure that's something that the packaging systems endorses/supports.

@mvaal
Copy link
Author

mvaal commented Oct 4, 2019

Thanks for getting back. I have a nested directory of configuration files (.yml) and a nested directory of validation files (.json). Unfortunately, I have to go through each sub directory in order to include them in the package. This works and is working for my current use case, but every time I add addition sub-directories, I have to remember to add another path.

[options.package_data]
module = config/A/1/*.yml, config/B/1/*.yml, validation/A/*.json, validation/B/*.json

I would prefer to just have to do this (in fact I originally tried this, but didn't work which is why I reported the issue).

[options.package_data]
module = config/**/*.yml, validation/**/*.json

I also tried *.yml, *.json that the docs alluded to working, but it does not (same issue non recursive so only works on root directory files).

[options.package_data]
# If any package contains *.txt or *.rst files, include them:
* = *.txt, *.rst
hello = *.msg

@JaapJoris
Copy link

Agreed, it would be massively helpful if I could use the ** glob in setup.py's package_data. I deploy Django apps which include many templates in potentially endless subdirectories. There is no glob pattern except **/*.html to describe their paths.

@thenewguy
Copy link

I just found this issue working with Django templates as well. Recursive glob support would be tremendously helpful when packaging Django apps that include templates

@psam44
Copy link

psam44 commented Aug 21, 2020

I got the same problem with stub files for typing. These data files are *.pyi and may be spread at multiple folder levels.

I have two workarounds to this difficulty.

One is with a custom implementation:

import os
from typing import List

def find_stubs(top: str) -> List[str]:
    stubs = []
    for root, _, files in os.walk(top):
        dir = root.split(os.path.sep, 1)[-1] # rid of 'top'
        for f in files:
            if f.endswith('.pyi'):
                if os.path.sep in root:
                    f = os.path.join(dir, f)
                stubs.append(f)
    return stubs

setup(
    …
    package_data={'foo': find_stubs('foo')},
    …
)

Another one is more generic, with a manifest template:

setup(
    …
    include_package_data=True,
    …
)

MANIFEST.in

recursive-include foo *.pyi

Working with Django, I prefer the second way because I also need to include many other data files:

recursive-include foo/locale *.po *.mo
recursive-include foo/static *
recursive-include foo/templates *.html *.txt
graft docs
prune docs/_build

@cvanelteren
Copy link

cvanelteren commented Oct 21, 2020

I would like to add the issue that for my cython code setuptools does not copy the data files to site-packages.

Sources list my pxd files but they are not transferred to the install directory. I have both tried using MANIFEST.in as well as package_data = {"": "**pxd **pyx".split()}. A workaround form was to use data_files which is not preferred ...

ap-- added a commit to bayer-science-for-a-better-life/HistoQC that referenced this issue Mar 28, 2021
It would be possible to switch to setup.cfg declarative config style,
if we'd specify the files individually, or add subfolder matching via
multiple single star globs, i.e.:
    *.css,
    */*.css,
    */*/*.css,
but unless all files are specified or
pypa/setuptools#1806 is implemented it
would probably cause confusion for a histoqc developer when they're
modifying the ui.
@brianthelion
Copy link

brianthelion commented Sep 29, 2021

Bump.

At a glance, the situation looks worse than described when there are multiple packages in a single source tree.

Unfortunately I can't quite get resolution on what's happening when I run bdist_wheel. Is there an environment variable or something that I can set to see how the package_data is being assembled?

@jaraco
Copy link
Member

jaraco commented Oct 3, 2021

It sounds as if there's a reasonable need for this issue, and now that I see that using include_package_data would effectively include the same contents as a recursive glob, I'm a lot less concerned about creating a potential new issue. I'd entertain a PR that addresses this issue.

Is there an environment variable or something that I can set to see how the package_data is being assembled?

I'm not aware of one. I guess you'll have to go to the source.

@ilCatania
Copy link

As a workaround while this is implemented, I've done the following, it might be useful to someone else (note the example uses src layout but is easy to tweak):

import os
from pathlib import Path
from typing import Dict, List

def _expand_recursive_globs(package_data: Dict[str, List[str]]) -> Dict[str, List[str]]:
    root = (Path(__file__).parent / "src").resolve()
    for module, patterns in package_data.items():
        new_patterns = []
        m_root = None
        for idx, p in enumerate(patterns):
            if "**" in p:
                m_root = m_root or root / module.replace(".", os.sep)
                for f in m_root.glob("**"):  # all subdirectories
                    if f.name == "__pycache__":
                        continue
                    subdir_pattern = p.replace("**", str(f.relative_to(m_root)))
                    new_patterns.append(subdir_pattern)
            else:
                new_patterns.append(p)
        package_data[module] = new_patterns
    return package_data

setup(
    name="my-package",
    packages=find_packages("src"),
    package_dir={"": "src"},
    package_data=_expand_recursive_globs({"my_module": ["**/*.csv"]})
)

and if you have this project structure:

setup.py
src/
    my_module/
        __init__.py
        a/
            file1.csv
        b/
            file2.csv

then the package data will expand to:

package_data={"my_module": ["./a/*.csv", "./b/*.csv"]})

@mat-rs
Copy link

mat-rs commented Mar 31, 2022

Bump.
I came here googling the same issue. Are there any news on implementing the recursive search?
I need this in a setup.cfg/pyproject.toml flow, so I can't work around by writing my own function in setup.py :-(

@abravalheri
Copy link
Contributor

Hi @mat-rs thanks for the ping. If you would like to give it a try, we would love receiving a PR!

Regarding the setup.py/setup.cfg, I believe that in the end both configurations are merged into one. So you can keep most of your config in setup.cfg and just have the package_data in setup.py.

Alternatively, you can also consider include_package_data=True.

@mat-rs
Copy link

mat-rs commented Mar 31, 2022

Actually I didn't want to add setup.py to my projects and I haven't yet found out what exactly include_package_data does, but as per the discussion above, I assumed that implementing the recursive search for package_data was agreed upon to make sense.

Would it be as simple as replacing:

globs_expanded = map(glob, patterns)

with

from functools import partial
...
globs_expanded = map(partial(glob, recursive=True), patterns)

Or would that break anything of the current expected behavior?

@abravalheri
Copy link
Contributor

Thank you very much @mat-rs for having a look on that!

Would it be as simple as replacing

It seems like a good shot, by looking at

globs_expanded = map(glob, patterns)
# flatten the expanded globs into an iterable of matches
globs_matches = itertools.chain.from_iterable(globs_expanded)
glob_files = filter(os.path.isfile, globs_matches)

Or would that break anything of the current expected behavior?

You can start by running the test suite (e.g. via tox -- -p no:cov to filter out the noise) and see what happens. It would be also nice if you add some tests to capture the new behaviour/feature.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

10 participants