[BUG] Dotfiles are not included as package_data unless explicitly listed #3350

Closed
comabrewer opened this issue Jun 8, 2022 · 8 comments · Fixed by #3351
Labels
bug Needs Triage Issues that need to be evaluated for severity and status.

Comments

@comabrewer
Contributor

comabrewer commented Jun 8, 2022

setuptools version

setuptools==62.3.3

Python version

Python 3.9

OS

Windows

Additional environment information

No response

Description

Dotfiles (files with a leading ., like .gitignore) are not packaged as data files unless they are explicitly listed.

So this works for all non-dotfiles:

[options.package_data]
* = data/**

Or explicitly listing all levels (single-star):

[options.package_data]
* = 
  data/*
  data/*/*

But dotfiles have to be added explicitly:

[options.package_data]
* = 
  data/**
  data/.data.txt

Or with a special pattern with leading dot in the filename:

[options.package_data]
* = 
  data/**
  data/**/.*
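The effect of these patterns can be sketched outside of a build. This is a minimal illustration, not setuptools' actual code: it assumes the patterns are expanded with glob.glob(..., recursive=True) relative to the package directory, and match_package_data and make_example_package are hypothetical helpers invented for this example. The extra data/sub/.hidden file is also an addition, to show the recursive case.

```python
import glob
import os
import tempfile


def match_package_data(package_dir, patterns):
    """Return sorted relative paths of the files matched by the glob patterns.

    Assumption: patterns are expanded with glob.glob(..., recursive=True)
    relative to the package directory.
    """
    matched = set()
    for pattern in patterns:
        full_pattern = os.path.join(glob.escape(package_dir), pattern)
        for path in glob.glob(full_pattern, recursive=True):
            if os.path.isfile(path):
                rel = os.path.relpath(path, package_dir)
                matched.add(rel.replace(os.sep, "/"))
    return sorted(matched)


def make_example_package(root):
    """Create data/data.txt, data/.data.txt and data/sub/.hidden under root."""
    os.makedirs(os.path.join(root, "data", "sub"))
    files = (
        ("data", "data.txt"),
        ("data", ".data.txt"),
        ("data", "sub", ".hidden"),
    )
    for parts in files:
        with open(os.path.join(root, *parts), "w"):
            pass


with tempfile.TemporaryDirectory() as tmp:
    make_example_package(tmp)
    without_dot_pattern = match_package_data(tmp, ["data/**"])
    with_dot_pattern = match_package_data(tmp, ["data/**", "data/**/.*"])

print(without_dot_pattern)  # only the non-dotfile
print(with_dot_pattern)     # dotfiles at every level are picked up as well
```

Under that assumption, data/** alone skips every name with a leading dot, while adding data/**/.* picks up dotfiles both directly in data/ and in its subdirectories.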

Expected behavior

If I specify patterns for package_data (either via single-star or the new double-star glob patterns), I would expect all files to be included.

How to Reproduce

I have the following project structure:

example/
  src/
    example/
      data/
        data.txt
        .data.txt
      __init__.py
  pyproject.toml
  setup.cfg

With the following setup.cfg:

[metadata]
name = example
version = 0.0.0

[options]
package_dir =
    = src
packages = find:

[options.packages.find]
where = src

[options.package_data]
* = data/**

Inside a virtual environment, I install the package and list the contents:

rm -r build/
pip install .
ls -al .venv/Lib/site-packages/example/data

Output

Only data.txt is contained, but not .data.txt:

data.txt
@comabrewer comabrewer added bug Needs Triage Issues that need to be evaluated for severity and status. labels Jun 8, 2022
@comabrewer
Contributor Author

comabrewer commented Jun 8, 2022

epic facepalm.. ls is not the way to look for hidden files. Sorry for the inconvenience.

Seems this was not the issue (because I alias ls=ls -al).
I could not reproduce the issue because there were orphaned data files in the build/ directory. After deleting the build/ directory, the original issue still holds.

@comabrewer comabrewer reopened this Jun 8, 2022
@abravalheri
Contributor

Hi @comabrewer, I recommend changing the patterns (see footnote 1) and using find_namespace: (see footnote 2)

# ...
-packages = find:
+packages = find_namespace:
# ...
[options.package_data]
-* = data/**
+* = data/**, data/.*

I did the following:

rm -rf /tmp/example
mkdir -p /tmp/example/src/example/data
touch /tmp/example/src/example/__init__.py
touch /tmp/example/src/example/data/data.txt
touch /tmp/example/src/example/data/.data.txt
cat <<EOS > /tmp/example/setup.cfg
[metadata]
name = example
version = 0.0.0

[options]
package_dir =
    = src
packages = find_namespace:

[options.packages.find]
where = src

[options.package_data]
* = data/**, data/.*
EOS
cat <<EOS > /tmp/example/pyproject.toml
[build-system]
build-backend = "setuptools.build_meta"
requires = ["setuptools"]
EOS
cd /tmp/example
virtualenv -p python3.9 .venv
.venv/bin/python -m pip install -U pip build
.venv/bin/python -m build
unzip -l dist/*.whl
Archive:  dist/example-0.0.0-py3-none-any.whl
  Length      Date    Time    Name
---------  ---------- -----   ----
        0  2022-06-08 19:50   example/__init__.py
        0  2022-06-08 19:50   example/data/.data.txt
        0  2022-06-08 19:50   example/data/data.txt
       52  2022-06-08 19:51   example-0.0.0.dist-info/METADATA
       92  2022-06-08 19:51   example-0.0.0.dist-info/WHEEL
        8  2022-06-08 19:51   example-0.0.0.dist-info/top_level.txt
      519  2022-06-08 19:51   example-0.0.0.dist-info/RECORD
---------                     -------
      671                     7 files

and the .data.txt file seems to be packaged correctly.

Footnotes

  1. I imagine that ** does not match files starting with .

  2. The example.data package is a namespace package after all
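Since ls without -a hides dotfiles, the wheel contents can also be checked from Python. The following is only a sketch: hidden_members is a hypothetical helper, and the in-memory archive is a stand-in with the same layout as the wheel listed above, not the real build output.

```python
import io
import zipfile


def hidden_members(archive):
    """Return archive member names whose last path component starts with '.'."""
    with zipfile.ZipFile(archive) as zf:
        return sorted(
            name
            for name in zf.namelist()
            if name.rstrip("/").rsplit("/", 1)[-1].startswith(".")
        )


# Build a stand-in archive in memory that mimics the wheel layout.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    for name in (
        "example/__init__.py",
        "example/data/.data.txt",
        "example/data/data.txt",
    ):
        zf.writestr(name, "")

found = hidden_members(buffer)
print(found)
```

For a real build, the same helper could be pointed at the wheel file, for example hidden_members("dist/example-0.0.0-py3-none-any.whl").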

@comabrewer
Contributor Author

comabrewer commented Jun 8, 2022

Apparently setuptools uses glob.glob from the standard library. The observed behavior stems directly from glob and can be reproduced with import glob; print(*glob.glob("*"), sep="\n").

The Python documentation states that:

If the directory contains files starting with . they won’t be matched by default.

So if the behavior is only a surprise to me, please feel free to immediately close the issue.

Otherwise it might make sense to document the behavior in the setuptools docs on data files? The section on exclude lists talks about excluding dotfiles, which might have led to my impression that they were captured by patterns automatically, even though the section clearly talks about VCS.

@abravalheri
Contributor

abravalheri commented Jun 8, 2022

@comabrewer, this behaviour is not exclusive to Python. In general glob patterns have this behaviour everywhere. In your example you can try:

$ ls -a src/example/data/*
src/example/data/data.txt
$ ls -a src/example/data/.*
src/example/data/.data.txt

So I think for the time being we can close this issue.

Would you like to submit a PR with the suggested changes for the docs? That would be very useful.

@comabrewer
Contributor Author

@abravalheri thank you very much for your responses (which I saw just after my previous comment).
From my understanding I would also have expected find_namespace to be necessary, but it also seems to work correctly with find:. Anyway I will adapt the corresponding line.

Regarding the glob pattern I totally agree that it was because of my lack of understanding, sorry for that.
I would be glad to clarify this in the docs, if you feel that it adds value. Should I also mention the newly added support for recursive patterns, which isn't mentioned yet?

In any case many thanks for your great work on improving setuptools! It has become a real pleasure to use, with many long desired features being added lately.

@abravalheri
Contributor

abravalheri commented Jun 8, 2022

Thank you very much for the kind words @comabrewer.

"From my understanding I would also have expected find_namespace to be necessary, but it also seems to work correctly with find:. Anyway I will adapt the corresponding line."

It will work for this particular example, but it is a bit clumsy (Python does not have the concept of a "data" directory... all directories are considered packages, so I am trying to get people to embrace that mindset).

I have mixed feelings about the ** pattern, to be sincere. It might be useful, but there is a chance it can cause inconsistencies in the case of [options.packages.find] exclude (maybe we should add a note about that?).

@comabrewer
Contributor Author

As an additional remark, pathlib works differently, at least on Windows:

from pathlib import Path

for p in Path.home().glob("*"):
    print(p)

This also shows files starting with .

This behavior comes from fnmatch:

from fnmatch import fnmatch
assert fnmatch(".gitignore", "*") is True
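The three behaviors can be compared side by side in a throwaway directory. This is a minimal sketch using only the standard library; the file names mirror the example project and the variable names are made up for illustration.

```python
import glob
import os
import tempfile
from fnmatch import fnmatch
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # One regular file and one dotfile, as in the example project.
    for name in ("data.txt", ".data.txt"):
        Path(tmp, name).touch()

    # glob.glob: "*" skips names with a leading dot.
    star = os.path.join(glob.escape(tmp), "*")
    via_glob = sorted(os.path.basename(p) for p in glob.glob(star))

    # pathlib: Path.glob("*") returns dotfiles as well.
    via_pathlib = sorted(p.name for p in Path(tmp).glob("*"))

    # fnmatch: "*" matches any name, including a leading dot.
    via_fnmatch = sorted(n for n in os.listdir(tmp) if fnmatch(n, "*"))

print(via_glob)
print(via_pathlib)
print(via_fnmatch)
```

Only glob.glob leaves out .data.txt here, so code that switches between glob and pathlib can end up with a different set of files for the same pattern.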

@abravalheri
Contributor

abravalheri commented Jun 8, 2022

Ouch, I was not expecting this difference...
