Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changes to the User Guide's Data Files page #3335

Merged
merged 19 commits into from Jun 7, 2022

Conversation

codeandfire
Copy link
Contributor

@codeandfire codeandfire commented May 24, 2022

Summary of Changes

  • All code snippets were given for setup.py. Have added corresponding snippets for setup.cfg and pyproject.toml.
  • To avoid incentivizing multiple top-level packages, have modified all the package trees and code snippets to include only a single package mypkg. Have added a separate example to illustrate the functionality of using the empty string "" / the asterisk * for capturing data files from multiple packages. Have also modified the setup.py code snippets and removed the find_packages("src") call since there is only a single package in each case (except one); have opted to explicitly name the package instead.
  • Have added a package tree example for the first package_data snippet. Have also added a package tree / code snippet example to show how package_data patterns should include subdirectories, separating it from the example showing the empty string "" / asterisk * functionality.
  • Tried to have consistent naming for all directories and data files used in the package trees and code snippets. All directories have been named mypkg and data files have been named data1.txt, data2.rst etc.
  • Have reformatted package tree examples. Reformatting has been done by replacing the only-indentation based directory structure diagram with a line-based tree layout; I think this looks neater.
  • Have added .. note:: blocks for paragraphs that would be more appropriately phrased as a Note. Other minor changes to text content have been made.

Pull Request Checklist

  - All code snippets were given for `setup.py`. Have added
    corresponding snippets for `setup.cfg` and `pyproject.toml`.
  - To avoid incentivizing multiple top-level packages, have modified
    all the package trees and code snippets to include only a single
    package `mypkg`. Have added a separate example to illustrate the
    functionality of using the empty string `""` / the asterisk `*` for
    capturing data files from multiple packages. Have also modified the
    `setup.py` code snippets and removed the `find_packages("src")`
    since there is only a single package in each case (except one); have
    opted to explicitly name the package instead.
  - Have added a package tree example for the first `package_data`
    snippet. Have also added a package tree / code snippet example to
    show how `package_data` patterns should include subdirectories,
    separating it from the example showing the empty string `""` /
    asterisk `*` functionality.
  - Tried to have consistent naming for all directories and data files
    used in the package trees and code snippets. All directories have
    been named `mypkg` and data files have been named `data1.txt`,
    `data2.rst` etc.
  - Have reformatted package tree examples. Reformatting has been done
    by replacing the only-indentation based directory structure diagram
    with a line-based tree layout; I think this looks neater.
  - Have added `.. note::` blocks for paragraphs that would be more
    appropriately phased as a Note. Other minor changes to text content
    have been made.
@codeandfire codeandfire marked this pull request as ready for review May 24, 2022 11:24
@abravalheri
Copy link
Contributor

abravalheri commented May 24, 2022

Thank you very much @codeandfire, the improvements are very appreciated 👏 🎉

I would like to suggest some changes in the light of the recent discussions #3323:

I think we should encourage people to consider mypkg/data folder in one of your examples as the mypkg.data namespace package instead of a "data directory" (Python considers that anyway)... I might be able to submit a PR with suggestions for you later this week.

@codeandfire
Copy link
Contributor Author

codeandfire commented May 24, 2022

@abravalheri Sure, sounds interesting, will look forward to this PR since I'm considering implementing a similar layout in one of my own projects!

May I ask, though, what ramifications this will have for the advice given here? Especially since the linked importlib docs page says that namespace packages will be ignored.

EDIT: I think this has been tried in the discussion #3323 that you have linked. In one of the comments in the thread, one of the users has tried using importlib.resources.

@abravalheri
Copy link
Contributor

abravalheri commented May 24, 2022

May I ask, though, what ramifications this will have for the advice given here? Especially since the linked importlib docs page says that namespace packages will be ignored.

I did a quick test:

rm -rf /tmp/myproj
mkdir /tmp/myproj
cd /tmp/myproj
cat <<EOS > pyproject.toml
[build-system]
build-backend = "setuptools.build_meta"
requires = ["setuptools", "setuptools_scm[toml]"]

[project]
name = "myproj"
dynamic = ["version"]

[tool.setuptools_scm]
EOS

mkdir -p src/myproj/data
touch src/myproj/__init__.py
echo "hello world" > src/myproj/data/data1.txt
echo 42 > src/myproj/data/data2.json
tree
# .
# ├── pyproject.toml
# └── src
#     └── myproj
#         ├── __init__.py
#         └── data
#             ├── data1.txt
#             └── data2.json

git init .
git add .
git commit -m "Initial commit"

virtualenv .venv
.venv/bin/python -m pip install -U pip importlib-resources
.venv/bin/python -m pip install .

ls .venv/lib/python*/site-packages/myproj/data  # making sure data files are installed
.venv/bin/python
>>> import importlib_resources, myproj.data
>>> importlib_resources.files(myproj.data).joinpath("data1.txt").read_text()
'hello world\n'

Here I used pyproject.toml configuration which is equivalent to include_package_data=True and packages = find_namespace:, etc... I also used setuptools-scm because I cannot be bothered to create a MANIFEST.in file...

importlib_resources seems to be fine with the approach...

I also tried:

>>> import importlib_resources, myproj
>>> importlib_resources.files(myproj).joinpath("data/data1.txt").read_text()
'hello world\n'

Maybe importlib_resources docs are outdated? Or maybe it does not support a "pure namespace-only" package?

@abravalheri
Copy link
Contributor

abravalheri commented May 24, 2022

I also did a second example:

rm -rf /tmp/myproj
mkdir /tmp/myproj
cd /tmp/myproj
cat <<EOS > pyproject.toml
[build-system]
build-backend = "setuptools.build_meta"
requires = ["setuptools", "setuptools_scm[toml]"]

[project]
name = "myproj"
dynamic = ["version"]

[tool.setuptools_scm]
EOS

mkdir -p src/myproj/data
echo "hello world" > src/myproj/data/data1.txt
echo 42 > src/myproj/data/data2.json
tree
# .
# ├── pyproject.toml
# └── src
#     └── myproj
#         └── data
#             ├── data1.txt
#             └── data2.json

git init .
git add .
git commit -m "Initial commit"

virtualenv .venv
.venv/bin/python -m pip install -U pip importlib-resources
.venv/bin/python -m pip install .

>>> import importlib_resources, myproj.data
>>> importlib_resources.files(myproj).joinpath("data/data1.txt").read_text()
'hello world\n'

Everything still seems to work fine even with a namespace only package.

@abravalheri
Copy link
Contributor

abravalheri commented May 24, 2022

The support for namespace packages seems to have been added to recent versions of importlib-resources (October last year): python/importlib_resources#196.

@codeandfire
Copy link
Contributor Author

You're right, it's the docs that are outdated. I have raised a PR with them python/importlib_resources#251.

Have tried to make the working of the `include_package_data` option as
clear as possible.
 - Added a package tree
 - Tried to clearly state that the data files must be either included in
   `MANIFEST.in`, or tracked by a VCS, in order for them to be included
   in the installation of the package, when `include_package_data=True`.
 - Added a `MANIFEST.in` snippet to make things more clear.
@codeandfire
Copy link
Contributor Author

I think we should encourage people to consider mypkg/data folder in one of your examples as the mypkg.data namespace package instead of a "data directory"

@abravalheri from this, should I take it to mean that we are encouraging a scheme in which we set include_package_data=True and only specify files to exclude using the exclude_package_data argument, over a scheme in which we explicitly specify all the data files using the package_data argument?

@abravalheri
Copy link
Contributor

My original idea is not exactly about using include_package_data instead of package_data.
Both are fine.

It is more about not writing:

package_data={"mypkg": ["*.txt", "data/*.rst"]}

but instead:

package_data={
    "mypkg": ["*.txt"],
    "mypkg.data": ["*.rst"],
}

@codeandfire
Copy link
Contributor Author

I see. I'll go ahead and amend my PR then, to include these changes, and maybe then you can review them and let me know of any further changes.

Removed the statement within the parentheses, since the example which
follows does not illustrate this specific example (of having
documentation files that you may not want to include in the
installation). Besides the `exclude_package_data` option covers this
exact use case in a later example.
Modified code snippets for `package_data` example with `data`
subdirectory to treat the `data` subdirectory as a namespace package.
Also modified a paragraph below these snippets.
Made them consistent with the snippets given on the Package Discovery
page.
 - Instead of enumerating a list of all the packages in `packages`,
   using `find_packages` or `find:` instead. The `find_packages` call in
   `setup.py` contains a `where` argument. In `setup.cfg`, included the
   section `options.packages.find` with a `where` option.
 - Instead of supplying the same `package_dir` for each package, using
   an empty string to indicate a `package_dir` for all packages.
 - In `pyproject.toml`, using the `where` option instead of
   `package-dir`.
 - Textual changes.
Tried to make why this option is useful more clear.
Made them consistent with the snippets given on the Package Discovery
page. The changes made here are similar to the changes made to the
previous example.
In the end of the document, in the summary section, there is a line
stating that the files matched by `package_data` do not require a
corresponding `MANIFEST.in` or a revision control system plugin. Have
included this note higher up in the document because I felt it may be of
interest to users and they might miss this line so far down the
document.
  - Added example package tree
  - Added snippet on how typically the `__file__` attribute would be
    used
  - Added snippet showing usage of `importlib.resources` with the
    `files()` API
  - Added notes on compatibility of this code with different Python
    versions along with references
  - Added snippet to show usage of `importlib_resources` backport
I believe this footnote is outdated and not required in lieu of the
added notes describing compatibility with different Python versions
This footnote describes what Setuptools considers as a data file. This
note is important and may be missed by the reader if it is kept as a
footnote, hence I have copied its contents up ahead in the document,
just after the `include_package_data` example.
  - Added `include_package_data`, `package_data` and
    `exclude_package_data` sections to make clear the three options
    provided by Setuptools to manage data files.
  - Added a separate section illustrating the use of a `data`
    subdirectory, after these three sections.
  - Placed the summary of the three options under a Summary section.
  - Changed the levels of the last two sections to match the level of
    the five sections added.
  - Small changes. Changed the wording where appropriate to suit the new
    flow. Changed a paragraph on path separators in glob patterns to a
    Note.
Just to make it clear that we can use either one of `package_data` or
`include_package_data` and not just the former.
@codeandfire
Copy link
Contributor Author

@abravalheri sorry for the delay, but have amended my PR to incorporate the changes suggested by you along with a bunch of other changes. Let me know your feedback. Thanks!

Copy link
Contributor

@abravalheri abravalheri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much @codeandfire, those are really very good improvements.
(Sorry for the delay in providing a review/suggestions)

I added some small comments, but most of them refer mainly to existing content that was already present in the previous version. I am very happy to accept this PR as it is and do small adjustments later if necessary.

docs/userguide/datafiles.rst Show resolved Hide resolved
docs/userguide/datafiles.rst Outdated Show resolved Hide resolved
docs/userguide/datafiles.rst Outdated Show resolved Hide resolved
docs/userguide/datafiles.rst Show resolved Hide resolved
docs/userguide/datafiles.rst Outdated Show resolved Hide resolved
docs/userguide/datafiles.rst Outdated Show resolved Hide resolved
docs/userguide/datafiles.rst Show resolved Hide resolved
docs/userguide/datafiles.rst Outdated Show resolved Hide resolved
docs/userguide/datafiles.rst Show resolved Hide resolved
@abravalheri
Copy link
Contributor

Thank you very much @codeandfire, I think I will go ahead and merge the PR, unless you would like to work in other changes.

@codeandfire
Copy link
Contributor Author

@abravalheri no other changes from my side, you can go ahead and merge. Thanks again for your review!

@abravalheri abravalheri merged commit b73b480 into pypa:main Jun 7, 2022
@codeandfire codeandfire deleted the data-files-fix branch June 8, 2022 12:12
@codeandfire codeandfire restored the data-files-fix branch June 8, 2022 13:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants