Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Docs] Improve mentions to MANIFEST.in and instructions for including data files #3148

Merged
merged 14 commits into from Mar 5, 2022

Conversation

abravalheri
Copy link
Contributor

@abravalheri abravalheri commented Mar 3, 2022

Motivation

Using MANIFEST.in and adding non-Python files seems to be one of the most difficult aspects that beginners face when getting started with setuptools, as evidenced by a recent comment in pypa/packaging-problems#84 (comment).

The current docs also lack an explanation for the syntax of MANIFEST.in that is not bundled together with the deprecated/distutils documentation.

Summary of changes

  • Add links to the PyPUG instructions about using MANIFEST.in to existing docs,
  • Mention more prominently VCS plugins as a viable alternative to MANIFEST.in
  • Add a section to miscellaneous.rst about how to control files in the distribution that also links to the PyPUG document about MANIFEST.in.
    • This new section also emphasize the usage of VCS plugins before mentioning MANIFEST.in.
  • Emphasize more prominently that package data files are the files inside the package directory (people seem to still like to organise their files in a way that the non-Python files are external to the package dir, but expect them to be distributed...)
  • Remove mention to pkg_resources being the preferred way of accessing data files, in favour of importlib.resources.
  • Added a note recommending data files to be read-only.

Pull Request Checklist

If using the ``include_package_data`` argument, files specified by
``package_data`` will *not* be automatically added to the manifest unless
they are listed in the |MANIFEST.in|_ file or by a plugin like
:pypi:`setuptools-scm` or :pypi:`setuptools-svn`.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if this affirmation still holds (after #2844), I have the impression that sometimes package_data is enough and does not require MANIFEST.in...

This does seem to be the case in the experiment performed in: https://github.com/abravalheri/experiment-setuptools-package-data

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

THANK YOU for that repo with the experiments, it is super helpful!

@abravalheri abravalheri marked this pull request as ready for review March 3, 2022 18:58
docs/userguide/datafiles.rst Outdated Show resolved Hide resolved
@@ -18,9 +18,10 @@ e.g.::
)

This tells setuptools to install any data files it finds in your packages.
The data files must be specified via the distutils' ``MANIFEST.in`` file.
The data files must be specified via the distutils' |MANIFEST.in|_ file.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still want to call this is a distutils file? I think it is confusing to mention distutils at all since it is being largely hidden in setuptools.

configuration and all C sources listed as part of extensions
(it doesn't catch C headers, though).

However, when building more complex packages (e.g. packages that include
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is worth calling out that wheels (and the final installed package) contain only the data files that are contained within listed package directories, and only when include_package_data is True. Specifically this can be used to include a top-level tests folder in an sdist using MANIFEST.in. But by not including tests in the packages metadata (by using explicit packages or include/exclude with find_packages, you avoid shipping the tests in the wheel and polluting site-packages with a tests folder. The tests in the sdist can be used by downstream packagers like Linux distros or conda-forge to test the package before re-packaging. Maybe this should go in the PyPUG itself?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is worth calling out that wheels (and the final installed package) contain only the data files that are contained within listed package directories, and only when include_package_data is True

Ummm... in theory wheels could contain files outside the package directories via the .data subdir in the wheel specification. That however is super tricky, because it will depend on the installation schema that varies with OS/platform/virtualenv etc...
So we want to move users away from that.

Maybe this should go in the PyPUG itself?

It is definitely a good idea to add a discussion about tests/docs in the sdist to PyPUB! Maybe at some point in the future I will give it a try and propose a PR there (but please also feel free to go ahead and open one if you have the time).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @blink1073, in 8252975 I tried to make these relationships more clear. Would you mind to have a look and see if the text is more clear/has improved?

There is also a link to the rendered version here: https://setuptools--3148.org.readthedocs.build/en/3148/userguide/miscellaneous.html#controlling-files-in-the-distribution

Co-authored-by: Steven Silvester <steven.silvester@ieee.org>
Copy link
Contributor Author

@abravalheri abravalheri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for the kind review @blink1073!

I will try to think about something for the sdist x wheel behaviour.

docs/userguide/miscellaneous.rst Outdated Show resolved Hide resolved
Please note that, when using ``include_package_data=True``, only files **inside
the package directory** are included in the final ``wheel``, by default.

So for example, if you create a :term:`Python project <Project>` that uses
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about adding something like: "If you are using MANIFEST.in, and you include a tests directory that is not included in the package director(ies), the tests will also be present in the sdist but not in the wheel. If using find_packages, ensure that your glob pattern does not include the tests directory, or you use explicit include or exclude config along with find_packages to ensure that the tests folder is not included. Note: if you accidentally include a top level tests folder (or another top level folder like docs) in your packages config, it will end up in site-packages and get merged with any other package that accidentally does the same."

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, I'm starting to think that all of this content should be in PyPUG and we should reference it from here so there is a single source of truth.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much @blink1073 for the text review.

I understand your concerns here, but I am starting to think that this is a non-trivial amount of information to be casually dropped in the middle of this section.

Maybe it would be best would be to add a separated section/page including the "most common configuration errors"?

That could live in the PyPUG, or even in setuptools docs if the maintainers find that too tool-specific.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I am already concerned about having 2 footnotes here... too much branching in the information flow)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point about being tool-specific. It seems like anything that isn't related to a PEP should go in these docs (same with flit-specific options for example).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But perhaps that needs to wait until setuptools handles PEP 621 and PEP 660 so there is a uniform set of docs that can go in PyPUG and then refer out to tool-specific docs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I will stop the changes in this PR at this stage. Do you think it is in a good shape (although incomplete, I think it is an improvement over the current status of the docs...)

Co-authored-by: Steven Silvester <steven.silvester@ieee.org>
Copy link
Contributor

@blink1073 blink1073 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you!

@abravalheri abravalheri merged commit e8ad85b into pypa:main Mar 5, 2022
@abravalheri abravalheri deleted the manifest-links branch March 5, 2022 08:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants