Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include *.md files inside mkdocs/tests/integration in the tarball #1494

Closed
wants to merge 1 commit into from

Conversation

mitya57
Copy link
Contributor

@mitya57 mitya57 commented May 23, 2018

Currently, the tarball contains only a part of the files needed for the tests:

$ wget -q "https://files.pythonhosted.org/packages/8a/cc/593faba2554b0a950249b0240417319de67f3e2ee5b70755c49b70be043a/mkdocs-0.17.3.tar.gz"
$ tar xzf mkdocs-0.17.3.tar.gz
$ cd mkdocs-0.17.3/
$ ls -l mkdocs/tests/integration
total 8
drwxr-xr-x 4 dmitry dmitry 4096 Mar  7 19:55 complicated_config
drwxr-xr-x 3 dmitry dmitry 4096 Mar  7 19:55 subpages

While in the Git checkout there are four directories instead of two:

$ ls -l mkdocs/tests/integration
total 16
drwxr-xr-x 4 dmitry dmitry 4096 May 20 13:06 complicated_config
drwxr-xr-x 3 dmitry dmitry 4096 May 20 13:06 minimal
drwxr-xr-x 3 dmitry dmitry 4096 May 20 13:06 subpages
drwxr-xr-x 3 dmitry dmitry 4096 May 20 13:06 unicode

This merge request makes sure that for the next release the tarball will contain all the needed files.

Without these files, the tests are incomplete.
@mitya57 mitya57 changed the title Include *.md files inside mkdocs/tests/integration into the tarball Include *.md files inside mkdocs/tests/integration in the tarball May 23, 2018
@waylan
Copy link
Member

waylan commented May 23, 2018

Thanks. I wonder if we should be running the tests against the tarball (or a wheel) to make sure this sort of thing is caught. And if so, what the best way to do that is.

We shouldn't need to do that for every supported Python version as our existing tests cover those issues. But perhaps we should add one additional test run to Travis CI which uses a distribution package rather than a git clone.

@mitya57
Copy link
Contributor Author

mitya57 commented May 23, 2018

I don’t think it makes sense to test the already released tarballs on Travis, but building the (fresh) tarball and running tests against it does make sense.

I see the tests are failing on Windows. That is probably because of ♪.md file. Do you have any idea how to fix that?

@waylan
Copy link
Member

waylan commented May 23, 2018

building the (fresh) tarball and running tests against it does make sense.

Yes, that's what I had in mind. Although, thinking about it more, I don't think that is necessary. Tox already installs a build in a virtual env for each env in the matrix. And a build would contain everything that the tarball would contain. However, I wonder if perhaps some of the tests are run using the git working tree rather than the files installed in the env. Perhaps the correct approach is to ensure tox is using the "installed" files rather than the git working tree.

In fact, I expect the new error on Windows is related directly to this. Previously we didn't see any error because the related files were simply being excluded from the builds. And all the tests passed because those files still existed in the working tree.

And to be clear, all references to "build" above refer to "building" the MkDocs Python package, not using MkDocs to "build" documentation.

Which brings me to the reason for the failures on Windows. I was going to suggest that the problem is related to #1426, but that is addressing an issue with MkDocs reading documentation files, not with Python installing them as part of a Python package. However, the error occurs when the package is attempted to be installed. Note that the package does build successfully (a wheel file is successfully created) . In other words, there is no problem having the files with Unicode filenames included in the wheel, but there is a problem when those files are read from the wheel in an attempt to copy them to the install directory.

@waylan
Copy link
Member

waylan commented May 23, 2018

As an aside, I wonder if our tests haven't encountered the reported problem in #1426 because we aren't actually running our tests against non-ASCII filenames as those files are getting left out of the tests due to the problem being addressed in this issue.

@waylan
Copy link
Member

waylan commented May 23, 2018

Hmm, I can't seem to reproduce this, but not for the reasons you would expect. I have the following files structure as a minimum test case:

- foo/
    - __init__.py
    - ascii.txt
    - ♪.txt
- MANIFEST.in
- setup.py

MANIFEST.in contains:

recursive-include foo *.txt

And setup.py includes:

from setuptools import setup, find_packages

setup(
    name="foo",
    packages=find_packages(),
    include_package_data=True,
)

When I run python setup.py build the built package contains foo/__init__.py and foo/ascii.txt but not foo/♪.txt. I get the same results when building a distribution (wheel, zip, etc). In other words, the file with the non-ASCII filename never gets included in a build/distribution and therefore can never raise an error on installation.

@waylan
Copy link
Member

waylan commented May 23, 2018

I can now replicate the error. I removed the MANIFEST.in file and defined the package_data directly in the setup.py script:

# coding=utf-8

from setuptools import setup

setup(
    name=u"foo",
    packages=['foo'],
    package_dir={'foo': 'foo'},
    package_data={'foo': ['ascii.txt', u'♪.txt']}
)

And when I try to build I get the following error on Windows with Python 2.7:

$ python setup.py build
running build
running build_py
Traceback (most recent call last):
  File "setup.py", line 9, in <module>
    package_data={'foo': ['ascii.txt', u'ΓÖ¬.txt']}
  File "C:\Python27\lib\site-packages\setuptools\__init__.py", line 129, in setu
p
    return distutils.core.setup(**attrs)
  File "C:\Python27\lib\distutils\core.py", line 151, in setup
    dist.run_commands()
  File "C:\Python27\lib\distutils\dist.py", line 953, in run_commands
    self.run_command(cmd)
  File "C:\Python27\lib\distutils\dist.py", line 972, in run_command
    cmd_obj.run()
  File "C:\Python27\lib\distutils\command\build.py", line 127, in run
    self.run_command(cmd_name)
  File "C:\Python27\lib\distutils\cmd.py", line 326, in run_command
    self.distribution.run_command(command)
  File "C:\Python27\lib\distutils\dist.py", line 972, in run_command
    cmd_obj.run()
  File "C:\Python27\lib\site-packages\setuptools\command\build_py.py", line 53,
in run
    self.build_package_data()
  File "C:\Python27\lib\site-packages\setuptools\command\build_py.py", line 123,
 in build_package_data
    outf, copied = self.copy_file(srcfile, target)
  File "C:\Python27\lib\distutils\cmd.py", line 365, in copy_file
    dry_run=self.dry_run)
  File "C:\Python27\lib\distutils\file_util.py", line 130, in copy_file
    log.info("%s %s -> %s", action, src, dir)
  File "C:\Python27\lib\distutils\log.py", line 40, in info
    self._log(INFO, msg, args)
  File "C:\Python27\lib\distutils\log.py", line 30, in _log
    stream.write('%s\n' % msg)
  File "C:\Python27\lib\encodings\cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u266a' in position
 12: character maps to <undefined>

If I remove ♪.txt from the list in package_data the build runs fine except that the file is not included.

I should note that this error occurs when the file is explicitly included. However whenever using a wildcard, the file is simply ignored in Python 2.7. Notice that the appveyor tests in this PR are passing for Python 2.7 but are failing for Python 3. I believe that that is because the Unicode filenames are simply not being included in the build in Python 2.7, but they are in Python 3. But, it would appear that Unicode filenames aren't actually supported and therefore are raising an error.

As support of that conclusion, in my searching I found this Distribute issue which suggests that it is possible to include files with non-ASCII filenames in a Python 3 package. The issue also points to a few Python Issues which support that that is possible as well. The problem reported in those issues seems to be that once the files are included, they raise errors when trying to get them back out, which matches my analysis of the appveyor error logs for this PR.

So, in the end it seems that we cannot include non-ASCII filenames in a Python 2.7 package, and, while we can include them in a python 3 package, we may have problems getting them back out. As these files are just test fixtures, perhaps we should be exploring other ways to test for the relevant issues covered by those fixtures.

I should also note that I found mgedmin/check-manifest, which is a lib for confirming that everything is properly included in the MANIFEST. Perhaps we should be using that regardless of how we resolve this issue of how to include test fixtures.

@waylan
Copy link
Member

waylan commented May 23, 2018

On possible solution might be to move the test fixtures (integration tests) out of the mkdocs.tests lib. The problem is that some of the unittests also rely on the same test fixtures as it was easier to use existing files than set up and tear down temp dirs/files. Those unittests would need to be refactored.

@waylan
Copy link
Member

waylan commented May 23, 2018

I've actually been thinking about reworking some of tests which use temp dirs. Some helpers I've found include testfixtures, fixtures, tempcase, and of course, pytest. Which, if any of those we might want to use I don't know.

@mitya57
Copy link
Contributor Author

mitya57 commented May 25, 2018

Interesting that it fails in different code paths on different Python versions:

  • With Python 3.4, it fails in pip\_internal\wheel.py, line 502, in move_wheel_files;
  • With Python 3.5 and 3.6, it fails in distutils\log.py, line 34, in _log.

Also interesting that in Python 3.6, Windows filesystem encoding (PEP 529) and console encoding (PEP 528) changed to UTF-8, but that did not affect the failure in any way.

@waylan
Copy link
Member

waylan commented May 25, 2018

Interesting indeed. Note that PEP 529 (Change Windows filesystem encoding to UTF-8) states in the Abstract that (emphasis added):

This only affects the encoding used when users pass a bytes object to Python where it is then passed to the operating system as a path name.

However, the Python 3 Documentation on Unicode Filenames explains that:

The os.listdir() function returns filenames and raises an issue: should it return the Unicode version of filenames, or should it return bytes containing the encoded versions? os.listdir() will do both, depending on whether you provided the directory path as bytes or a Unicode string. If you pass a Unicode string as the path, filenames will be decoded using the filesystem’s encoding and a list of Unicode strings will be returned, while passing a byte path will return the filenames as bytes.

My takeaway from this all is that if libs only every pass Unicode strings as paths to path handling functions, then the right thing happens every time. But once bytes get mixed in with those strings, things start to break down. Of course, as ASCII bytes are a subset of various encodings (including UTF-8) and most tests are done with ASCII filenames, these problems don't get caught in testing.

I'm not surprised the errors occur in different places in different versions. Presumably different libs get the reports and fix their independent issues at different stages along the way. For example, an earlier version of lib1 has a bug which was fixed in a later release. But that fix exposes a bug in lib2 that we never got to with the lib1 bug. Of course, each lib is maintained by a different team, so there are gaps between when the fixes become available.

The problem for us is that these bugs still exist out in the real world. And as long as they do, we can't ship a package which contains filenames with non-ASCII characters in it.

@waylan
Copy link
Member

waylan commented Jun 29, 2018

In #1504 we introduce a tempdir decorator to use for testing. It makes it relatively simple to setup and tear down temp dirs and files on a test-by-test basis. I'm wondering if that would make for a better way to test this stuff than trying to include the test files in the distribution. Presumably it would also eliminate the problem encountered in #1434.

My only concern is that we are using Python to create the Unicode filennames and then using Python to read those same files. If the same bug exists on both the reading and writing side of the test, we may not detect a problem. In contrast, the current test files were manually created and are known to be valid. That said, Python is copying those files when they are included in the distribution, so I suppose the same risk exists regardless. The benefit of using temp files, is that we have full control over the entire process.

@mitya57
Copy link
Contributor Author

mitya57 commented Jun 30, 2018

Yes, I think using temporary files for this purpose makes sense.

And indeed, __file__ is unreliable in Python 2, and depends on a way how you call it:

$ echo "print(__file__)" > mod.py
$ echo "import mod" > main.py
$ python2 ./mod.py 
./mod.py
$ python2 ./main.py 
/tmp/baz/mod.py
$ python2 -c "import mod"
mod.pyc
$ python2 -c "import main"
mod.pyc
$ python2 -m mod
/tmp/baz/mod.py
$ python2 -m main
mod.pyc

Using temporary files in tests should make the code a bit more reliable and easier to read. If you have doubts you can verify manually that the created files are valid.

waylan added a commit to waylan/mkdocs that referenced this pull request Nov 6, 2020
We already link to the Deploying for Docs page which includes specific
instructions for various hosts. There is no need to also link to those
hosts in the Getting Started guide.

Closes mkdocs#2210, closes mkdocs#2104, closes mkdocs#1494.
@waylan waylan added this to the 1.2 milestone Nov 9, 2020
@waylan waylan removed this from the 1.2 milestone Apr 8, 2021
@oprypin oprypin closed this Oct 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants