
building reproducible tarballs #542

Open
bollwyvl opened this issue Apr 20, 2022 · 9 comments

Comments

@bollwyvl

Over on ipython, @Carreau has been using the retar script to post-process sdist tarballs to be SOURCE_DATE_EPOCH-aware.

It has no dependencies, so if all the licensing is copacetic, what about adopting that behavior to complement the wheel-based ones in flit?

Noted on jupyterhub/team-compass#502 (comment)
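For reference, here is a minimal sketch of that kind of post-processing in Python. It is not the actual retar script: the `retar` function name and its exact behaviour (sort members by name, zero uid/gid and owner names, set every member's mtime from SOURCE_DATE_EPOCH, and pin the gzip header timestamp) are assumptions for illustration.

```python
import gzip
import io
import os
import tarfile


def retar(src_path, dst_path, mtime=None):
    """Rewrite a .tar.gz so its bytes depend only on member names and contents.

    Hypothetical helper: sorts members by name, drops ownership details,
    sets every member's mtime, and pins the gzip header timestamp.
    """
    if mtime is None:
        # Use SOURCE_DATE_EPOCH if set (and non-empty), else the Unix epoch.
        mtime = int(os.environ.get("SOURCE_DATE_EPOCH") or 0)

    buf = io.BytesIO()
    with tarfile.open(src_path, "r:gz") as src, \
            tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as dst:
        for ti in sorted(src.getmembers(), key=lambda m: m.name):
            data = src.extractfile(ti) if ti.isfile() else None
            ti.uid = ti.gid = 0
            ti.uname = ti.gname = ""
            ti.mtime = mtime
            # Drop pax extended headers from the input; otherwise values such
            # as a fractional mtime stored there would be carried over.
            ti.pax_headers = {}
            dst.addfile(ti, data)

    # The gzip header also stores a timestamp and, optionally, a filename.
    # An explicit mtime and an empty filename keep both fixed.
    with open(dst_path, "wb") as raw:
        with gzip.GzipFile(filename="", mode="wb", fileobj=raw, mtime=mtime) as gz:
            gz.write(buf.getvalue())
```

Two tarballs with the same files but different member order, owners and timestamps should come out byte-identical after passing through this. File permissions are deliberately left alone here.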

@takluyver
Member

It should be more or less reproducible anyway - we're using SOURCE_DATE_EPOCH and normalising file ownership and permissions when creating the sdist. But I don't think it's particularly easy to test this automatically (because you want to check that the results are the same across things like different platforms), so there may well be inconsistencies that have crept in. Fixes welcome!

from copy import copy
import os

from flit_core import common  # provides normalize_file_permissions()

def clean_tarinfo(ti, mtime=None):
    """Clean metadata from a TarInfo object to make it more reproducible.
    - Set uid & gid to 0
    - Set uname and gname to ""
    - Normalise permissions to 644 or 755
    - Set mtime if not None
    """
    ti = copy(ti)
    ti.uid = 0
    ti.gid = 0
    ti.uname = ''
    ti.gname = ''
    ti.mode = common.normalize_file_permissions(ti.mode)
    if mtime is not None:
        ti.mtime = mtime
    return ti

# If SOURCE_DATE_EPOCH is set, use it as the timestamp for every member
source_date_epoch = os.environ.get('SOURCE_DATE_EPOCH', '')
mtime = int(source_date_epoch) if source_date_epoch else None
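The "644 or 755" rule in that docstring can be sketched as a pure function on stat-style mode bits. This is a hypothetical `normalize_mode` helper written for illustration, assuming only the owner-execute bit decides whether a file counts as executable; it is not a copy of `common.normalize_file_permissions`.

```python
import stat


def normalize_mode(st_mode):
    """Hypothetical helper: force permission bits to 0o644 or 0o755.

    File-type bits (regular file, directory, ...) are kept; setuid, setgid
    and sticky bits are dropped; only the owner-execute bit decides
    between 644 and 755.
    """
    file_type = stat.S_IFMT(st_mode)
    perms = 0o755 if st_mode & stat.S_IXUSR else 0o644
    return file_type | perms
```

The point is that a umask of 002 on one machine and 022 on another (0o664 vs 0o644 for the same checked-out file) no longer changes the archive bytes.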

@gitpushdashf

Looks like with SOURCE_DATE_EPOCH set, the tarball and wheel are both reproducible. And they match whether flit build or python -m build is used. Very nice!
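If someone wants to automate that check (e.g. in CI), one minimal approach is to build twice with the same SOURCE_DATE_EPOCH into two separate output directories and compare SHA-256 digests of every artifact. The helper names `sha256_of` and `differing_artifacts` are hypothetical; running the second build on another OS or in a container is where this would actually catch something.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the hex SHA-256 digest of a file, read in 64 KiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def differing_artifacts(dir_a, dir_b):
    """Names of files missing from one directory or whose contents differ."""
    a = {p.name: sha256_of(p) for p in Path(dir_a).iterdir() if p.is_file()}
    b = {p.name: sha256_of(p) for p in Path(dir_b).iterdir() if p.is_file()}
    return sorted(n for n in a.keys() | b.keys() if a.get(n) != b.get(n))
```

An empty list means both builds produced byte-identical sdists and wheels.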

@bollwyvl
Author

Very nice indeed, thanks for looking into it.

So perhaps all that's needed is a note about that, e.g.

Wheels built by flit are reproducible... wheels (which are zip files) include the modification...

amended to

Wheels and source distributions built by flit are reproducible... wheels (which are .zip archives) and source distributions (which are .tar.gz archives) include the modification...

Though having a rebuild step on a different OS/container might be interesting. I have found that Windows has... problems.
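On the wheel side, a zip entry stores a DOS-style timestamp that cannot predate 1980, so a SOURCE_DATE_EPOCH earlier than that has to be clamped. Below is a minimal sketch of writing a zip whose bytes don't depend on input order or build host; `write_reproducible_zip` and `ZIP_MIN_EPOCH` are hypothetical names, not flit's wheel code. One Windows-specific trap it addresses: CPython's `zipfile.ZipInfo` records the creating OS in each entry (0 on Windows, 3 elsewhere), so that field is pinned explicitly.

```python
import io
import time
import zipfile

# 1980-01-01T00:00:00Z: the earliest time a zip (DOS) timestamp can hold.
ZIP_MIN_EPOCH = 315532800


def write_reproducible_zip(files, mtime):
    """Build a zip in memory from {archive_name: bytes}, independent of input order."""
    date_time = time.gmtime(max(mtime, ZIP_MIN_EPOCH))[:6]
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in sorted(files):
            zi = zipfile.ZipInfo(name, date_time=date_time)
            zi.compress_type = zipfile.ZIP_DEFLATED
            # Record the creator as Unix (3) on every platform, with
            # regular-file 0o644 permissions in the high 16 bits.
            zi.create_system = 3
            zi.external_attr = 0o100644 << 16
            zf.writestr(zi, files[name])
    return buf.getvalue()
```

Calling it twice with the same contents in a different order, e.g. `write_reproducible_zip({"b.txt": b"B", "a.txt": b"A"}, mtime=0)`, should give identical bytes with entries stamped 1980-01-01.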

@takluyver
Member

They won't be reliably reproducible between flit build and python -m build, because the former uses information from git or hg to decide what files to include, while the latter doesn't do that. There's more discussion about that discrepancy as part of #522.

@pradyunsg
Member

Note that as long as you consistently use one of those (either build or flit), the distributions generated will be reproducible.

@pradyunsg
Copy link
Member

pradyunsg commented Apr 22, 2022

Given that source tarballs built by flit are reproducible already, is there anything actionable here?

Update: yes, a documentation update. :)

@pradyunsg
Member

Ah, nvm me, I need to read things more carefully. 😅

@nanonyme
Contributor

I thought the main source of cross-OS reproducibility issues was the undefined order in which directory listings return files, which you just have to mitigate by sorting your input files by filename.

@takluyver
Member

Yup, and we should be ensuring things are sorted, e.g.:

# Ensure we sort all files and directories so the order is stable
for dirpath, dirs, files in os.walk(str(self.path)):
    for file in sorted(files):
        full_path = os.path.join(dirpath, file)
        if _include(full_path):
            yield full_path
    # Assigning to dirs[:] in place makes os.walk descend into the
    # remaining subdirectories in sorted order.
    dirs[:] = [d for d in sorted(dirs) if _include(d)]

for dirpath, dirs, files in os.walk(data_directory):
    for file in sorted(files):
        full_path = os.path.join(dirpath, file)
        yield full_path
    dirs[:] = [d for d in sorted(dirs) if d != '__pycache__']
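A self-contained version of the same pattern that can run outside flit, as a sketch: `iter_files_sorted` and its `skip_dirs` parameter are hypothetical names. Sorting plain strings compares code points, so the resulting order doesn't depend on the filesystem or OS.

```python
import os


def iter_files_sorted(root, skip_dirs=("__pycache__",)):
    """Yield file paths under root in an order that doesn't depend on the OS.

    Hypothetical helper mirroring the os.walk pattern quoted above.
    """
    for dirpath, dirs, files in os.walk(root):
        for name in sorted(files):
            yield os.path.join(dirpath, name)
        # os.walk is top-down by default, so replacing the contents of dirs
        # here decides which subdirectories are visited next, and in what order.
        dirs[:] = [d for d in sorted(dirs) if d not in skip_dirs]
```

The yielded paths still use the native separator, so when they become archive member names it's worth normalising them, e.g. `os.path.relpath(p, root).replace(os.sep, "/")`; otherwise a Windows build would record `pkg\mod.py` instead of `pkg/mod.py`.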


5 participants