RFC: Reproducible vendor directory contents #9768

Closed
glensc opened this issue Mar 10, 2021 · 17 comments

glensc (Contributor) commented Mar 10, 2021

There was an issue (now implemented) to make the composer.phar build reproducible:

I'm looking for the same thing for vendor reproducibility.

The end goal is to produce an identical vendor/ from composer.lock when invoked from CI
and later copied into a Docker layer. The assumption is that CI can't share the previous workspace,
so the timestamp information must come from somewhere else.

What I've researched so far:

  1. Take "composer/ca-bundle@1.2.9" as an example.
  2. The tag was created at 2021-01-12T12:15:05Z (found with Inspect in Chrome).
  3. GitHub zip archives contain timestamps for files and directories, e.g. "01-12-2021 12:10 UTC".
  4. Other installation sources, such as Git or a local directory, may lack (consistent) timestamp information, or it may be too expensive to compute (e.g. by traversing Git history).
  5. composer.lock records a "time" for each package, so that could be used as a fallback for directories, e.g. "2021-01-12T12:10:35+00:00".
  6. If the package source lacks a timestamp, use the date recorded for it in composer.lock.
  7. Not all timestamps match, but they are close.
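A minimal sketch of the fallback in steps 5 and 6, assuming GNU date and that the package's "time" value has already been read from composer.lock. `lock_time_to_epoch` and `pick_mtime` are hypothetical helpers, not part of Composer:

```shell
# Hypothetical helpers illustrating the timestamp fallback described above.
# Assumes GNU date (-d with ISO 8601 input is a GNU extension).

# Convert a composer.lock "time" value such as "2021-01-12T12:10:35+00:00"
# into seconds since the Unix epoch.
lock_time_to_epoch() {
    date -u -d "$1" +%s
}

# Prefer the timestamp found in the package archive (epoch seconds) and
# fall back to the composer.lock "time" value when the archive has none.
pick_mtime() {
    archive_epoch=$1
    lock_time=$2
    if [ -n "$archive_epoch" ]; then
        printf '%s\n' "$archive_epoch"
    else
        lock_time_to_epoch "$lock_time"
    fi
}

# Example: a Git or path source without archive timestamps.
pick_mtime "" "2021-01-12T12:10:35+00:00"   # prints 1610453435
```

Converting to epoch seconds keeps the value independent of the time zone offset written in the lock file.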

glensc (Contributor, Author) commented Mar 10, 2021

Looked even more closely at that release:

  1. The Git lightweight tag points to a commit authored at 2021-01-12 12:10:35+00:00
    (LC_ALL=C TZ=UTC date -d @$(git show -s --format=%at 1.2.9) --rfc-3339=seconds)
  2. The GitHub release was created about 5 minutes later, at 2021-01-12T12:15:05Z (the developer took some time to test locally and push? :)
  3. The composer.lock uses a timestamp that matches the tagged commit's date.
  4. The GitHub releases API shows
    created_at: "2021-01-12T12:10:35Z", published_at: "2021-01-12T12:15:05Z",
    (https://api.github.com/repos/composer/ca-bundle/releases)

glensc added a commit to eventum/docker that referenced this issue Mar 10, 2021
glensc (Contributor, Author) commented Mar 10, 2021

As a proof of concept, I updated the Dockerfile to touch everything in the vendor directory:

This builds a vendor layer that is cached on any system. It requires Docker with BuildKit.

# checkout!
git clone -b v3.9.12 https://github.com/eventum/docker
cd docker

# ensure pristine state
docker builder prune -fa
docker image prune -fa

# build!
CACHE_TAG=eventum/eventum:3.9.12
IMAGE_TAG=reproducible-vendor

DOCKER_BUILDKIT=1 \
docker build \
	--build-arg=BUILDKIT_INLINE_CACHE=1 \
	--tag=$IMAGE_TAG \
	--cache-from=$CACHE_TAG \
	--progress=plain \
	.
$ TZ=UTC docker history reproducible-vendor --human=false | grep vendor
<missing>           2021-03-10T19:56:36Z   COPY /vendor ./vendor/ # buildkit               21635944            buildkit.dockerfile.v0

When the vendor layer is created, the build log shows it being pulled and reported as CACHED:

#22 [stage-2 3/6] COPY --from=source /vendor ./vendor/
#22 pulling sha256:e304175777ecf27e5514d5dadef5bc49a971485fe772cb10beb37a3e49b972cf
#22 pulling sha256:58b8a59e9a27f85d16a844ab0adf3607bf1e9edbc667d39948e7b79e118573c7
#22 pulling sha256:5168ac08d0247335cad69dcb3930e3dea2d036131c2a32ef47b2f083d99b81f1
#22 pulling sha256:58b8a59e9a27f85d16a844ab0adf3607bf1e9edbc667d39948e7b79e118573c7 0.5s done
#22 pulling sha256:5168ac08d0247335cad69dcb3930e3dea2d036131c2a32ef47b2f083d99b81f1 0.7s done
#22 pulling sha256:e304175777ecf27e5514d5dadef5bc49a971485fe772cb10beb37a3e49b972cf 1.1s done
#22 CACHED
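The Dockerfile change itself is not shown here. A minimal sketch of the same idea, assuming the fixed timestamp is provided by CI following the reproducible-builds SOURCE_DATE_EPOCH convention and that GNU touch is available; `touch_vendor_tree` is a hypothetical helper:

```shell
# Hypothetical helper: give vendor/ and every path below it the same fixed
# mtime, so the copied layer does not depend on when `composer install` ran.
# Assumes GNU touch, which accepts "@SECONDS" as a date string.
touch_vendor_tree() {
    vendor_dir=$1
    epoch=$2
    # -h changes symlinks themselves rather than their targets.
    find "$vendor_dir" -exec touch -h -d "@$epoch" {} +
}

# Usage, e.g. in a CI script or a Dockerfile RUN step:
#   touch_vendor_tree vendor "${SOURCE_DATE_EPOCH:?SOURCE_DATE_EPOCH must be set}"
```

Touching a file does not change its parent directory's mtime, so the order in which find visits paths does not matter.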

Seldaek (Member) commented Mar 11, 2021

We'd have to see what the runtime impact of touching all files after installation is. That probably adds a bunch of I/O for something that is kind of an edge case, so I'm not sure it's worth it.

Is it needed though? From what you described above, it seems like GitHub already sets the filemtime correctly for all the archives to match the time of the tag, which is what Composer also sees as the release date. So is there any change needed?

glensc (Contributor, Author) commented Mar 11, 2021

The change could be made to the paths not present in the archives:

  • vendor/VENDOR/PACKAGE
  • vendor/VENDOR
  • vendor/composer/*
  • vendor/composer
  • vendor/autoload.php

For Docker caching to work, the vendor dir needs to be 100% reproducible, not 99% or 99.999997%, or the cache hit won't happen.

Seldaek (Member) commented Mar 11, 2021

Right, I fully understand the 100%.

I believe we could apply the lock file's mtime to all the files you described. That shouldn't take too much I/O and should work reasonably well. If no lock file is present, we can simply skip this; I don't expect anyone without a lock file to care, as without a lock file the install is not going to be reproducible anyway.
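A minimal shell sketch of this proposal, meant to run after `composer install`: it copies composer.lock's mtime (POSIX `touch -r`) onto the paths listed in the previous comment and does nothing when there is no lock file. The helper name is hypothetical, and `-mindepth`/`-maxdepth` are GNU/BSD find extensions. Note that a fresh Git checkout gives composer.lock the checkout time as its mtime, so in CI the lock file itself would first need a fixed timestamp.

```shell
# Hypothetical helper: give the paths Composer generates itself (those not
# present in package archives) the mtime of composer.lock. Package contents
# keep the mtimes from their archives.
apply_lock_mtime() {
    project_dir=${1:-.}
    lock="$project_dir/composer.lock"
    vendor="$project_dir/vendor"

    # Without a lock file the install is not reproducible anyway: skip.
    [ -f "$lock" ] || return 0
    [ -d "$vendor" ] || return 0

    # vendor/autoload.php
    if [ -f "$vendor/autoload.php" ]; then
        touch -r "$lock" "$vendor/autoload.php"
    fi

    # Files generated directly in vendor/composer. Only depth 1 is touched,
    # because packages of the "composer" vendor (e.g. composer/ca-bundle)
    # also live below vendor/composer/.
    if [ -d "$vendor/composer" ]; then
        find "$vendor/composer" -maxdepth 1 -type f -exec touch -r "$lock" {} +
    fi

    # vendor/VENDOR and vendor/VENDOR/PACKAGE directories (including
    # vendor/composer itself), but not the package contents.
    find "$vendor" -mindepth 1 -maxdepth 2 -type d -exec touch -r "$lock" {} +
}

# Usage: apply_lock_mtime /path/to/project
```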

Seldaek added this to the 2.1 milestone Mar 11, 2021
glensc (Contributor, Author) commented Mar 11, 2021

I've created a gist to make this easier to play around with:

This comment corresponds to git tag v1; I may change the repo in the future:

git clone -b v1 https://gist.github.com/glensc/b51e9a8c180f3ca3ca1a6593931d431d composer-feat-9768
cd composer-feat-9768

glensc (Contributor, Author) commented Mar 11, 2021

With v2, everything in vendor is run through touch(1) to emulate the expected behavior:

git clone -b v2 https://gist.github.com/glensc/b51e9a8c180f3ca3ca1a6593931d431d composer-feat-9768
cd composer-feat-9768
./buildkit
...

However, it does not seem to work today: the vendor layer is still taken from the local build, not from the registry cache. It should be as old as WORKDIR /app (6 minutes in this example).

$ docker history docker.io/glen/composer-feat-9768:v3
IMAGE               CREATED             CREATED BY                                      SIZE                COMMENT
c22181e144de        26 seconds ago      COPY /app/vendor ./vendor # buildkit            267kB               buildkit.dockerfile.v0
<missing>           6 minutes ago       WORKDIR /app                                    0B                  buildkit.dockerfile.v0
<missing>           3 weeks ago         /bin/sh -c #(nop)  CMD ["/bin/sh"]              0B
<missing>           3 weeks ago         /bin/sh -c #(nop) ADD file:80bf8bd014071345b…   5.61MB

Maybe I need to consult the Docker folks about how this works...

williamdes commented

I am researching to make the phpMyAdmin builds reproducible.

As you can see the composer diffs are quite small in size.
Any idea on how I can get all this to not differ?

diff composer-workstation/autoload_real.php composer-snap-srv/autoload_real.php
5c5
< class ComposerAutoloaderInit1b17b2e8ddd55ea00091a3ad135fce66
---
> class ComposerAutoloaderInit23b49f0010eed1c13908880297e27e4f
27c27
<         spl_autoload_register(array('ComposerAutoloaderInit1b17b2e8ddd55ea00091a3ad135fce66', 'loadClassLoader'), true, true);
---
>         spl_autoload_register(array('ComposerAutoloaderInit23b49f0010eed1c13908880297e27e4f', 'loadClassLoader'), true, true);
29c29
<         spl_autoload_unregister(array('ComposerAutoloaderInit1b17b2e8ddd55ea00091a3ad135fce66', 'loadClassLoader'));
---
>         spl_autoload_unregister(array('ComposerAutoloaderInit23b49f0010eed1c13908880297e27e4f', 'loadClassLoader'));
35c35
<             call_user_func(\Composer\Autoload\ComposerStaticInit1b17b2e8ddd55ea00091a3ad135fce66::getInitializer($loader));
---
>             call_user_func(\Composer\Autoload\ComposerStaticInit23b49f0010eed1c13908880297e27e4f::getInitializer($loader));
56c56
<             $includeFiles = Composer\Autoload\ComposerStaticInit1b17b2e8ddd55ea00091a3ad135fce66::$files;
---
>             $includeFiles = Composer\Autoload\ComposerStaticInit23b49f0010eed1c13908880297e27e4f::$files;
61c61
<             composerRequire1b17b2e8ddd55ea00091a3ad135fce66($fileIdentifier, $file);
---
>             composerRequire23b49f0010eed1c13908880297e27e4f($fileIdentifier, $file);
68c68
< function composerRequire1b17b2e8ddd55ea00091a3ad135fce66($fileIdentifier, $file)
---
> function composerRequire23b49f0010eed1c13908880297e27e4f($fileIdentifier, $file)
diff composer-workstation/autoload_static.php composer-snap-srv/autoload_static.php
7c7
< class ComposerStaticInit1b17b2e8ddd55ea00091a3ad135fce66
---
> class ComposerStaticInit23b49f0010eed1c13908880297e27e4f
1682,1685c1682,1685
<             $loader->prefixLengthsPsr4 = ComposerStaticInit1b17b2e8ddd55ea00091a3ad135fce66::$prefixLengthsPsr4;
<             $loader->prefixDirsPsr4 = ComposerStaticInit1b17b2e8ddd55ea00091a3ad135fce66::$prefixDirsPsr4;
<             $loader->prefixesPsr0 = ComposerStaticInit1b17b2e8ddd55ea00091a3ad135fce66::$prefixesPsr0;
<             $loader->classMap = ComposerStaticInit1b17b2e8ddd55ea00091a3ad135fce66::$classMap;
---
>             $loader->prefixLengthsPsr4 = ComposerStaticInit23b49f0010eed1c13908880297e27e4f::$prefixLengthsPsr4;
>             $loader->prefixDirsPsr4 = ComposerStaticInit23b49f0010eed1c13908880297e27e4f::$prefixDirsPsr4;
>             $loader->prefixesPsr0 = ComposerStaticInit23b49f0010eed1c13908880297e27e4f::$prefixesPsr0;
>             $loader->classMap = ComposerStaticInit23b49f0010eed1c13908880297e27e4f::$classMap;

glensc (Contributor, Author) commented Mar 23, 2021

> I am researching to make the phpMyAdmin builds reproducible.
>
> As you can see the composer diffs are quite small in size.
> Any idea on how I can get all this to not differ?

@williamdes, yes:

williamdes commented

> @williamdes, yes:

Is there anything wrong with setting a non-random value?

stof (Contributor) commented Mar 23, 2021

Well, as long as you set a value specific to your project (so that it does not cause conflicts with an autoloader from composer global or from elsewhere loaded in the same PHP process), it should be good.
The suffix is there to have unique class names when you have multiple generated autoloaders involved.
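Following this advice, one way to get a value that is stable across machines yet specific to the project is to derive it from a project identifier and pin it with `composer config autoloader-suffix` (the option used later in this thread). A minimal sketch; the helper name and the choice of an MD5 of the package name are assumptions, the package name is only an example, and `md5sum` comes from GNU coreutils:

```shell
# Hypothetical helper: derive a stable, project-specific autoloader suffix,
# so generated names such as ComposerAutoloaderInit<suffix> are identical on
# every machine but still differ from other projects' autoloaders loaded in
# the same PHP process.
project_autoloader_suffix() {
    # md5sum prints "<32 hex chars>  -"; keep only the hash.
    printf '%s' "$1" | md5sum | cut -d ' ' -f 1
}

suffix=$(project_autoloader_suffix "phpmyadmin/phpmyadmin")
echo "$suffix"

# Then pin it in composer.json (run in the project directory):
#   composer config autoloader-suffix "$suffix"
```

A hex digest is used because it only contains characters that are valid in PHP class and function names.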

Seldaek modified the milestones: 2.1, 2.2 May 26, 2021
Seldaek (Member) commented Dec 5, 2021

@glensc have you made any progress on this? Is there any easy step we could take to improve things in Composer?

Seldaek modified the milestones: 2.2, 2.3 Dec 7, 2021
Seldaek modified the milestones: 2.3, 2.4 Feb 18, 2022
AlbinoDrought commented Mar 8, 2022

I'm also interested in vendor/ reproducibility. I get 99% of the way there with composer config autoloader-suffix SomeCommonSuffix and strip-nondeterminism.
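To measure how close two builds get to 100%, an alternative to comparing zip archives is to fingerprint each vendor/ tree and compare the results. A minimal sketch, assuming GNU coreutils (`sha256sum`, `stat -c`) and file names without newlines; `vendor_listing` and `vendor_fingerprint` are hypothetical helpers:

```shell
# Hypothetical helpers: summarise a vendor/ tree so two builds can be compared.

# One line per file: relative path and SHA-256 of its contents, sorted so the
# listing does not depend on directory traversal order. Pass "mtime" as the
# second argument to also include each file's mtime (epoch seconds).
vendor_listing() {
    dir=$1
    mode=${2:-content}
    (
        cd "$dir" || exit 1
        find . -type f | LC_ALL=C sort | while IFS= read -r f; do
            digest=$(sha256sum < "$f" | cut -d ' ' -f 1)
            if [ "$mode" = mtime ]; then
                printf '%s %s %s\n' "$f" "$digest" "$(stat -c %Y "$f")"
            else
                printf '%s %s\n' "$f" "$digest"
            fi
        done
    )
}

# A single hash for the whole listing.
vendor_fingerprint() {
    vendor_listing "$@" | sha256sum | cut -d ' ' -f 1
}

# Usage:
#   vendor_fingerprint local/vendor; vendor_fingerprint remote/vendor
#   diff <(vendor_listing local/vendor) <(vendor_listing remote/vendor)   # bash
```

Keeping contents and mtimes as separate modes helps tell an ordering problem in the generated files apart from a timestamp-only difference.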

The problematic area in my case is vendor/composer/autoload_*.php. Unfortunately, machines sometimes end up with a different order in autoload_files.php and autoload_static.php for the same Composer version, composer.json and composer.lock (I haven't been able to reproduce this reliably):

zipcmp local.zip remote.zip
--- local.zip
+++ remote.zip
- file 'vendor/composer/autoload_files.php', size 3829, crc cce2eb8b
+ file 'vendor/composer/autoload_files.php', size 3829, crc 14175282
- file 'vendor/composer/autoload_static.php', size 131121, crc 854e2188
+ file 'vendor/composer/autoload_static.php', size 131121, crc a709a627
--- local/vendor/composer/autoload_files.php	1969-12-31 16:00:00.000000000 -0800
+++ remote/vendor/composer/autoload_files.php	1969-12-31 16:00:00.000000000 -0800
@@ -29,8 +29,8 @@
     '8a9dc1de0ca7e01f3e08231539562f61' => $vendorDir . '/aws/aws-sdk-php/src/functions.php',
     'def43f6c87e4f8dfd0c9e1b1bab14fe8' => $vendorDir . '/symfony/polyfill-iconv/bootstrap.php',
     '801c31d8ed748cfa537fa45402288c95' => $vendorDir . '/psy/psysh/src/functions.php',
-    '23c18046f52bef3eea034657bafda50f' => $vendorDir . '/symfony/polyfill-php81/bootstrap.php',
     '2c102faa651ef8ea5874edb585946bce' => $vendorDir . '/swiftmailer/swiftmailer/lib/swift_required.php',
+    '23c18046f52bef3eea034657bafda50f' => $vendorDir . '/symfony/polyfill-php81/bootstrap.php',
     'e8aa6e4b5a1db2f56ae794f1505391a8' => $vendorDir . '/amphp/amp/lib/functions.php',
     '76cd0796156622033397994f25b0d8fc' => $vendorDir . '/amphp/amp/lib/Internal/functions.php',
     'e39a8b23c42d4e1452234d762b03835a' => $vendorDir . '/ramsey/uuid/src/functions.php',
--- local/vendor/composer/autoload_static.php	1969-12-31 16:00:00.000000000 -0800
+++ remote/vendor/composer/autoload_static.php	1969-12-31 16:00:00.000000000 -0800
@@ -30,8 +30,8 @@
         '8a9dc1de0ca7e01f3e08231539562f61' => __DIR__ . '/..' . '/aws/aws-sdk-php/src/functions.php',
         'def43f6c87e4f8dfd0c9e1b1bab14fe8' => __DIR__ . '/..' . '/symfony/polyfill-iconv/bootstrap.php',
         '801c31d8ed748cfa537fa45402288c95' => __DIR__ . '/..' . '/psy/psysh/src/functions.php',
-        '23c18046f52bef3eea034657bafda50f' => __DIR__ . '/..' . '/symfony/polyfill-php81/bootstrap.php',
         '2c102faa651ef8ea5874edb585946bce' => __DIR__ . '/..' . '/swiftmailer/swiftmailer/lib/swift_required.php',
+        '23c18046f52bef3eea034657bafda50f' => __DIR__ . '/..' . '/symfony/polyfill-php81/bootstrap.php',
         'e8aa6e4b5a1db2f56ae794f1505391a8' => __DIR__ . '/..' . '/amphp/amp/lib/functions.php',
         '76cd0796156622033397994f25b0d8fc' => __DIR__ . '/..' . '/amphp/amp/lib/Internal/functions.php',
         'e39a8b23c42d4e1452234d762b03835a' => __DIR__ . '/..' . '/ramsey/uuid/src/functions.php',

Not sure if there's already a good fix for this. I browsed the code that creates these files (parseAutoloadsType?) and didn't notice any options for sorting.

At a glance, it seems like our issue could be fixed by applying a ksort to the $autoloads array before it is returned from parseAutoloadsType, but I'm not adept enough with the composer codebase to know what else would be impacted by this change.

We're fixing this for our project by sorting arrays in the autoload_*.php files using this external script:

Script of dubious quality
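<?php

// Sorts the entries of the "files" autoload array in a Composer-generated
// autoload_files.php or autoload_static.php so the output is identical
// across machines. Note: this also changes the order in which the "files"
// autoloads are included; Composer normally orders them according to
// package dependencies, so check that nothing relies on that order.

if (!isset($argv[1])) {
  throw new \Exception('Usage: php ' . $argv[0] . ' <path to composer autoload_*.php file>');
}

$contents = file_get_contents($argv[1]);
if ($contents === false) {
  throw new \RuntimeException('Failed to read ' . $argv[1]);
}

$matches = [];

// autoload_files.php
preg_match('/(return array *\( *)([^\)]+)( *\);)/', $contents, $matches);
if (empty($matches)) {
  // autoload_static.php
  preg_match('/(public static \$files = array *\( *)([^\)]+)( *\);)/', $contents, $matches);
}

if (count($matches) !== 4) {
  throw new \Exception('Expected 4 matches: ' . json_encode($matches));
}

$original = $matches[2];

// One array entry per line (generated files use "\n", not PHP_EOL). The first
// and last pieces are the text before the first entry and the indentation
// before the closing ");", so keep them in place and sort only the entries.
$parts = explode("\n", $original);
if (count($parts) < 3) {
  echo 'No Change' . PHP_EOL;
  exit(0);
}
$head = array_shift($parts);
$tail = array_pop($parts);
sort($parts);
$new = $head . "\n" . implode("\n", $parts) . "\n" . $tail;

if ($original === $new) {
  echo 'No Change' . PHP_EOL;
  exit(0);
}

$newContents = str_replace($original, $new, $contents);
echo 'Change' . PHP_EOL;

if (file_put_contents($argv[1], $newContents) === false) {
  throw new \RuntimeException('Failed to write new contents');
}

exit(0);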

Seldaek (Member) commented Mar 29, 2022

@AlbinoDrought I believe/hope that part (files autoload order) was fixed by #10617 in 2.2.8.

dkarlovi commented

> Well, as long as you set a value specific to your project (so that it does not cause conflicts with an autoloader from composer global or from elsewhere loaded in the same PHP process), it should be good.

Couldn't this be solved by defaulting to something non-random, like the project name or __FILE__ or something?

Seldaek (Member) commented Jun 22, 2022

Not all projects have a name, and the directory may be machine-dependent. I'm not really sure it's worth making this non-opt-in; as it is now, the suffix is opt-in.

Seldaek modified the milestones: 2.4, 2.5 Jul 17, 2022
Seldaek (Member) commented Oct 27, 2022

Closing as I am not sure what to improve right now, if there is a specific action item please open a new issue with details.

Seldaek closed this as not planned Oct 27, 2022
Development

No branches or pull requests

6 participants