Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean up contents from previously-sb tarball post binary-cleanup #19632

Conversation

omajid
Copy link
Member

@omajid omajid commented May 1, 2024

If we extracted the previously-source-build tarball only for running the binary cleanup tool, clean up the extracted contents. This undo's the extraction operation that was only performed for running the binary-cleanup tool.

I am creating a tarball for building .NET offline post-prep. In that scenario, I run ./prep-source-build.sh --bootstrap, and then package up the entire VMR directory into a tarball. Without this cleanup, there are a ton of duplicated binaries in the tarball, since the nuget packages are both in previously-source-built archive and in an extracted form. With this change, the archive size goes from just over 4GB to just over 2GB.

@omajid omajid requested review from a team as code owners May 1, 2024 22:56
@omajid
Copy link
Member Author

omajid commented May 1, 2024

One thing I am a little confused about: when I use --bootstrap, I get no new packages in the archive directory. Is that expected?

$ ./prep-source-build.sh  --bootstrap
  Installing dotnet...
Downloading 'https://dotnet.microsoft.com/download/dotnet/scripts/v1/dotnet-install.sh'
  Attempting to install 'sdk v9.0.100-preview.4.24223.7' from public_location.
dotnet-install: Attempting to download using primary link https://dotnetcli.azureedge.net/dotnet/Sdk/9.0.100-preview.4.24223.7/dotnet-sdk-9.0.100-preview.4.24223.7-linux-x64.tar.gz
curl: (22) The requested URL returned error: 404
dotnet-install: The resource at primary link 'https://dotnetcli.azureedge.net/dotnet/Sdk/9.0.100-preview.4.24223.7/dotnet-sdk-9.0.100-preview.4.24223.7-linux-x64.tar.gz' is not available.
dotnet-install: Attempting to download using primary link https://dotnetbuilds.azureedge.net/public/Sdk/9.0.100-preview.4.24223.7/dotnet-sdk-9.0.100-preview.4.24223.7-linux-x64.tar.gz
dotnet-install: Remote file https://dotnetbuilds.azureedge.net/public/Sdk/9.0.100-preview.4.24223.7/dotnet-sdk-9.0.100-preview.4.24223.7-linux-x64.tar.gz size is 237074596 bytes.
dotnet-install: Extracting archive from https://dotnetbuilds.azureedge.net/public/Sdk/9.0.100-preview.4.24223.7/dotnet-sdk-9.0.100-preview.4.24223.7-linux-x64.tar.gz
dotnet-install: Downloaded file size is 237074596 bytes.
dotnet-install: The remote and local file sizes are equal.
dotnet-install: Installed version is 9.0.100-preview.4.24223.7
dotnet-install: Adding to current process PATH: `dotnet/.dotnet`. Note: This change will be visible only when sourcing script.
dotnet-install: Note that the script does not resolve dependencies during installation.
dotnet-install: To check the list of dependencies, go to https://learn.microsoft.com/dotnet/core/install, select your operating system and check the "Dependencies" section.
dotnet-install: Installation finished successfully.
  Looking for source-built Artifacts to download...
  Downloading source-built Artifacts from https://dotnetcli.azureedge.net/source-built-artifacts/assets/Private.SourceBuilt.Artifacts.9.0.100-preview.4.24223.1.centos.9-x64.tar.gz...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  760M  100  760M    0     0  6107k      0  0:02:07  0:02:07 --:--:-- 7144k
  Building bootstrap previously source-built in /tmp/tmp.dB8wTXe1G3
  Retrieving PackageVersions.props from existing archive
Restore complete (83.5s)

Build succeeded in 83.8s
  Looking for source-built Prebuilts to download...
  No source-built Prebuilts found to download...
  Unpacking Private.SourceBuilt.Artifacts.*.tar.gz into dotnet/prereqs/packages/previously-source-built
22:40:29 info: BinaryTool[0] Starting binary tool at 2024-05-01 6:40:29 p.m. in Clean mode
22:40:29 info: BinaryTool[0] Creating directory 'dotnet/artifacts/log/binary-report' for 'OutputReportDirectory'.
22:40:29 info: BinaryTool[0] Detecting binaries in 'dotnet' not listed in 'dotnet/eng/allowed-sb-binaries.txt'...
22:40:37 info: BinaryTool[0]     Updated allowed binaries file 'allowed-sb-binaries.txt' written to 'dotnet/artifacts/log/binary-report/Updatedallowed-sb-binaries.txt'
22:40:37 info: BinaryTool[0] Finished binary detection.
22:40:37 info: BinaryTool[0] Removing binaries from 'dotnet'...
22:40:37 info: BinaryTool[0] Finished binary removal. Removed 0 binaries.
22:40:37 info: BinaryTool[0] Finished all binary tasks. Took 7.3028168 seconds.

$ find ./prereqs/packages/archive
./prereqs/packages/archive
./prereqs/packages/archive/_
./prereqs/packages/archive/Private.SourceBuilt.Artifacts.Bootstrap.tar.gz

@ellahathaway
Copy link
Member

ellahathaway commented May 1, 2024

One thing I am a little confused about: when I use --bootstrap, I get no new packages in the archive directory. Is that expected?

I think it's because the rebootstrapping uses the working dir?

I'm not sure if that can be used for this tool. If it can, that feels like the better solution consistency-wise. Otherwise, the changes you've made LGTM. Thanks for fixing this, Omair!

Copy link
Member

@ellahathaway ellahathaway left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM pending my one comment. Thanks!

@omajid
Copy link
Member Author

omajid commented May 2, 2024

I think it's because the rebootstrapping uses the working dir?

I'm not sure if that can be used for this tool. If it can, that feels like the better solution consistency-wise.

Bootstrap is an optional process. So we will need a code path for binary-cleanup that works without it anyway. I think we should keep this simple and always assume bootstrap was not run.

@omajid omajid force-pushed the prep-remove-extracted-tarball-packages-post-binary-cleanup branch from a7bcabf to e3da9c4 Compare May 2, 2024 13:49
@ellahathaway
Copy link
Member

Bootstrap is an optional process. So we will need a code path for binary-cleanup that works without it anyway. I think we should keep this simple and always assume bootstrap was not run.

Sorry, I wasn't clear with what I meant. I was suggesting that the binary tool should follow the same pattern that the bootstrapping tool uses with the working_dir, not reuse the working_dir from the bootstrapping. That is, the binary tool would create a temporary working_dir that gets cleaned up after.

That said, it's a very similar solution to what you have already included in this PR. I'm personally fine with using the "extracted_tarball" switch as you have done.

If we extracted the previously-source-build tarball only for running the
binary cleanup tool, clean up the extracted contents. This undo's the
extraction operation that was only performed for running the
binary-cleanup tool.

I am creating a tarball for building .NET offline post-prep. In that
scenario, I run `./prep-source-build.sh --bootstrap`, and then package
up the entire VMR directory into a tarball. Without this cleanup, there
are a ton of duplicated binaries in the tarball, since the nuget
packages are both in previously-source-built archive and in an extracted
form. With this change, the archive size goes from just over 4GB to just
over 2GB.
@omajid omajid force-pushed the prep-remove-extracted-tarball-packages-post-binary-cleanup branch from e3da9c4 to f92d698 Compare May 2, 2024 18:25
@omajid
Copy link
Member Author

omajid commented May 6, 2024

I have updated this PR to use a temporary working directory for extracting files before running the binary tool. And like the other usage of working directory, it is cleaned up.

Copy link
Member

@ViktorHofer ViktorHofer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As part of the installer->sdk repository consolidation we are already switching the VMR over into dotnet/sdk. ETA: today or tomorrow. The last manual code sync happened today and we don't plan on doing another sync. Please close this PR and submit it into dotnet/sdk. Sorry for the inconvenience.

@omajid
Copy link
Member Author

omajid commented May 8, 2024

Re-opened as dotnet/sdk#40772

@omajid omajid closed this May 8, 2024
omajid added a commit to omajid/dotnet-sdk that referenced this pull request May 8, 2024
If we extracted the previously-source-build tarball only for running the
binary cleanup tool, clean up the extracted contents. This undo's the
extraction operation that was only performed for running the
binary-cleanup tool.

I am creating a tarball for building .NET offline post-prep. In that
scenario, I run ./prep-source-build.sh --bootstrap, and then package up
the entire VMR directory into a tarball. Without this cleanup, there are
a ton of duplicated binaries in the tarball, since the nuget packages
are both in previously-source-built archive and in an extracted form.
With this change, the archive size goes from just over 4GB to just over
2GB.

This is a port of dotnet/installer#19632 from
installer repo to sdk.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants