Dramatically improve cloning speed for contributors #4329

Closed
zackkrida opened this issue May 14, 2024 · 1 comment · Fixed by #4432
Labels
🤖 aspect: dx Concerns developers' experience with the codebase 🛠 goal: fix Bug fix 🔍 ov: meta Issue spans multiple repos and has sub-issues 🟧 priority: high Stalls work on the project or its dependents 🧱 stack: documentation Related to Sphinx documentation

Comments

@zackkrida
Member

zackkrida commented May 14, 2024

Description

I analyzed the size of the repository, originally to address concerns about the size of our frontend snapshots, and it led to some insights. First, a breakdown of the current repository size:

  • Downloaded repo size: 1.4GB
  • Size of repo .git directory: 1.3GB

It also took over 2 minutes to download on my 120 Mbps wireless connection.

Clearly, the history and metadata of the repository are the main contributors to the download size. There also aren't any particularly large blobs that we would benefit from removing. We simply have a lot of history with a lot of files.

I think we should recommend that most contributors do a "blobless" partial clone of the repository, using the flag --filter=blob:none like so:

git clone --filter=blob:none https://github.com/wordpress/openverse.git
# or, with the GitHub CLI:
gh repo clone wordpress/openverse -- --filter=blob:none
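To check that a clone really is blobless, you can look at what Git records on the remote. The following is a minimal sketch, not part of the Openverse tooling: it builds a throwaway local repository as a stand-in for the real remote (the `src` and `dst` names and the temporary directory are assumptions for illustration) and reads back the two config keys Git sets after a partial clone.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox; every path and name here is illustrative.
work=$(mktemp -d)
cd "$work"

# A tiny local repository standing in for the real remote.
git init -q src
git -C src config user.email "dev@example.com"
git -C src config user.name "Example Dev"
echo "hello" > src/file.txt
git -C src add file.txt
git -C src commit -q -m "initial"

# A local "server" must opt in to filtered and on-demand fetches.
# (Hosted remotes such as GitHub already support partial clones.)
git -C src config uploadpack.allowFilter true
git -C src config uploadpack.allowAnySHA1InWant true

# Blobless partial clone over the file:// transport.
git clone -q --filter=blob:none "file://$work/src" dst

# Git stores the filter on the remote so later fetches reuse it.
promisor=$(git -C dst config remote.origin.promisor)
filter=$(git -C dst config remote.origin.partialclonefilter)
echo "promisor remote: $promisor"
echo "clone filter: $filter"
```

In a real checkout, only the two `git config` reads are needed, run from inside the cloned repository.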

This results in the following sizing:

  • Downloaded repo size: 183MB
  • Size of repo .git directory: 79MB

The download took 15 seconds on my 120 Mbps wireless connection.

You can learn more about blobless clones here: https://gist.github.com/leereilly/1f4ea46a01618b6e34ead76f75d0784b#blobless-clones

It basically means that the full commit history and directory trees are downloaded, but file contents (blobs) are fetched only for the commit that is checked out. Blobs from other commits are downloaded on demand, for example when running git blame or checking out a previous commit.
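The on-demand behavior can be observed directly. This is a rough sketch under the same kind of assumptions as any local experiment: a throwaway repository with two commits plays the role of the remote (`src`, `dst`, and the `count_missing` helper are hypothetical names), and `git rev-list --missing=print` is used to list objects that history references but that have not been downloaded yet.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox; every path and name here is illustrative.
work=$(mktemp -d)
cd "$work"

# Source repository with two versions of the same file.
git init -q src
git -C src config user.email "dev@example.com"
git -C src config user.name "Example Dev"
echo "first version" > src/file.txt
git -C src add file.txt
git -C src commit -q -m "first"
echo "second version" > src/file.txt
git -C src commit -q -am "second"

# A local "server" must opt in to filtered and on-demand fetches.
git -C src config uploadpack.allowFilter true
git -C src config uploadpack.allowAnySHA1InWant true

git clone -q --filter=blob:none "file://$work/src" dst

# Hypothetical helper: count objects that are referenced but not yet
# present locally. rev-list prints those with a leading "?".
count_missing() {
  git -C dst rev-list --objects --all --missing=print | grep -c '^?' || true
}

before=$(count_missing)

# Reading an old version of the file makes Git fetch its blob lazily.
old=$(git -C dst cat-file -p HEAD~1:file.txt)

after=$(count_missing)
echo "missing before: $before, after: $after"
echo "old content: $old"
```

The count of missing objects should drop once the old blob has been read, which is the same mechanism that makes git blame or a checkout of an older commit trigger a small download.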

I think we can recommend this strategy to most users in our documentation, and significantly improve their experience.

@zackkrida zackkrida added 🟧 priority: high Stalls work on the project or its dependents 🛠 goal: fix Bug fix 🤖 aspect: dx Concerns developers' experience with the codebase 🔍 ov: meta Issue spans multiple repos and has sub-issues 🧱 stack: documentation Related to Sphinx documentation labels May 14, 2024
@zackkrida zackkrida self-assigned this May 15, 2024
@sarayourfriend
Contributor

By the way @zackkrida I've included this approach in #4343. It's awesome! I've cloned the repository several times in testing the ov bootstrap method and it's so much faster this way.

Projects
Status: ✅ Done

2 participants