Repository file size #512

Open
kalapi opened this issue Dec 18, 2022 · 9 comments

@kalapi

kalapi commented Dec 18, 2022

Problem description
Hi @arrowtype, this is not a bug report, more like a meta issue with the repository itself. I was doing some winter housekeeping on my Mac and found that the Recursive repository is 3.98 GB! And this isn't because of a file (or group of files) in the current state of the repo but probably because of past commits in the hidden .git folder. Honestly I don't know what these files represent. I wonder if you have any insight?

Expected behavior
No effect on font binaries

Screenshots
Screenshot 2022-12-17 at 5 00 04 PM

To Reproduce
I use a Mac app called GrandPerspective, which helps visualise large blocks of data on the hard drive. One of the larger chunks was the Recursive repository folder, and I was really surprised.

Screenshot 2022-12-17 at 5 04 35 PM

Environment (please complete the following information):

  • OS: macOS Catalina 10.15.7
  • Browser: N/A
  • Fonts: N/A
  • (Pulled all latest commits on main)

Additional context
N/A
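
One way to check whether the space really sits in the hidden .git folder rather than in the working tree is to compare the two directly. This is only a minimal sketch: `repo_size_report` is a made-up helper name, and it assumes `du` and a reasonably recent `git` are on the PATH.

```shell
# Hypothetical helper (not part of git): report how much of a clone's
# size lives in .git versus the whole folder.
# Usage: repo_size_report /path/to/clone
repo_size_report() {
  repo="${1:-.}"
  # Total size on disk: working tree plus .git.
  printf 'total: '
  du -sh "$repo" | cut -f1
  # Size of the .git directory alone (packed history, refs, reflogs).
  printf 'git:   '
  du -sh "$repo/.git" | cut -f1
  # Git's own accounting; the "size-pack" line is the packed history.
  git -C "$repo" count-objects -vH
}
```

If the `git:` figure is close to the `total:` figure, the bulk of the space is history rather than current files.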

@arrowtype
Owner

Thanks for the detailed report!

Hmm, one possibility is that it's just many rounds of many UFO font sources, each with many small files.

I'm not really a git wizard. Do you (or anyone else) have any suggestions of what to do to trim down the size of git repos?

@kalapi
Author

kalapi commented Dec 18, 2022

I have no idea how to fix this. I'm going to do some reading and try to figure it out.

@kalapi
Author

kalapi commented Dec 18, 2022

Okay I found a possible solution. I'm documenting the process here but will make a fork of the repo and try it out there. If everything has worked as desired, I'll open a pull request.

My working hypothesis is that the offending files seem to be

  1. A .sketch file which was prototyping the Noordzij cube (src/proofs/final-specimen/create-noordzij-cube-6_sides.sketch)
  2. A .zip file in the fonts directory containing all the binaries at some point

While these files are not a couple of gigabytes in size on their own, the deltas stored for them across many commits could multiply their footprint.

Process:

  • Run sh <path-to>/FindBlobs.sh. This will identify files in history above a certain byte size.

  • Once the files have been identified, run the following for each one, substituting its path:

    • git filter-branch --index-filter 'git rm -r --cached --ignore-unmatch <folder/file name>' --prune-empty
  • Once filtering has completed, run the following commands in sequence:

    • git for-each-ref --format='delete %(refname)' refs/original | git update-ref --stdin
    • git reflog expire --expire=now --all
    • git gc --aggressive --prune=now
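
The steps above could be collected into one function along these lines. This is a sketch and has not been verified against the Recursive repo. `purge_path_from_history` is a made-up name. `--force`, `--tag-name-filter cat`, `-- --all` and the `FILTER_BRANCH_SQUELCH_WARNING` variable are additions that do not appear in the commands listed above; they extend the rewrite to all branches and tags and silence git's notice about filter-branch. It should only be run in a disposable fork or fresh clone.

```shell
# Hypothetical helper: remove one path from all of history, then drop the
# leftover objects. Destructive: it rewrites commits on every branch.
# Assumes the path contains no single quotes.
# Usage (from the repo root): purge_path_from_history path/to/big.zip
purge_path_from_history() {
  target="$1"
  # Silence git's notice that recommends newer tools than filter-branch.
  export FILTER_BRANCH_SQUELCH_WARNING=1
  # 1. Drop the path from the index of every commit on every ref.
  git filter-branch --force --index-filter \
    "git rm -r --cached --ignore-unmatch -- '$target'" \
    --prune-empty --tag-name-filter cat -- --all
  # 2. Delete the backup refs that filter-branch leaves in refs/original.
  git for-each-ref --format='delete %(refname)' refs/original |
    git update-ref --stdin
  # 3. Expire the reflog so the old commits become unreachable.
  git reflog expire --expire=now --all
  # 4. Repack and discard unreachable objects.
  git gc --aggressive --prune=now
}
```

Objects are only discarded once nothing references them, so any branch, tag, or remote-tracking ref that still points at the old commits will keep the large files in the pack.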

Sources:

https://stackoverflow.com/questions/10622179/how-to-find-identify-large-commits-in-git-history
https://stackoverflow.com/questions/11050265/remove-large-pack-file-created-by-git
Also try https://stackoverflow.com/questions/2100907/how-to-remove-delete-a-large-file-from-commit-history-in-the-git-repository

Contents of FindBlobs.sh

The awk command keeps blobs whose on-disk (compressed) size is at least 2^25 bytes (33,554,432 bytes, about 33.55 MB or 32 MiB).

git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize:disk) %(rest)' |
  sed -n 's/^blob //p' |
  sort --numeric-sort --key=2 |
  awk '$2 >= 2^25'
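
The raw byte counts from this pipeline are hard to scan, so the final awk stage could be swapped for something like the following. It is a sketch: `filter_large_blobs` is a made-up name, and the threshold is given in MiB rather than as a power of two.

```shell
# Hypothetical replacement for the last awk stage. Reads "sha size path"
# lines on stdin (what the sed/sort stages above emit) and prints entries
# of at least MIN_MIB mebibytes with a readable size and a short sha.
# Usage: ... | sort --numeric-sort --key=2 | filter_large_blobs 32
filter_large_blobs() {
  min_mib="${1:-32}"
  awk -v min="$min_mib" '
    $2 >= min * 1048576 {
      size = $2 / 1048576
      sha = $1
      # Strip the sha and size fields so paths with spaces stay intact.
      sub(/^[^ ]+ [^ ]+ /, "")
      printf "%8.1f MiB  %.10s  %s\n", size, sha, $0
    }'
}
```

Lowering the argument, for example `filter_large_blobs 5`, also surfaces mid-sized files that may have been committed many times.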

@kalapi
Author

kalapi commented Dec 23, 2022

Hi @arrowtype, I tried a couple of things, but they didn't work. I think you might need to talk to someone with a deeper understanding of version control to help out with this.

Prompted by this problem, we've changed our policy internally at Universal Thirst: we no longer upload font binaries, PDFs, images, or other binary files to new repos.

Feel free to close this and apologies for the bother :)

@jkyeung

jkyeung commented Feb 4, 2023

Of course I cannot speak for @arrowtype, but if I were the author of this repo, I would not find anything in this thread to be a bother or in need of an apology. I think many people would be genuinely curious to see whether something can be done about the unexpectedly large size.

Edit:

I guess 4 GB including history is not so bad, judging by this document about building a different font:

Note that this repo has a 30+ GB commit history. If you only want the current files and future changes, you can avoid downloading so much by cloning the repo with a --depth limit.

It then goes on to give an example of a limited-depth clone command for that font. So, it would seem the size of the (main, official) Recursive repo is actually not all that unexpected or even unusually large. And maybe the normal thing to do for people who want to play around with a local clone is to not download all the history.

@maphew

maphew commented Mar 5, 2023

For people who don't need the whole repo history, which is likely everyone other than @arrowtype, use a shallow clone.

Get only the last 20 commits of each branch:

git clone --depth=20 --no-single-branch https://github.com/arrowtype/recursive.git

Only the most recent commit and only the main branch:

git clone --depth=1 https://github.com/arrowtype/recursive.git

This shallowest clone is best for one-off or consume-only copies, because not all git commands will work later. For example, switching to a branch that already existed when you cloned fails, because --depth implies --single-branch and only the main branch is fetched.
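
If a shallow clone turns out to be too limited, it can be extended afterwards instead of cloning again. A minimal sketch, run from inside the clone and assuming the remote is named origin (the default); the three function names are made up for illustration.

```shell
# Hypothetical helpers for extending a shallow clone after the fact.

# Print "true" if the clone in the current directory is shallow.
is_shallow() {
  git rev-parse --is-shallow-repository
}

# Make every remote branch available again after a --depth=1 clone.
# History stays truncated to DEPTH commits per branch (default 1).
fetch_all_branches() {
  depth="${1:-1}"
  git remote set-branches origin '*'
  git fetch --depth="$depth" origin
}

# Download the rest of the history, turning this into a full clone.
# git reports an error if the repository is already complete.
fetch_full_history() {
  git fetch --unshallow origin
}
```

After `fetch_all_branches`, the other branches show up as origin/<name> and can be switched to as usual; `fetch_full_history` downloads the full multi-gigabyte history discussed above.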

Disclaimer: I'm not all that git proficient, but learned this trick out of necessity when working with a 20 year old project with very long clone times.

@arrowtype
Owner

Hey, thanks so much for the feedback and insights, @kalapi, @jkyeung, and @maphew! Sorry I've been slow to respond, but I truly appreciate it.

I will try to add the depth tip to the readme, for people that may see it.

I suspect the thing that might take the most time in downloading is that in this repo, I have been committing changes to "build prepped" UFOs, and I have done many, many builds. Each UFO is thousands of tiny files, so I think that probably stacks up a lot. I now put such prepped sources in the .gitignore file of font repos.
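
For anyone wanting to do the same in their own font repo, untracking a generated folder and ignoring it going forward might look like this. It is a sketch: `ignore_generated_dir` is a made-up name, and the folder is passed in as an argument because the actual location of the prepped UFOs is not stated here.

```shell
# Hypothetical helper: stop tracking a generated directory and ignore it
# from now on. Run from the repo root, then commit the result.
# Usage: ignore_generated_dir path/to/prepped-ufos
ignore_generated_dir() {
  # Drop a trailing slash so the pattern is written consistently.
  dir="${1%/}"
  # Ignore the directory in future commits.
  printf '%s/\n' "$dir" >> .gitignore
  # Remove it from the index but keep the files on disk.
  git rm -r -q --cached --ignore-unmatch -- "$dir"
  git add .gitignore
}
```

This only stops the history from growing further; copies that were already committed remain in earlier commits.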

I'll try to look further into the resources posted by @kalapi to test out pruning some of those unnecessary sources.

@arrowtype
Owner

@maphew you explained things well! Is it alright if I basically copy-paste your advice into the readme?

@maphew

maphew commented Mar 6, 2023 via email
