[Website] Repository size is needlessly large #463

Open
bkmgit opened this issue Jan 9, 2024 · 2 comments

Comments

@bkmgit
Contributor

bkmgit commented Jan 9, 2024

The repository is quite large: 2 GB for the contents. Possible things to reduce the size:

@jorisvandenbossche
Member

The biggest chunk of this 2 GB is the docs (around 1.6 GB). That is mostly because we keep several versions, each around 100 MB, and the per-version size keeps growing with every release because we expand our docs.
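To check where the space actually goes, a short script can total the size of each version directory. This is a minimal sketch that assumes the built docs live under a `docs/` directory with one subdirectory per version (for example `docs/14.0/` or `docs/dev/`); that layout and the function names are assumptions, not existing site tooling.

```python
import os
from pathlib import Path


def dir_size(path: Path) -> int:
    """Return the total size in bytes of all regular files under `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            fp = Path(root) / name
            # Skip symlinks so linked files are not counted twice.
            if not fp.is_symlink():
                total += fp.stat().st_size
    return total


def docs_size_by_version(docs_root: Path) -> dict[str, int]:
    """Map each immediate subdirectory of `docs_root` (assumed: one per version) to its size in bytes."""
    return {
        entry.name: dir_size(entry)
        for entry in sorted(docs_root.iterdir())
        if entry.is_dir()
    }
```

Calling `docs_size_by_version(Path("docs"))` from a checkout of the asf-site branch would show whether the roughly 100 MB per version estimate holds and which versions dominate. Files directly in `docs/` (outside any version directory) are deliberately not counted.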

We already don't keep the Java reference for older versions because it is too big. We could do something similar for the Python and C++ reference docs.
In general, we could also reduce the number of older doc versions we keep.
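As a sketch of what trimming could look like, the function below removes API-reference subdirectories from every numbered version except the newest few, similar to what is already done for the Java reference. The entries in `REFERENCE_SUBDIRS` are hypothetical placeholders, and the `docs/<version>/` layout is an assumption; the real paths on the site would need to be checked first. It defaults to a dry run.

```python
import re
import shutil
from pathlib import Path

# Hypothetical reference-doc subdirectories; the real paths would need checking.
REFERENCE_SUBDIRS = ("python/generated", "cpp/api")

VERSION_RE = re.compile(r"^\d+(\.\d+)*$")


def version_key(name: str) -> tuple[int, ...]:
    """Numeric sort key so that '10.0' sorts after '9.0' (plain string sorting gets this wrong)."""
    return tuple(int(part) for part in name.split("."))


def trim_old_reference_docs(docs_root: Path, keep_latest: int, dry_run: bool = True) -> list[Path]:
    """Remove reference subdirs from all numbered versions except the newest `keep_latest`.

    Non-numeric directories such as 'dev' are never touched. Returns the paths
    that were removed (or, with dry_run=True, would be removed).
    """
    versions = sorted(
        (d.name for d in docs_root.iterdir() if d.is_dir() and VERSION_RE.match(d.name)),
        key=version_key,
    )
    # versions[:-0] would be empty, so handle keep_latest == 0 explicitly.
    older = versions[:-keep_latest] if keep_latest > 0 else versions
    removed = []
    for version in older:
        for sub in REFERENCE_SUBDIRS:
            target = docs_root / version / sub
            if target.is_dir():
                removed.append(target)
                if not dry_run:
                    shutil.rmtree(target)
    return removed
```

Sorting by a numeric key matters here: with plain string sorting, `9.0` would be treated as newer than `10.0` and the wrong versions would be trimmed.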

It's probably also a good idea to clean up the /docs/dev/ docs completely once (e.g. I see that the R docs have accumulated multiple versions of Bootstrap, probably because we always overwrite existing files instead of replacing the whole directory), although that's probably not much space.
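The accumulation described here is what happens when a new build is copied on top of the old directory: files the new build no longer produces (such as an older Bootstrap bundle) are never deleted. A minimal sketch of a replace-instead-of-overwrite step, assuming the fresh build output sits in a local directory; the function name `replace_dir` is hypothetical.

```python
import shutil
from pathlib import Path


def replace_dir(build_output: Path, target: Path) -> None:
    """Replace `target` with a copy of `build_output`, dropping stale files.

    Copying over an existing tree only adds or overwrites files, so assets that
    a newer build no longer produces are left behind. Removing the old tree
    first avoids that.
    """
    staging = target.with_name(target.name + ".new")
    if staging.exists():
        shutil.rmtree(staging)
    # Copy into a staging directory first so a failed copy leaves `target` intact.
    shutil.copytree(build_output, staging)
    if target.exists():
        shutil.rmtree(target)
    staging.rename(target)
```

In a git checkout, the deleted files then show up as removals in the next commit. `rsync --delete` gives the same deletion-aware behavior if the CI job already uses rsync.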

> When generating the branch asf-site, keep only the latest snapshot, remove previous commits during the CI site generation job

That's certainly responsible for a good part of the size as well. However, we do sometimes commit to the asf-site branch manually, so ideally we would keep those commits.
I wonder whether it would be possible to do that only for a subdirectory, such as docs/dev/, which gets updated by the nightly CI. If we could remove the history for just that subdirectory (I don't know whether git easily allows this), I think that would already give a large chunk of the benefit.
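One way to approximate this is an untested sketch built on the separately installed git-filter-repo tool: remove the subdirectory from every commit, then re-add its current contents as a single new commit. Other paths, including manually committed changes, keep their history, but every rewritten commit gets a new hash, so the branch would have to be force-pushed. The sketch assumes `repo` is a fresh clone containing only the asf-site branch; the function name and the injectable `run` parameter are illustrative.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], None]


def run_checked(cmd: Sequence[str]) -> None:
    """Run a command and raise CalledProcessError if it fails."""
    subprocess.run(cmd, check=True)


def squash_subdir_history(repo: Path, subdir: str, run: Runner = run_checked) -> None:
    """Drop all history of `subdir` but keep its current contents as one new commit.

    Assumes `repo` is a fresh clone of only the asf-site branch and that
    git-filter-repo is on PATH.
    """
    git = ["git", "-C", str(repo)]
    with tempfile.TemporaryDirectory() as tmp:
        saved = Path(tmp) / "snapshot"
        # 1. Save the current snapshot of the subdirectory outside the repo.
        shutil.copytree(repo / subdir, saved)
        # 2. Remove the subdirectory from every commit in history.
        run(git + ["filter-repo", "--path", subdir, "--invert-paths", "--force"])
        # 3. Restore the snapshot and record it as a single new commit.
        shutil.copytree(saved, repo / subdir, dirs_exist_ok=True)
        run(git + ["add", subdir])
        run(git + ["commit", "-m", f"Re-add {subdir} without history"])
```

For example, `squash_subdir_history(Path("arrow-site"), "docs/dev")`. git-filter-repo removes the `origin` remote after a rewrite as a safety measure, so the remote has to be re-added before force-pushing.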

@bkmgit
Contributor Author

bkmgit commented Jan 16, 2024

I can look at this once #449 is done. Moving the older versions to Git LFS seems reasonable, perhaps keeping only the last 4 versions. It is nice to have the older documentation, especially since releases are relatively frequent, but the older versions are unlikely to change much.
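If older versions do move to Git LFS, the split between "recent" and "older" can be computed rather than maintained by hand. A minimal sketch, assuming numbered version directories under `docs/` and emitting the same attribute line that `git lfs track "<pattern>"` writes to `.gitattributes`; the function name and the keep-last-4 default (taken from the suggestion in this comment) are illustrative.

```python
import re

VERSION_RE = re.compile(r"^\d+(\.\d+)*$")


def version_key(name: str) -> tuple[int, ...]:
    """Numeric sort key so that '10.0' sorts after '9.0'."""
    return tuple(int(part) for part in name.split("."))


def lfs_attributes(version_dirs: list[str], keep_latest: int = 4, prefix: str = "docs") -> list[str]:
    """Return .gitattributes lines routing all but the newest `keep_latest` versions to Git LFS.

    Non-numeric entries such as 'dev' are ignored.
    """
    versions = sorted((v for v in version_dirs if VERSION_RE.match(v)), key=version_key)
    older = versions[:-keep_latest] if keep_latest > 0 else versions
    # Same line format that `git lfs track "<pattern>"` writes.
    return [f"{prefix}/{v}/** filter=lfs diff=lfs merge=lfs -text" for v in older]
```

These attributes only affect files committed after the change. Shrinking existing history would need something like `git lfs migrate import --include="docs/9.0/**"`, which rewrites history. It would also need checking whether the site publishing step fetches LFS objects rather than serving pointer files.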
