Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bazel: improved lazy lfs files #16393

Merged
merged 6 commits into from
May 3, 2024
Merged

Bazel: improved lazy lfs files #16393

merged 6 commits into from
May 3, 2024

Conversation

redsun82
Copy link
Contributor

@redsun82 redsun82 commented May 2, 2024

This reintroduces lazy lfs file rules that were removed in #16117, now improved.

The new rules will make the actual file download go through bazel's download manager, which includes:

  • caching into the repository cache
  • sane limiting of concurrent downloads
  • retries

The bulk of the work is done by git_lfs_probe.py, which will use the LFS protocolq to output short lived download URLs that can be consumed by repository_ctx.download.

This reintroduces lazy lfs file rules that were removed in
#16117, now improved.

The new rules will make the actual file download go through bazel's
download manager, which includes:
* caching into the repository cache
* sane limiting of concurrent downloads
* retries

The bulk of the work is done by `git_lfs_probe.py`, which will use the
LFS protocol (with authentication via SSH) to output short lived
download URLs that can be consumed by `repository_ctx.download`.
@redsun82 redsun82 requested a review from criemen May 2, 2024 04:40
@redsun82 redsun82 requested a review from a team as a code owner May 2, 2024 04:40
@redsun82
Copy link
Contributor Author

redsun82 commented May 2, 2024

This is shown to work on this internal PR

Copy link
Collaborator

@criemen criemen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One major question, I'll have a more detailed look at the rest of this later, but I don't think there's gonna be other big questions coming.

misc/bazel/internal/git_lfs_probe.py Show resolved Hide resolved
Copy link
Collaborator

@criemen criemen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more comments - what I don't understand yet is if we now still replace the LFS pointers on the source file system with the instantiated files.
As far as I understood, the previous iteration of the LFS rules did exactly that, therefore easily allowing developers to inspect/replace the lazy LFS files that were part of the build process. Is that still the case now?

misc/bazel/internal/git_lfs_probe.py Show resolved Hide resolved
misc/bazel/internal/git_lfs_probe.py Show resolved Hide resolved
misc/bazel/lfs.bzl Show resolved Hide resolved
misc/bazel/lfs.bzl Show resolved Hide resolved
@redsun82
Copy link
Contributor Author

redsun82 commented May 3, 2024

Some more comments - what I don't understand yet is if we now still replace the LFS pointers on the source file system with the instantiated files. As far as I understood, the previous iteration of the LFS rules did exactly that, therefore easily allowing developers to inspect/replace the lazy LFS files that were part of the build process. Is that still the case now?

Yes, and better than before. Previously, the non-LFS pointer files would still go through a smudge filter, which meant the files would be copied. Now:

  • git_lfs_probe.py will detect a non-LFS pointer file and output local for those
  • the bazel rule counterparts will interpret that to
    • symlink the file in case of lfs_files rules, or
    • call repository_ctx.extract directly on the file in case of lfs_archive ones.

@criemen
Copy link
Collaborator

criemen commented May 3, 2024

Ah, but that's the other way, no?

I meant:

  • The user hasn't downloaded the LFS file yet, so on-disk in the source tree it's a LFS pointer
  • The user builds with bazel, bazel acquires the file from LFS
  • After the build, is the file in the source tree still a LFS pointer, or has it been replaced with the real file?

I thought in the previous iteration of these rules, pointers were replaced by actual files, whereas now, that doesn't seem to be the case (anymore?).

@redsun82
Copy link
Contributor Author

redsun82 commented May 3, 2024

Ah, but that's the other way, no?

I meant:

  • The user hasn't downloaded the LFS file yet, so on-disk in the source tree it's a LFS pointer
  • The user builds with bazel, bazel acquires the file from LFS
  • After the build, is the file in the source tree still a LFS pointer, or has it been replaced with the real file?

I thought in the previous iteration of these rules, pointers were replaced by actual files, whereas now, that doesn't seem to be the case (anymore?).

Ah, sorry, I didn't understand initially. That mode of operation (replacing pointer files) was one temporary iteration of how I had implemented it at some point, but it didn't stick 🙂. The final version currently present on semmle-code doesn't do that (see here). Generally changing the source tree during the build should be avoided

Copy link
Collaborator

@criemen criemen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the in-depth discussion!

@redsun82 redsun82 merged commit 880262d into main May 3, 2024
16 checks passed
@redsun82 redsun82 deleted the redsun82/lfs branch May 3, 2024 10:37
redsun82 added a commit that referenced this pull request May 6, 2024
This reverts commit d7ecaae.

Problems with lazy LFS rules were solved by
#16393 and
#16434.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants