Auto track binary files #828
This was discussed above here: #828 (comment)
Do you disagree with the conclusion?
Didn't see that. Yeah, I don't think we should ever read 11GB of data into memory; that would almost certainly crash most people's systems. I'd be happy if we read 10MB instead of 1MB to reduce the probability of a false detection, which should address those concerns. If we really do want to read a lot more, we should read in chunks. As Python's docs state:
The solution currently implemented loads a maximum of 10MB in memory when calling `git_add`: it tracks large files before tracking binary files. When tracking binary files, it only looks at files that are not yet tracked with LFS, which excludes the large files.
Instead of the 1MB limit that you propose here, we could put a max of 10MB here, which will only be triggered when `auto_track_binary_files` is called independently of `git_add` (which is a possibility!).
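A minimal sketch of the ordering described in this comment, under stated assumptions: the helper names `is_probably_binary` and `files_to_track`, the null-byte heuristic, and the 10MB large-file threshold are hypothetical illustrations, not the code in this PR. It shows large files being set aside by size alone, so only the remaining smaller files are ever opened, plus a capped read for the case where the binary check is called on its own.

```python
import os

MAX_READ_BYTES = 10 * 1024 * 1024    # read cap discussed in this thread (10MB)
LARGE_FILE_BYTES = 10 * 1024 * 1024  # assumed threshold for "large" files


def is_probably_binary(path, max_bytes=MAX_READ_BYTES):
    """Hypothetical heuristic: a null byte in the first max_bytes means binary."""
    with open(path, "rb") as f:
        head = f.read(max_bytes)  # never loads more than max_bytes into memory
    return b"\x00" in head


def files_to_track(paths, large_file_bytes=LARGE_FILE_BYTES):
    """Return (large, binary). Large files are selected by size, without reading them."""
    large = [p for p in paths if os.path.getsize(p) > large_file_bytes]
    already_handled = set(large)
    # Only files that were not picked up as large are opened and inspected.
    binary = [
        p for p in paths
        if p not in already_handled and is_probably_binary(p)
    ]
    return large, binary
```

With this ordering, the capped read only matters when the binary check runs without the size pass having happened first.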
Yes, but you also want this method to be public, which means people can call it before having tracked large files.
I'm happy with your suggestion of 10MB.
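If more than the agreed cap ever had to be inspected, the chunked read mentioned earlier in this thread could look roughly like the sketch below. The function name `contains_null_byte`, its parameters, and the null-byte check are assumptions made for illustration; they are not part of this PR.

```python
def contains_null_byte(path, chunk_size=1024 * 1024, max_bytes=None):
    """Scan a file chunk by chunk so memory use is bounded by chunk_size."""
    read_so_far = 0
    with open(path, "rb") as f:
        while max_bytes is None or read_so_far < max_bytes:
            # Never read past max_bytes when a cap is given.
            size = chunk_size
            if max_bytes is not None:
                size = min(chunk_size, max_bytes - read_so_far)
            chunk = f.read(size)
            if not chunk:  # end of file
                return False
            if b"\x00" in chunk:
                return True  # stop at the first hit; the rest is never read
            read_so_far += len(chunk)
    return False
```

The early return keeps the common case cheap: a binary file is usually identified within the first chunk, and a text file costs one pass at constant memory.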
Sounds good, thanks for your review. Addressed in cbfdce5
We shouldn't write to the working dir; we should work with Python's temp files and folders.
The `WORKING_REPO_DIR` is a folder that is created at the start of the test and reset at the beginning of every test, see here: huggingface_hub/tests/test_repository.py, lines 76 to 91 in 6d27a15.
This isn't the working directory in which the test is run, but a folder nested in the `fixtures` folder: huggingface_hub/tests/test_repository.py, lines 50 to 52 in 6d27a15.
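For comparison, the reviewer's suggestion to use Python's temp files and folders could be sketched as follows. The class name `TempDirExampleTest` and the file name `dummy.bin` are hypothetical, and this is not how the test suite in this PR is organised; it only illustrates a per-test directory created with the standard `tempfile` module.

```python
import os
import tempfile
import unittest


class TempDirExampleTest(unittest.TestCase):
    def setUp(self):
        # Fresh directory for every test, created outside the repository checkout.
        self._tmp = tempfile.TemporaryDirectory()
        # Removed automatically after the test, even if it fails.
        self.addCleanup(self._tmp.cleanup)
        self.repo_dir = self._tmp.name

    def test_writes_stay_in_temp_dir(self):
        path = os.path.join(self.repo_dir, "dummy.bin")
        with open(path, "wb") as f:
            f.write(b"\x00\x01")
        self.assertTrue(os.path.isfile(path))
```

Either approach isolates tests from one another; the temp-directory variant additionally avoids leaving files behind in the source tree if a run is interrupted.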
same here
Same answer as above :)