New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix bugs with > 2 GB pack files #634
Conversation
b9f2141
to
357099c
Compare
@@ -128,7 +128,7 @@ public override (long?, GitObjectId?) GetOffset(Span<byte> objectName, bool ends | |||
{ | |||
// If the first bit of the offset address is set, the offset is stored as a 64-bit value in the table of 8-byte offset entries, | |||
// which follows the table of 4-byte offset entries: "large offsets are encoded as an index into the next table with the msbit set." | |||
offset = offset & 0x7FF; | |||
offset = offset & 0x7FFFFFFF; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ref: https://github.com/go-git/go-git/blob/master/plumbing/format/idxfile/idxfile.go#L146
Ref: https://git-scm.com/docs/pack-format
If the large offset chunk exists and the 31st bit is on, then removing that bit reveals the row in the large offsets containing the 8-byte offset of this object.
{ | ||
int offset = -1; | ||
long offset = -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ref: https://git-scm.com/docs/pack-format
Observation: length of each object is encoded in a variable length format and is not constrained to 32-bit or anything.
Example: Base offset is 4 000 000 000
, delta relative offset would be 3 000 000 000
. The computed delta offset would thus be 1 000 000 000
. The number 3 000 000 000
must fit in the data type returned here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for identifying the bug and especially for offering the fix!
@qmfrederik any other thoughts?
Co-authored-by: Andrew Arnott <andrewarnott@gmail.com>
Looks good to me, thanks for fixing this! |
Thanks for reviews! It was the last blocker for the managed GIT backend on our huge and unusual repository (where even libgit2 sometimes struggles). |
This is shipping to nuget.org as 3.4.231. |
Thanks! We finally updated internally to the managed GIT implementation. It was a long journey but I am really happy about it. |
Unfortunately it's non-trivial to make a test for this since it requires really large data. The obvious bug is the incorrect mask in the
.idx
file reading which resulted in incorrect offset being read. The other two bugs happen if a delta is crossing the 2GB boundary in the pack file.