Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bugs with > 2 GB pack files #634

Merged
merged 2 commits into from Jul 28, 2021
Merged

Conversation

filipnavara
Copy link
Member

Unfortunately it's non-trivial to make a test for this since it requires really large data. The obvious bug is the incorrect mask in the .idx file reading which resulted in incorrect offset being read. The other two bugs happen if a delta is crossing the 2GB boundary in the pack file.

@filipnavara filipnavara changed the base branch from master to v3.4 July 28, 2021 10:12
@@ -128,7 +128,7 @@ public override (long?, GitObjectId?) GetOffset(Span<byte> objectName, bool ends
{
// If the first bit of the offset address is set, the offset is stored as a 64-bit value in the table of 8-byte offset entries,
// which follows the table of 4-byte offset entries: "large offsets are encoded as an index into the next table with the msbit set."
offset = offset & 0x7FF;
offset = offset & 0x7FFFFFFF;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ref: https://github.com/go-git/go-git/blob/master/plumbing/format/idxfile/idxfile.go#L146
Ref: https://git-scm.com/docs/pack-format

If the large offset chunk exists and the 31st bit is on, then removing that bit reveals the row in the large offsets containing the 8-byte offset of this object.

{
int offset = -1;
long offset = -1;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ref: https://git-scm.com/docs/pack-format

Observation: length of each object is encoded in a variable length format and is not constrained to 32-bit or anything.

Example: Base offset is 4 000 000 000, delta relative offset would be 3 000 000 000. The computed delta offset would thus be 1 000 000 000. The number 3 000 000 000 must fit in the data type returned here.

@AArnott AArnott added this to the v3.4 milestone Jul 28, 2021
@AArnott AArnott added the bug label Jul 28, 2021
Copy link
Collaborator

@AArnott AArnott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for identifying the bug and especially for offering the fix!
@qmfrederik any other thoughts?

src/NerdBank.GitVersioning/ManagedGit/GitPackReader.cs Outdated Show resolved Hide resolved
Co-authored-by: Andrew Arnott <andrewarnott@gmail.com>
@qmfrederik
Copy link
Contributor

Looks good to me, thanks for fixing this!

@filipnavara
Copy link
Member Author

Thanks for reviews! It was the last blocker for the managed GIT backend on our huge and unusual repository (where even libgit2 sometimes struggles).

@AArnott AArnott merged commit fa7c948 into dotnet:v3.4 Jul 28, 2021
@AArnott
Copy link
Collaborator

AArnott commented Jul 28, 2021

This is shipping to nuget.org as 3.4.231.

@filipnavara
Copy link
Member Author

This is shipping to nuget.org as 3.4.231.

Thanks! We finally updated internally to the managed GIT implementation. It was a long journey but I am really happy about it.

@filipnavara filipnavara deleted the large-pack-offsets branch July 28, 2021 19:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants