Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

latest "prevent large objects from being read into memory" commit caused "object not found" error #323

Closed
marguerite opened this issue May 25, 2021 · 13 comments

Comments

@marguerite
Copy link

@zeripath Before applying commit "720c192", these codes works:

_, err := git.PlainClone(pkgDir, false, &git.CloneOptions{
  URL:               "https://github.com/rime/rime-cantonese",
  ReferenceName:     plumbing.NewBranchReferenceName("master"),
  RecurseSubmodules: git.DefaultSubmoduleRecursionDepth,
  Depth:             1})
fmt.Println(err)
nil

After the commit, these codes return a "object not found" error.

Because now the 3mb file "jyut6ping3.dict.yaml" in commit tree object can't not have a blob Reader().

That is, the b.obj.Reader() function in plumbing/object/object.go can't return a valid io.ReadCloser and nil error now.

@marguerite
Copy link
Author

detailed analysis please refer to my comments in #305

@marguerite
Copy link
Author

marguerite commented May 25, 2021

the codes in trouble are:

r, err := p.getReaderDirect(h) // fsobject.go
-> r, err := p.readOFSDeltaObjectContent(h, deltaRC) // getReaderDirect() in packfile.go
->  base, err = p.Get(h.Reference) // readOFSDeltaObjectContent in packfile.go
->  offset, err := p.FindOffset(h)  // func (p *packfile) Get()
->  i, ok := idx.findHashIndex(h) // FindOffset in idxfile.go
->  k := idx.FanoutMapping[h[0]] 

here:

k => -1
h.String() => 0000000000000000000000000000000000000000
k == noMapping => true
h[0] => 0
idx.FanoutMapping =>
[-1 -1 0 -1 1 2 3 4 5 6 -1 -1 -1 -1 -1 7 8 9 -1 -1 10 -1 -1 11 -1 12 -1 -1 13 -1 14 15 -1 16 -1 -1 -1 -1 17 18 19 20 -1 -1 -1 -1 -1 21 22 23 -1 24 -1 -1 25 -1 -1 26 27 28 29 30 -1 -1 -1 31 -1 -1 -1 32 -1 -1 33 34 35 36 37 38 39 40 -1 41 42 43 44 45 -1 46 -1 47 48 -1 49 50 -1 51 -1 -1 -1 52 53 -1 -1 54 -1 -1 55 56 -1 -1 -1 57 -1 58 -1 59 60 61 -1 62 -1 -1 63 64 65 -1 66 -1 -1 67 -1 68 -1 -1 -1 -1 69 70 71 72 -1 73 74 75 76 -1 -1 -1 -1 -1 -1 -1 -1 77 78 -1 -1 -1 -1 79 80 -1 81 -1 -1 -1 -1 -1 82 -1 -1 83 84 -1 -1 -1 85 -1 -1 86 87 88 89 -1 -1 -1 90 91 -1 -1 92 93 94 95 -1 96 97 -1 98 99 100 -1 101 -1 102 -1 103 -1 104 -1 105 106 107 -1 -1 -1 108 -1 -1 -1 -1 109 110 111 -1 -1 -1 112 113 114 115 -1 -1 116 -1 117 -1 118 -1 119 120 121 122 123 -1 -1 124 -1 125 126 127 -1 128 129 -1 130]

so it looks like the "h, err := p.objectHeaderAtOffset(1836404)" in fsobject.go returns an invalid header which has a default Reference Hash zero.

what interesting is, "h, err := s.nextObjectHeader()" in "SeekObjectHeader" function from scanner.go return 000000000000000000000 for all the files.

so I print all the objectHeader and find the function in trouble is func (s *Scanner) nextObjectHeader() (*ObjectHeader, error)

we did not set h.Reference for objectHeader except for REFDeltaObject. for all other objectHeader types, we leave the h.Reference with nil.

Of course it's not the fault of this commit, but this commit definitely rely on the plumbing.Hash check for objectHeader.

@radeksimko
Copy link

FWIW we also have a test reproducing what appears to be the same problem (or at least surfaced by the same error) which was passing on 5.3.0, but no longer passing with 5.4.0+:

https://app.circleci.com/pipelines/github/hashicorp/terraform-exec/955/workflows/e5b29188-6559-4205-940b-ad359aa645d0/jobs/12424?invite=true#step-102-4006

@zeripath
Copy link
Contributor

Hi peeps. Sorry this seems to have broken things.

The problem lies in

func (p *Packfile) readOFSDeltaObjectContent(h *ObjectHeader, deltaRC io.ReadCloser) (io.ReadCloser, error) {
hash, err := p.FindHash(h.OffsetReference)
if err != nil {
return nil, err
}
base, err := p.objectAtOffset(h.OffsetReference, hash)
if err != nil {
return nil, err
}
base, ok := p.cacheGet(h.Reference)
if !ok {
base, err = p.Get(h.Reference)
if err != nil {
return nil, err
}
}
return ReaderFromDelta(h, base, deltaRC)
}

For a start it looks like I made a mistake in L424-L435.

That should be:

func (p *Packfile) readOFSDeltaObjectContent(h *ObjectHeader, deltaRC io.ReadCloser) (io.ReadCloser, error) {
	hash, err := p.FindHash(h.OffsetReference)
	if err != nil {
		return nil, err
	}

	base, err := p.objectAtOffset(h.OffsetReference, hash)
	if err != nil {
		return nil, err
	}

	return ReaderFromDelta(h, base, deltaRC)
}

but then it looks like the ReaderFromDelta function isn't applying correctly.

I'll keep on looking.

Apologies - I didn't manage to get a test case to hit this.

@radeksimko
Copy link

I didn't manage to get a test case to hit this.

Here is a simple test case pulled out of our code base:

package main

import (
	"testing"

	"github.com/go-git/go-git/v5"
)

func TestRepro(t *testing.T) {
	_, err := git.PlainClone(t.TempDir(), false, &git.CloneOptions{
		URL:           "https://github.com/hashicorp/terraform",
		ReferenceName: "main",

		Depth: 1,
		Tags:  git.NoTags,
	})
	if err != nil {
		t.Fatal(err)
	}
}
$ go test ./...
--- FAIL: TestRepro (11.09s)
    main_test.go:18: reference not found

I guess you may be looking for one that doesn't involve cloning a big repository, i.e. one that can be just added to this codebase, but I thought I'd mention it anyway.

@zeripath
Copy link
Contributor

OK so I think the issue is the scanner is seeking away from the delta - I'll put up a PR that just reads the whole OFS delta and REF delta into memory, and fixes the other bug I've noted above.

There will be potentially be some memory issue here whereby the delta could be too large so this will need further review.

zeripath added a commit to zeripath/go-git that referenced this issue May 27, 2021
Unfortunately the code in go-git#303 was incorrect for large objects in OFS deltas.

The underlying fault is that the seeker in the packfile was seeking away whilst the
delta was being read.

This PR simply reads the delta into a buffer.

Fix go-git#323

Signed-off-by: Andrew Thornton <art27@cantab.net>
@zeripath
Copy link
Contributor

zeripath commented May 27, 2021

Sorry about this once again.

@zeripath
Copy link
Contributor

zeripath commented Jun 2, 2021

I think we can close this now #329 is merged

@radeksimko
Copy link

I can confirm that the same test that was previously failing in our codebase with 5.4.1 is passing with 5.4.2 so it seems resolved. 👍🏻

zeripath added a commit to zeripath/go-git that referenced this issue Jun 2, 2021
This PR adds code to prevent large objects from being read into memory
from packfiles or the filesystem.

Objects greater than 1Mb are now no longer directly stored in the cache
or read completely into memory.

This PR differs and improves the previous broken go-git#323 by fixing several
bugs in the reader and transparently wrapping ReaderAt as a Reader.

Signed-off-by: Andrew Thornton <art27@cantab.net>
@mgoltzsche
Copy link

yes, it works nicely for me now as well - thanks!

mcuadros pushed a commit that referenced this issue Jun 30, 2021
… memory completely (#330)

This PR adds code to prevent large objects from being read into memory
from packfiles or the filesystem.

Objects greater than 1Mb are now no longer directly stored in the cache
or read completely into memory.

This PR differs and improves the previous broken #323 by fixing several
bugs in the reader and transparently wrapping ReaderAt as a Reader.

Signed-off-by: Andrew Thornton <art27@cantab.net>
@zeripath
Copy link
Contributor

zeripath commented Aug 6, 2021

I think this can be closed now?

harry-hov pushed a commit to gitopia/go-git that referenced this issue Oct 6, 2022
… memory completely (go-git#330)

This PR adds code to prevent large objects from being read into memory
from packfiles or the filesystem.

Objects greater than 1Mb are now no longer directly stored in the cache
or read completely into memory.

This PR differs and improves the previous broken go-git#323 by fixing several
bugs in the reader and transparently wrapping ReaderAt as a Reader.

Signed-off-by: Andrew Thornton <art27@cantab.net>
@AriehSchneier
Copy link
Contributor

@pjbgf this looks like it can be closed now.

@pjbgf
Copy link
Member

pjbgf commented May 29, 2023

Thanks everybody that confirmed this is now working. @AriehSchneier thanks for the heads up.

@pjbgf pjbgf closed this as completed May 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants