Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remote.Write unbounded memory use #1123

Closed
gwenaskell opened this issue Sep 17, 2021 · 7 comments
Closed

remote.Write unbounded memory use #1123

gwenaskell opened this issue Sep 17, 2021 · 7 comments

Comments

@gwenaskell
Copy link

gwenaskell commented Sep 17, 2021

Hello,

First of all, thanks to the contributors for this well designed library.

We are designing a containerized server which imports images tarballs and pushes them to a registry. To make it reasonably scalable we'd need to prevent these uploads from exceeding 20-30 Mb memory each. I noticed though that the remote.Write process consumes quite a lot of memory and -which is more problematic- this amount of memory is not bounded and tends to increase with the size of the image, even when setting remote.WithJobs(1) option.

Here is a little program I used to test this:

package main

import (
	"crypto/tls"
	"fmt"
	"net/http"

	"github.com/docker/docker/client"
	"github.com/google/go-containerregistry/pkg/authn"
	"github.com/google/go-containerregistry/pkg/name"
	v1 "github.com/google/go-containerregistry/pkg/v1"
	"github.com/google/go-containerregistry/pkg/v1/remote"
	"github.com/google/go-containerregistry/pkg/v1/tarball"
)


func main() {
	tag, err := name.NewTag("172.18.8.101:30301/test:0.0.0")
	if err != nil {
		panic(err)
	}

	img, err := tarball.ImageFromPath("test.tar", nil)
	if err != nil {
		panic(err)
	}

	trsp := http.DefaultTransport.(*http.Transport).Clone()

	conf := new(tls.Config)
	conf.InsecureSkipVerify = true

	trsp.TLSClientConfig = conf

	updates := make(chan v1.Update, 1)

	go func() {
		for {
			upd := <-updates
			if upd.Error != nil {
				fmt.Println(upd.Error)
			}
			fmt.Printf("%d / %d\n", upd.Complete, upd.Total)
			if upd.Total == 0 {
				return
			}
		}
	}()
	err = remote.Write(tag, img, remote.WithAuth(
		&authn.Basic{
			Username: "admin",
			Password: "admin",
		}), remote.WithJobs(1), remote.WithTransport(trsp), remote.WithProgress(updates))

	if err != nil {
		panic(err)
	}
}

Running this with a test.tar file of ~ 300 Mb consumes about 100Mb of memory as visible on the following memory profile :

(I truncated the top of the file)

image

image

Seeing this graph, it seems the issue has to do with the standard package compress/flate, right? I've seen this issue that might be related: golang/go#32371

FYI, I experimented a docker push using the official client within a dind container to compare, (although the process is different, since we push a daemon image and not a tarball but apart from that it should be the same I guess) and observed ~ 25 Mb memory increase while pushing a 1 Gb image. [EDIT the graph above is showing alloc_space i.e. total memory allocated and not instant memory peak so that's quite different]

Is there any approach or setting that would avoid dealing with compression or reduce the compression level, if that might help? Or do we need to rewrite the library using another compression algorithm?

thanks for the help!

@gwenaskell gwenaskell changed the title remote.Write heavy memory use remote.Write unbounded memory use Sep 17, 2021
@jonjohnsonjr
Copy link
Collaborator

I believe you're running into #413

We're essentially compressing everything twice because we compute the digest/size on the first pass, then recompress when we're actually uploading it. This is sub-optimal, which is why stream.Layer exists (which is similar to docker's behavior).

The easiest thing to do might be just saving the image tarballs differently? Do you have control of these image tarballs? How are they being produced? If the layers are compressed in the tarball, we can avoid compressing them at all! I ended up doing something similar for this bazelbuild/rules_docker#1696

A slightly harder thing to do would be refactoring the tarball.Image code a bit to use stream.Layer for uncompressed layers. I'd welcome a PR, but it will be kind of a nightmare to untangle.

@gwenaskell
Copy link
Author

gwenaskell commented Sep 20, 2021

Interesting, thank you. So I could copy each layer from the tarball to a stream.Layer using Uncompressed() and write them to the registry instead? Is there an example to reproduce somewhere? I will use the content of remote.Write otherwise

We typically use a docker save command to create these tarballs, which does not allow us to compress the layers. The first solution looks preferable to me!

@gwenaskell
Copy link
Author

gwenaskell commented Sep 20, 2021

OK I got you wrong I guess, using a stream.Layer backed by a tarball layer implementation does not change the problem. I understand that I still have to rewrite the tarball package to replace the layer implementation, or find a way to create compressed tarballs.

@gwenaskell
Copy link
Author

gwenaskell commented Sep 20, 2021

Actually since pushing stream.Layers still uses compression at least once we would only partially fix the issue. Given the profile above, it seems that gzip.Write calls have declared around 80 flate.Writers in total (40MB and a zero flate.Writer is around 500kb), which means 8 fresh writers per layer (the image is 10 layers big), so that seems quite a lot. A solution could also consist in replacing standard compression packages with https://github.com/klauspost/compress which seems more efficient (I did not test it).

For now I will look after a cli tool to create compressed image tarballs.

@gwenaskell
Copy link
Author

gwenaskell commented Sep 20, 2021

I noticed I've been fooled by my profile not being updated on my latest tests and I cannot reproduce the graph above since then, the total memory only reaches 15MB. So my previous comment might be misleading.

@jonjohnsonjr
Copy link
Collaborator

For now I will look after a cli tool to create compressed image tarballs.

If you're going this far, I would look at using a different on-disk representation. While compressed layers in a docker save style tarball saves you from having to gzip twice, you'll still end up calculating the sha256 twice. The layout format saves things on the disk in more or less the same format they'll be pushed to the registry, so it's a lot easier for us to be efficient.

I'd read via daemon and write via layout, then have your tool read from layout and write with remote.

I realize that this just shifts the problem left a little bit, but that might be preferable?

@gwenaskell
Copy link
Author

Hello !

sorry for the late answer, I followed your proposal and designed a CLI to export daemon images to layout archives, which can then be handled by our server without dealing with compression, thus making the process much faster.

For anyone needing it, here is the function used to perform the export:

func Save(imgTag, outDir string) error {
	tag, err := name.NewTag(imgTag)
	if err != nil {
		return err
	}

	fmt.Printf("[oci-utils] Importing image %s from the docker daemon\n", imgTag)

	img, err := daemon.Image(tag)
	if err != nil {
		return err
	}

	layoutDir := os.TempDir() + "/layout-" +
		strings.ReplaceAll(tag.RepositoryStr(), "-", "") +
		"-" + strings.ReplaceAll(tag.TagStr(), "-", "")

	if dir, set := os.LookupEnv("LAYOUT_TMP_DIR"); set {
		layoutDir = dir
	}
	layoutDir, err = filepath.Abs(layoutDir)
	if err != nil {
		return err
	}

	path, err := layout.Write(layoutDir, empty.Index)
	if err != nil {
		return err
	}
	defer os.RemoveAll(layoutDir)

	err = path.AppendImage(img)
	if err != nil {
		return err
	}

	tarballPath := filepath.Join(outDir, "layout.tar")

	fmt.Println("[oci-utils] Image will be saved to " + tarballPath)

	target, err := os.Create(tarballPath)
	if err != nil {
		return err
	}
	defer target.Close()

	sink := tar.NewWriter(target)

	err = filepath.Walk(layoutDir,
		func(path string, info os.FileInfo, err error) error {
			if err != nil {
				return err
			}
			header, err := tar.FileInfoHeader(info, info.Name())
			if err != nil {
				return err
			}

			header.Name = strings.TrimPrefix(path, layoutDir)

			if err := sink.WriteHeader(header); err != nil {
				return err
			}

			if info.IsDir() {
				return nil
			}

			file, err := os.Open(path)
			if err != nil {
				return err
			}
			defer file.Close()
			_, err = io.Copy(sink, file)
			return err
		})
	if err != nil {
		return err
	}

	fmt.Printf("[oci-utils] Saved %s to %s\n", imgTag, tarballPath)
	return nil
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants