
feat(chunked): use chunked upload limited to 32M buffer #1869

Draft · wants to merge 1 commit into main

Conversation

rchincha

No description provided.

Member

@vrothberg vrothberg left a comment

@giuseppe
Member

what is the issue you are trying to fix?

Collaborator

@mtrmac mtrmac left a comment

Thanks!

Most importantly, please carefully read https://github.com/containers/image/blob/main/CONTRIBUTING.md#sign-your-prs ; we can’t accept submissions with unclear copyright status.


I echo Giuseppe’s question — what is the goal of this work? AFAICS if we can upload with a single request, that’s more efficient than splitting; so why split, and why split at arbitrary boundaries (or even let the user configure that)?

I can imagine it would be beneficial to be able to retry parts of uploads — but pre-splitting chunks like this seems not to be essential to that, and the major difficulty for retries is that we can’t rewind the input stream at all.

if err != nil {
logrus.Debugf("Error uploading layer chunked %v", err)
return nil, err
for streamedLen := int64(0); streamedLen < inputInfo.Size; {
Collaborator

inputInfo.Size can be -1 meaning “unknown” (and this is actually the most common case, for podman push). So this wouldn’t work.
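For illustration only, a minimal sketch of a chunking loop that does not depend on knowing the total size up front (the names `uploadInChunks`, `sendChunk`, and `maxChunkSize` are hypothetical, not c/image APIs): it reads at most a fixed number of bytes per chunk and stops at end-of-stream, so an input with Size == -1 still works.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// uploadInChunks reads the stream in chunks of at most maxChunkSize bytes and
// hands each chunk (with its starting offset) to sendChunk. It never consults
// a total size, so an "unknown size" (-1) input is handled the same way.
func uploadInChunks(stream io.Reader, maxChunkSize int64, sendChunk func(start int64, chunk []byte) error) error {
	var offset int64
	buf := make([]byte, maxChunkSize)
	for {
		n, err := io.ReadFull(stream, buf)
		if n > 0 {
			if sendErr := sendChunk(offset, buf[:n]); sendErr != nil {
				return sendErr
			}
			offset += int64(n)
		}
		if err == io.EOF || err == io.ErrUnexpectedEOF {
			return nil // end of stream reached, whether or not the size was known
		}
		if err != nil {
			return err
		}
	}
}

func main() {
	err := uploadInChunks(strings.NewReader("hello chunked world"), 4, func(start int64, chunk []byte) error {
		fmt.Printf("chunk at offset %d: %q\n", start, chunk)
		return nil
	})
	if err != nil {
		panic(err)
	}
}
```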

@@ -207,7 +219,8 @@ func (d *dockerImageDestination) PutBlobWithOptions(ctx context.Context, stream
locationQuery := uploadLocation.Query()
locationQuery.Set("digest", blobDigest.String())
uploadLocation.RawQuery = locationQuery.Encode()
res, err = d.c.makeRequestToResolvedURL(ctx, http.MethodPut, uploadLocation, map[string][]string{"Content-Type": {"application/octet-stream"}}, nil, -1, v2Auth, nil)
rangeHdr := fmt.Sprintf("%d-%d", inputInfo.Size, inputInfo.Size)
Collaborator

This is not really a correct Content-Range value for this operation; ranges are inclusive.
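For illustration, a tiny hypothetical helper (the names are not from this PR) showing what an inclusive range looks like: for a chunk covering bytes [chunkStart, chunkStart+chunkLen), both endpoints of the header value are included, so the end index is start + length - 1.

```go
package main

import "fmt"

// inclusiveRange formats an inclusive byte range for a chunk that starts at
// chunkStart and is chunkLen bytes long: the last byte index is start+len-1.
func inclusiveRange(chunkStart, chunkLen int64) string {
	return fmt.Sprintf("%d-%d", chunkStart, chunkStart+chunkLen-1)
}

func main() {
	fmt.Println(inclusiveRange(0, 32*1024*1024)) // "0-33554431" for a first 32 MiB chunk
}
```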

@rchincha
Author

rchincha commented Mar 1, 2023

what is the issue you are trying to fix?

We have a situation where our registry sits behind a proxy that enforces a maximum Content-Length. Since the dist-spec allows this chunked mode, we are exploring this option. Thoughts on alternatives?

@mtrmac
Collaborator

mtrmac commented Mar 1, 2023

Have the owner of the proxy reconfigure it so that it doesn’t break legitimate traffic? I know, that’s probably unhelpful, but I do think it needs to be said and explored as the primary approach.


If a proxy imposes a limit on Content-Length, I can’t see how that could reasonably be discovered by a registry client; and the registry protocol doesn’t allow the registry to accept part of an upload chunk and tell the client to start a new request.


Just looking at the registry API, https://github.com/distribution/distribution/blob/main/docs/spec/api.md#upload-progress shows that the client could, possibly, after a failure, figure out how much was uploaded and restart from the right point (a rough sketch of such a probe follows the list below). But

  • for that to work, the client must be allowed to submit the large request in the first place; this could be a failure recovery strategy, not a proactive request splitting mechanism
  • it’s still the case that PutBlob in c/image is, in general, not restartable at all. We could, plausibly, add an opt-in mechanism that would allow restarting in some situations (if both the source and destination transports opt in), but that wouldn’t help for the typical podman push
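To make that concrete, here is a rough, self-contained sketch of the “upload progress” probe described in the spec link above (placeholder URL and names, not c/image code): a GET on the upload URL is expected to return 204 No Content with a Range header such as 0-12345 reporting what the registry has accepted, which a client could use to decide where to resume; the exact off-by-one interpretation of that header is an assumption here.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"strings"
)

// uploadedSoFar probes an in-progress blob upload and parses the Range header
// ("0-<offset>") from the 204 response to learn how much the registry holds.
func uploadedSoFar(client *http.Client, uploadURL string) (int64, error) {
	resp, err := client.Get(uploadURL)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusNoContent {
		return 0, fmt.Errorf("unexpected status %d probing upload progress", resp.StatusCode)
	}
	parts := strings.SplitN(resp.Header.Get("Range"), "-", 2)
	if len(parts) != 2 {
		return 0, fmt.Errorf("unexpected Range header %q", resp.Header.Get("Range"))
	}
	end, err := strconv.ParseInt(parts[1], 10, 64)
	if err != nil {
		return 0, err
	}
	// Assuming the range is inclusive, the next byte to send would be end+1.
	return end + 1, nil
}

func main() {
	n, err := uploadedSoFar(http.DefaultClient, "https://registry.example/v2/some/repo/blobs/uploads/some-session")
	if err != nil {
		fmt.Println("probe failed:", err)
		return
	}
	fmt.Println("resume from byte:", n)
}
```

None of this helps with the fundamental problem above: the input stream handed to PutBlob generally cannot be rewound to that resume offset.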

@rchincha
Author

rchincha commented Mar 1, 2023

If a proxy imposes a limit on Content-Length, I can’t see how that could be possibly reasonably discoverable by a registry client;

In the general case, yes, it is not. But for tools such as skopeo (which link against this library), maybe expose a --max-chunk-size command-line parameter to stay flexible and mitigate the problem.

@mtrmac
Collaborator

mtrmac commented Mar 1, 2023

Yes… and the trade-off is accommodating one unhelpful proxy in the world vs. everyone else having to read about an option they should never use. (I suppose hiding that option is a possibility, but that’s still a cost.) It’s possible but not very attractive…

OTOH categorically rejecting this workaround could be an argument for the proxy owner to adjust it 👿

Anyway, without clear copyright status this is all moot.

@TomSweeneyRedHat
Member

@rchincha thanks for the PR. You neglected to sign your commit. After you make adjustments based on the conversation, please do git commit --amend -s, then fetch, rebase, and push the PR again. That should clear the DCO issue you're running into.

Signed-off-by: Ramkumar Chinchani <rchincha@cisco.com>
@jrobertson-insite

For what it's worth, this is exactly the situation I find myself in, with a registry behind Cloudflare proxying. Standard Docker tooling for pushing the images works fine, but when I try to use Skopeo I get a 413 error (shown below). And it is far from uncommon to put most or all of a company's infrastructure behind Cloudflare or similar. Yes, you can adjust the max request size, or just not proxy through Cloudflare, but then you have to deal with IT and so on, which is often a long, tedious, frustrating process; one which I am in the middle of.

FATA[0001] writing blob: uploading layer chunked: StatusCode: 413

@mtrmac
Collaborator

mtrmac commented Mar 24, 2023

Standard docker tooling for pushing the images works fine. But when I try to use Skopeo I get a 413 error

Thanks, that’s relevant: if there really is an implementation difference, we should see what that is, and perhaps adjust Skopeo so that it just works with no user configuration.

But reading the code, it seems to do exactly the same thing… so my guess is that the proxy is explicitly allowing Docker through but not Skopeo. Am I reading the code incorrectly?

@jrobertson-insite

@mtrmac I'll try to have a look at the Docker source... I've never dug into it, so it might take a bit. But I can use standard Docker tooling, and it is 100% not configured as a special exception in the Cloudflare config. (I will double-check to be certain, but I'm pretty sure.)

@haampie

haampie commented Jul 26, 2023

Can someone clarify how uploading works at all now (without this PR)? My experience is that uploading files without chunks sometimes breaks the server/registry (it accepts at most, say, 1 MB or so).

Is it implicitly using HTTP Transfer-Encoding or something similar, or is it really sending a multi-GB layer in one go?

Edit: discovered the following:

When uploading to ghcr.io using those POST + multiple PATCH + PUT requests, setting Content-Range causes the server to return a 416 status with a message (in JSON) saying:

{"errors":[{"code":"REQUESTED_RANGE_NOT_SATISFIABLE","message":"the request body is too large and exceeds the maximum permissible limit of 4.00MiB"}]}

At the same time it's totally fine to upload a multi-GB file in a single PATCH request, without Content-Range 🤦‍♂️
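As a rough answer to the Transfer-Encoding question above, here is a sketch with Go's net/http (an illustration under that assumption, not c/image's actual request code; the URL and file name are placeholders): a PATCH whose body is a plain io.Reader of unknown length is streamed with chunked Transfer-Encoding rather than buffered, which would explain how a multi-GB layer can go out in a single request without being held in memory.

```go
package main

import (
	"io"
	"net/http"
	"os"
)

// patchWholeBlob streams the entire blob in one PATCH request. Because the
// body is a plain io.Reader and ContentLength is left unset, Go's transport
// sends it with "Transfer-Encoding: chunked", so nothing is buffered in full.
func patchWholeBlob(client *http.Client, uploadURL string, blob io.Reader) (*http.Response, error) {
	req, err := http.NewRequest(http.MethodPatch, uploadURL, blob)
	if err != nil {
		return nil, err
	}
	// Deliberately no Content-Range and no Content-Length: one monolithic chunk.
	req.Header.Set("Content-Type", "application/octet-stream")
	return client.Do(req)
}

func main() {
	f, err := os.Open("layer.tar.gz") // placeholder file name
	if err != nil {
		panic(err)
	}
	defer f.Close()
	resp, err := patchWholeBlob(http.DefaultClient, "https://registry.example/v2/some/repo/blobs/uploads/some-session", f)
	if err != nil {
		panic(err)
	}
	resp.Body.Close()
}
```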

@mtrmac
Collaborator

mtrmac commented Jul 26, 2023

As far as I can remember, this is the first report we’ve received that large uploads are a problem. The single PATCH seems to work well enough for presumably quite a few users of podman push / skopeo and the like (or maybe people for whom it doesn’t work have all given up and not reported back; I can’t rule that out).

It wouldn’t surprise me if almost all writes were to a co-located registry server over a reliable link; at least in principle I can certainly imagine a link so unreliable that uploading a 1-GB layer just wouldn’t be possible with this implementation. But so far this issue talks about ~arbitrary server/proxy policies/limits, not about physical limits.

@haampie

haampie commented Jul 26, 2023

From my limited testing:

  1. Setting Content-Range makes ghcr.io error on chunks larger than 4MB
  2. There's no standard that tells you this max size
  3. Uploading chunks of 1MB or so to be safe is a major performance degradation (many small, sequential requests)

I doubt there's a technical reason ghcr cannot handle larger chunks; it's probably that nobody does this and they didn't test it.

Not having a way to know the max chunk size is probably also an oversight in the OCI spec, and makes adoption a pain.

I'll stick with monolithic uploads...

@mtrmac
Collaborator

mtrmac commented Jul 26, 2023

One possible technical reason is that processing the upload requires computing a SHA256 hash of the finished artifact. That’s cheaper to do when processing a single upload request than when supporting arbitrary (overlapping!) range uploads.
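For illustration of that point, a small self-contained sketch (generic Go, not registry code) of how a digest can be computed incrementally while a single sequential upload streams through, with no buffering or reassembly of out-of-order ranges:

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"io"
	"strings"
)

// digestWhileStreaming copies the upload body to dst while feeding the same
// bytes, in order, to a SHA256 hash, so the digest is ready when the body ends.
func digestWhileStreaming(body io.Reader, dst io.Writer) (string, error) {
	h := sha256.New()
	if _, err := io.Copy(io.MultiWriter(dst, h), body); err != nil {
		return "", err
	}
	return fmt.Sprintf("sha256:%x", h.Sum(nil)), nil
}

func main() {
	d, err := digestWhileStreaming(strings.NewReader("example blob"), io.Discard)
	if err != nil {
		panic(err)
	}
	fmt.Println(d)
}
```

With arbitrary (or overlapping) ranges, the server would instead have to persist the pieces and re-read them in order before it could hash the finished artifact.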

Pragmatically, though, none of the major client implementations has so far ever used ranged requests, so registries have had no reason to support them; and that, again, means that clients are disincentivized to start using them, as this research shows.

@haampie
Copy link

haampie commented Jul 26, 2023

I think the OCI spec says you can't have overlap, and that chunks must be uploaded sequentially... seems objectively inferior.

@@ -178,14 +178,26 @@ func (d *dockerImageDestination) PutBlobWithOptions(ctx context.Context, stream
stream = io.TeeReader(stream, sizeCounter)

uploadLocation, err = func() (*url.URL, error) { // A scope for defer
maxChunkSize := int64(32 * 1024 * 1024) // FIXME: if not specified at cmdline, use the whole length
@haampie haampie Jul 26, 2023

If you really want to implement this, you should definitely make it opt-in. As I explained in the thread, some registries (such as ghcr.io) have rather small upload size limits, only when doing chunked uploads. Monolithic uploads don't have those limits.

@mtrmac
Collaborator

mtrmac commented Jul 27, 2023

To be clear, so far I see ~zero reason to add this feature. We have

  • two reports of problematic proxies (one of which says that Docker, which does, AFAICT, exactly the same thing, is allowed by the proxy), where pushing back might be better for the ecosystem
  • an implementation design where actually supporting chunks / restarts / resumes / reuploads is notably more work than this PR.
  • no known way to autodetect the necessary settings.

I appreciate the discussion is ongoing, so I don’t want to cut it off in the middle and prevent more data / research from being discovered… but personally, for me it’s a close call; I’m skeptical that the situation will drastically change.
