
feat(chunked): use chunked upload limited to 32M buffer #1869

Draft · wants to merge 1 commit into main

Conversation

rchincha

No description provided.

Member

@vrothberg vrothberg left a comment

@giuseppe
Member

what is the issue you are trying to fix?

Collaborator

@mtrmac mtrmac left a comment

Thanks!

Most importantly, please carefully read https://github.com/containers/image/blob/main/CONTRIBUTING.md#sign-your-prs ; we can’t accept submissions with unclear copyright status.


I echo Giuseppe’s question — what is the goal of this work? AFAICS if we can upload with a single request, that’s more efficient than splitting; so why split, and why split at arbitrary boundaries (or even let the user configure that)?

I can imagine it would be beneficial to be able to retry parts of uploads — but pre-splitting chunks like this seems not to be essential to that, and the major difficulty for retries is that we can’t rewind the input stream at all.

if err != nil {
logrus.Debugf("Error uploading layer chunked %v", err)
return nil, err
for streamedLen := int64(0); streamedLen < inputInfo.Size; {
Collaborator

inputInfo.Size can be -1 meaning “unknown” (and this is actually the most common case, for podman push). So this wouldn’t work.
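For illustration only, a minimal sketch of a chunking loop that does not depend on knowing the total size up front (the names `uploadInChunks`, `sendChunk`, and `maxChunkSize` are hypothetical, not c/image APIs): it reads at most a fixed number of bytes per chunk and stops at end-of-stream, so an input with Size == -1 still works.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// uploadInChunks reads the stream in chunks of at most maxChunkSize bytes and
// hands each chunk (with its starting offset) to sendChunk. It never consults
// a total size, so an "unknown size" (-1) input is handled the same way.
func uploadInChunks(stream io.Reader, maxChunkSize int64, sendChunk func(start int64, chunk []byte) error) error {
	var offset int64
	buf := make([]byte, maxChunkSize)
	for {
		n, err := io.ReadFull(stream, buf)
		if n > 0 {
			if sendErr := sendChunk(offset, buf[:n]); sendErr != nil {
				return sendErr
			}
			offset += int64(n)
		}
		if err == io.EOF || err == io.ErrUnexpectedEOF {
			return nil // end of stream reached, whether or not the size was known
		}
		if err != nil {
			return err
		}
	}
}

func main() {
	err := uploadInChunks(strings.NewReader("hello chunked world"), 4, func(start int64, chunk []byte) error {
		fmt.Printf("chunk at offset %d: %q\n", start, chunk)
		return nil
	})
	if err != nil {
		panic(err)
	}
}
```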

@@ -207,7 +219,8 @@ func (d *dockerImageDestination) PutBlobWithOptions(ctx context.Context, stream
locationQuery := uploadLocation.Query()
locationQuery.Set("digest", blobDigest.String())
uploadLocation.RawQuery = locationQuery.Encode()
res, err = d.c.makeRequestToResolvedURL(ctx, http.MethodPut, uploadLocation, map[string][]string{"Content-Type": {"application/octet-stream"}}, nil, -1, v2Auth, nil)
rangeHdr := fmt.Sprintf("%d-%d", inputInfo.Size, inputInfo.Size)
Collaborator

This is not really a correct Content-Range value for this operation; ranges are inclusive.
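For illustration, a tiny hypothetical helper (the names are not from this PR) showing what an inclusive range looks like: for a chunk covering bytes [chunkStart, chunkStart+chunkLen), both endpoints of the header value are included, so the end index is start + length - 1.

```go
package main

import "fmt"

// inclusiveRange formats an inclusive byte range for a chunk that starts at
// chunkStart and is chunkLen bytes long: the last byte index is start+len-1.
func inclusiveRange(chunkStart, chunkLen int64) string {
	return fmt.Sprintf("%d-%d", chunkStart, chunkStart+chunkLen-1)
}

func main() {
	fmt.Println(inclusiveRange(0, 32*1024*1024)) // "0-33554431" for a first 32 MiB chunk
}
```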

@rchincha
Author

rchincha commented Mar 1, 2023

what is the issue you are trying to fix?

We have a situation where our registry sits behind a proxy that enforces a maximum Content-Length. Since the dist-spec allows this chunked mode, we are exploring this option. Thoughts on alternatives?

@mtrmac
Collaborator

mtrmac commented Mar 1, 2023

Have the owner of the proxy reconfigure it so that it doesn’t break legitimate traffic? I know, that’s probably unhelpful, but I do think it needs to be said and explored as the primary approach.


If a proxy imposes a limit on Content-Length, I can’t see how that could reasonably be discovered by a registry client; and the registry protocol doesn’t allow the registry to accept part of an upload chunk and tell the client to start a new request.


Just looking at the registry API, https://github.com/distribution/distribution/blob/main/docs/spec/api.md#upload-progress shows that the client could, possibly, after a failure, figure out how much was uploaded and restart from the right point (a rough sketch of such a probe follows the list below). But

  • for that to work, the client must be allowed to submit the large request in the first place; this could be a failure recovery strategy, not a proactive request splitting mechanism
  • it’s still the case that PutBlob in c/image is, in general, not restartable at all. We could, plausibly, add an opt-in mechanism that would allow restarting in some situations (if both the source and destination transports opt in), but that wouldn’t help for the typical podman push
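To make that concrete, here is a rough, self-contained sketch of the “upload progress” probe described in the spec link above (placeholder URL and names, not c/image code): a GET on the upload URL is expected to return 204 No Content with a Range header such as 0-12345 reporting what the registry has accepted, which a client could use to decide where to resume; the exact off-by-one interpretation of that header is an assumption here.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"strings"
)

// uploadedSoFar probes an in-progress blob upload and parses the Range header
// ("0-<offset>") from the 204 response to learn how much the registry holds.
func uploadedSoFar(client *http.Client, uploadURL string) (int64, error) {
	resp, err := client.Get(uploadURL)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusNoContent {
		return 0, fmt.Errorf("unexpected status %d probing upload progress", resp.StatusCode)
	}
	parts := strings.SplitN(resp.Header.Get("Range"), "-", 2)
	if len(parts) != 2 {
		return 0, fmt.Errorf("unexpected Range header %q", resp.Header.Get("Range"))
	}
	end, err := strconv.ParseInt(parts[1], 10, 64)
	if err != nil {
		return 0, err
	}
	// Assuming the range is inclusive, the next byte to send would be end+1.
	return end + 1, nil
}

func main() {
	n, err := uploadedSoFar(http.DefaultClient, "https://registry.example/v2/some/repo/blobs/uploads/some-session")
	if err != nil {
		fmt.Println("probe failed:", err)
		return
	}
	fmt.Println("resume from byte:", n)
}
```

None of this helps with the fundamental problem above: the input stream handed to PutBlob generally cannot be rewound to that resume offset.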

@rchincha
Author

rchincha commented Mar 1, 2023

If a proxy imposes a limit on Content-Length, I can’t see how that could be possibly reasonably discoverable by a registry client;

In the general case, yes, it is not. But for tools such as skopeo (which link against this library), maybe expose a --max-chunk-size command-line parameter to stay flexible and mitigate the problem.

@mtrmac
Collaborator

mtrmac commented Mar 1, 2023

Yes… and the trade-off is accommodating one unhelpful proxy in the world vs. everyone else having to read about an option they should never use. (I suppose hiding that option is a possibility, but that’s still a cost.) It’s possible but not very attractive…

OTOH categorically rejecting this workaround could be an argument for the proxy owner to adjust it 👿

Anyway, without clear copyright status this is all moot.

@TomSweeneyRedHat
Member

@rchincha thanks for the PR. You neglected to sign your commit. After you make adjustments based on the conversation, please do git commit --amend -s, then fetch, rebase, and push the PR again. That should clear the DCO issue you're running into.

Signed-off-by: Ramkumar Chinchani <rchincha@cisco.com>
@jrobertson-insite

For what it's worth, this is exactly the situation I find myself in, with a registry behind Cloudflare proxying. Standard Docker tooling for pushing the images works fine, but when I try to use Skopeo I get a 413 error (shown below). And it is far from uncommon to put most or all of a company's infrastructure behind Cloudflare or similar. Yes, you can adjust the max request size, or just not proxy through Cloudflare, but then you have to deal with IT and so on, which is often a long, tedious, frustrating process; one which I am in the middle of.

FATA[0001] writing blob: uploading layer chunked: StatusCode: 413

@mtrmac
Collaborator

mtrmac commented Mar 24, 2023

Standard docker tooling for pushing the images works fine. But when I try to use Skopeo I get a 413 error

Thanks, that’s relevant: if there really is an implementation difference, we should see what that is, and perhaps adjust Skopeo so that it just works with no user configuration.

But reading the code, it seems to do exactly the same thing… so my guess is that the proxy is explicitly allowing Docker through but not Skopeo. Am I reading the code incorrectly?

@jrobertson-insite

@mtrmac I'll try to have a look at the Docker source... I've never dug into it, so it might take a bit. But I can use standard Docker tooling, and it is 100% not configured as a special exception in the Cloudflare config. (I will double-check to be certain, but I'm pretty sure.)

@haampie

haampie commented Jul 26, 2023

Can someone clarify how uploading works at all now (without this PR)? My experience is that uploading files without chunks sometimes breaks the server/registry (it accepts at most, say, 1 MB or so).

Is it implicitly using HTTP Transfer-Encoding or something similar, or is it really sending a multi-GB layer in one go?

Edit: discovered the following:

When uploading to ghcr.io using those POST + multiple PATCH + PUT requests, setting Content-Range causes the server to return a 416 status with a message (in JSON) saying:

{"errors":[{"code":"REQUESTED_RANGE_NOT_SATISFIABLE","message":"the request body is too large and exceeds the maximum permissible limit of 4.00MiB"}]}

At the same time it's totally fine to upload a multi-GB file in a single PATCH request, without Content-Range 🤦‍♂️
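As a rough answer to the Transfer-Encoding question above, here is a sketch with Go's net/http (an illustration under that assumption, not c/image's actual request code; the URL and file name are placeholders): a PATCH whose body is a plain io.Reader of unknown length is streamed with chunked Transfer-Encoding rather than buffered, which would explain how a multi-GB layer can go out in a single request without being held in memory.

```go
package main

import (
	"io"
	"net/http"
	"os"
)

// patchWholeBlob streams the entire blob in one PATCH request. Because the
// body is a plain io.Reader and ContentLength is left unset, Go's transport
// sends it with "Transfer-Encoding: chunked", so nothing is buffered in full.
func patchWholeBlob(client *http.Client, uploadURL string, blob io.Reader) (*http.Response, error) {
	req, err := http.NewRequest(http.MethodPatch, uploadURL, blob)
	if err != nil {
		return nil, err
	}
	// Deliberately no Content-Range and no Content-Length: one monolithic chunk.
	req.Header.Set("Content-Type", "application/octet-stream")
	return client.Do(req)
}

func main() {
	f, err := os.Open("layer.tar.gz") // placeholder file name
	if err != nil {
		panic(err)
	}
	defer f.Close()
	resp, err := patchWholeBlob(http.DefaultClient, "https://registry.example/v2/some/repo/blobs/uploads/some-session", f)
	if err != nil {
		panic(err)
	}
	resp.Body.Close()
}
```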

@mtrmac
Collaborator

mtrmac commented Jul 26, 2023

As far as I can remember, this is the first report we’ve received that large uploads are a problem. The single PATCH seems to work well enough for presumably quite a few users of podman push / skopeo and the like (or maybe people for whom it doesn’t work have all given up and not reported back; I can’t rule that out).

It wouldn’t surprise me if almost all writes were to a co-located registry server over a reliable link; at least in principle I can certainly imagine a link so unreliable that uploading a 1-GB layer just wouldn’t be possible with this implementation. But so far this issue talks about ~arbitrary server/proxy policies/limits, not about physical limits.

@haampie

haampie commented Jul 26, 2023

From my limited testing:

  1. Setting Content-Range makes ghcr.io error on chunks larger than 4MB
  2. There's no standard that tells you this max size
  3. Uploading chunks of 1MB or so to be safe is a major performance degradation (many small, sequential requests)

I doubt there's a technical reason ghcr cannot handle larger chunks; it's probably that nobody does this and they didn't test it.

Not having a way to know the max chunk size is probably also an oversight in the OCI spec, and makes adoption a pain.

I'll stick with monolithic uploads...

@mtrmac
Collaborator

mtrmac commented Jul 26, 2023

One possible technical reason is that processing the upload requires computing a SHA256 hash of the finished artifact. That’s cheaper to do when processing a single upload request than when supporting arbitrary (overlapping!) range uploads.
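For illustration of that point, a small self-contained sketch (generic Go, not registry code) of how a digest can be computed incrementally while a single sequential upload streams through, with no buffering or reassembly of out-of-order ranges:

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"io"
	"strings"
)

// digestWhileStreaming copies the upload body to dst while feeding the same
// bytes, in order, to a SHA256 hash, so the digest is ready when the body ends.
func digestWhileStreaming(body io.Reader, dst io.Writer) (string, error) {
	h := sha256.New()
	if _, err := io.Copy(io.MultiWriter(dst, h), body); err != nil {
		return "", err
	}
	return fmt.Sprintf("sha256:%x", h.Sum(nil)), nil
}

func main() {
	d, err := digestWhileStreaming(strings.NewReader("example blob"), io.Discard)
	if err != nil {
		panic(err)
	}
	fmt.Println(d)
}
```

With arbitrary (or overlapping) ranges, the server would instead have to persist the pieces and re-read them in order before it could hash the finished artifact.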

Pragmatically, though, none of the major client implementations has so far ever used ranged requests, so registries have had no reason to support them; and that, again, means that clients are disincentivized to start using them, as this research shows.

@haampie
Copy link

haampie commented Jul 26, 2023

I think the OCI spec says you can't have overlap, and that chunks must be uploaded sequentially... seems objectively inferior.

@@ -178,14 +178,26 @@ func (d *dockerImageDestination) PutBlobWithOptions(ctx context.Context, stream
stream = io.TeeReader(stream, sizeCounter)

uploadLocation, err = func() (*url.URL, error) { // A scope for defer
maxChunkSize := int64(32 * 1024 * 1024) // FIXME: if not specified at cmdline, use the whole length
@haampie haampie Jul 26, 2023

If you really want to implement this, you should definitely make it opt-in. As I explained in the thread, some registries (such as ghcr.io) have rather small upload size limits, only when doing chunked uploads. Monolithic uploads don't have those limits.

@mtrmac
Collaborator

mtrmac commented Jul 27, 2023

To be clear, so far I see ~zero reason to add this feature. We have

  • two reports of problematic proxies (one of which says that Docker, which does, AFAICT, exactly the same thing, is allowed by the proxy), where pushing back might be better for the ecosystem
  • an implementation design where actually supporting chunks / restarts / resumes / reuploads is notably more work than this PR.
  • no known way to autodetect the necessary settings.

I appreciate the discussion is ongoing, so I don’t want to cut it off in the middle and prevent more data / research from being discovered… but personally, for me it’s a close call; I’m skeptical that the situation will drastically change.
