
draft(storage/transfermanager): prototype #10045

Draft · wants to merge 3 commits into main

Conversation

BrennaEpp (Contributor):

No description provided.

The product-auto-label bot added the api: storage label (Issues related to the Cloud Storage API) on Apr 25, 2024.
tritone (Contributor) left a comment:

A few initial comments; overall this looks like a good start.

type Downloader struct {
	client *storage.Client
	config *transferManagerConfig
	work   chan *DownloadObjectInput // Piece of work to be executed.
tritone (Contributor): Presumably this should be send-only and output should be receive-only?

BrennaEpp (Contributor, Author): We send and receive on both channels in different places in the downloader. Unidirectional channels could be used in subcomponents, or if we were providing the channel to the user, but I don't see how we could implement this with unidirectional channels: if we only received from output, who would send us the output (and vice versa for work)?
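For illustration, a minimal sketch of that point (type and function names here are hypothetical, not the prototype's actual API): the struct fields must stay bidirectional because the Downloader both sends and receives on them, but internal helpers can still accept unidirectional views for compile-time safety:

type workItem struct{ object string }
type workResult struct{ err error }

func process(it *workItem) *workResult { return &workResult{} }

// The fields are bidirectional: enqueueing sends on work and workers receive
// from it; workers send on output and result processing receives from it.
type downloader struct {
	work   chan *workItem
	output chan *workResult
}

// A worker only ever receives work and sends results, so it can take
// unidirectional views of the same channels.
func (d *downloader) worker(work <-chan *workItem, output chan<- *workResult) {
	for item := range work {
		output <- process(item)
	}
}

func (d *downloader) startWorkers(n int) {
	for i := 0; i < n; i++ {
		// Bidirectional channels convert implicitly to directional types.
		go d.worker(d.work, d.output)
	}
}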

"google.golang.org/api/iterator"
)

// Downloader manages parallel download operations from a Cloud Storage bucket.
tritone (Contributor): Technically, the bucket can be specified per object. Let's just say that it manages a set of parallelized downloads.

BrennaEpp (Author): Makes sense; that wording is clearer.

}

// DownloadObject queues the download of a single object. If it's larger than
// the specified part size, the download will automatically be broken up into
tritone (Contributor): We can leave off the part about sharding for the initial PR and add it when we actually implement that.

// This will initiate the download but is non-blocking; wait on Downloader.Results
// to process the result. Results will be split into individual objects.
// NOTE: Do not use, DownloadDirectory is not implemented.
func (d *Downloader) DownloadDirectory(ctx context.Context, input *DownloadDirectoryInput) {
tritone (Contributor): Can leave this out of the PR for now.

// Choice of transport, etc. is configured on the client that's passed in.
func NewDownloader(c *storage.Client, opts ...TransferManagerOption) (*Downloader, error) {
	const (
		chanBufferSize = 1000 // how big is it reasonable to make this?
tritone (Contributor): I think this should be unbuffered, and we should probably buffer in a slice instead. Does that make sense to you?

BrennaEpp (Author): Not sure... does that mean we'd have something in the background checking the slice and sending the work through?

tritone (Contributor): Yeah, that sounds right. A background routine listens on the results channel and writes to a slice, which functions as a queue but with no risk of blocking from hitting a maximum length. It might make Next a little trickier, though.

BrennaEpp (Author): Hmm, I'm not sure I follow your solution and how Next plays into it.

Rather than having a slice, I came up with a solution that uses errgroup. Let's take it offline to discuss pros and cons.
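For reference, a minimal sketch of the slice-backed queue idea discussed above (all names here are illustrative): a background goroutine drains an unbuffered results channel into a mutex-guarded slice, so workers never block on a fixed channel buffer, at the cost of a slightly more involved Next:

import "sync"

type downloadResult struct {
	object string
	err    error
}

type resultQueue struct {
	mu      sync.Mutex
	results []*downloadResult
	done    chan struct{} // closed once the channel is fully drained
}

// drain moves results from the (unbuffered) channel into the slice; the slice
// acts as a queue with no maximum length to block on.
func (q *resultQueue) drain(ch <-chan *downloadResult) {
	for r := range ch {
		q.mu.Lock()
		q.results = append(q.results, r)
		q.mu.Unlock()
	}
	close(q.done)
}

// next pops the oldest result. This is where Next gets trickier: it must
// distinguish "nothing yet" from "drained and closed", which is omitted here.
func (q *resultQueue) next() (*downloadResult, bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if len(q.results) == 0 {
		return nil, false
	}
	r := q.results[0]
	q.results = q.results[1:]
	return r, true
}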

return
}

type DownloadDirectoryInput struct {
tritone (Contributor): Remove for the initial PR.


// Waits for all outstanding downloads to complete. The Downloader must not be
// used to download more objects or directories after this has been called.
func (d *Downloader) WaitAndClose() error {
tritone (Contributor): IMO this should return an error if any individual download returned an error, so we'll have to collect those.

BrennaEpp (Author): Are you thinking it will return an error that collects all the errors that occurred, or just something that indicates there was a failure somewhere?

tritone (Contributor): I think we can do the latter to start, but eventually I guess it would be nice to offer one large multierror as well.

BrennaEpp (Author): Agreed. I'll mark it as a to-do for when we can use errors.Join. I think it makes sense to wait until then, to avoid having to keep supporting a custom multierror rather than the native one.
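Once the module can depend on Go 1.20, the "one large multierror" case is straightforward with errors.Join; a rough sketch on a simplified struct (the fields here are illustrative, not the prototype's):

import (
	"errors"
	"sync"
)

type downloads struct {
	wg   sync.WaitGroup
	mu   sync.Mutex
	errs []error // one entry per failed download
}

func (d *downloads) recordErr(err error) {
	if err == nil {
		return
	}
	d.mu.Lock()
	d.errs = append(d.errs, err)
	d.mu.Unlock()
}

// waitAndClose blocks until all queued downloads finish, then reports every
// failure as a single joined error. errors.Join returns nil when the slice is
// empty, which doubles as the success case.
func (d *downloads) waitAndClose() error {
	d.wg.Wait()
	d.mu.Lock()
	defer d.mu.Unlock()
	return errors.Join(d.errs...) // Go 1.20+
}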

return crc32cHash.Sum32(), w.Close()
}

type testWriter struct {
tritone (Contributor): This doesn't have to be in this PR, but we should add a version of this (well, some kind of DownloaderBuffer that implements WriterAt) to the library.

BrennaEpp (Author): Yeah, I'd have to look more into that. This is a very barebones implementation that is likely not at all efficient (and doesn't really work as a WriterAt yet).

tritone (Contributor): Yeah, I think that's fine for now.
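For context, a growable in-memory io.WriterAt along the lines of what's being discussed might look like this (the name and shape are hypothetical; a real DownloaderBuffer would need to be more careful about allocation):

import (
	"errors"
	"io"
	"sync"
)

// downloaderBuffer grows its backing slice on demand so that shards can be
// written at arbitrary offsets.
type downloaderBuffer struct {
	mu  sync.Mutex
	buf []byte
}

var _ io.WriterAt = (*downloaderBuffer)(nil)

func (b *downloaderBuffer) WriteAt(p []byte, off int64) (int, error) {
	if off < 0 {
		return 0, errors.New("downloaderBuffer: negative offset")
	}
	b.mu.Lock()
	defer b.mu.Unlock()
	if end := int(off) + len(p); end > len(b.buf) {
		// Grow so the write fits; an efficient version would grow
		// geometrically to amortize allocations.
		grown := make([]byte, end)
		copy(grown, b.buf)
		b.buf = grown
	}
	return copy(b.buf[off:], p), nil
}

func (b *downloaderBuffer) Bytes() []byte {
	b.mu.Lock()
	defer b.mu.Unlock()
	return b.buf
}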

// shards to transfer; that is, if the object is larger than this size, it will
// be uploaded or downloaded in concurrent pieces.
// The default is 32 MiB for downloads.
// NOTE: Sharding is not yet implemented.
tritone (Contributor): Leave this off for this PR.

}

// Start workers in background.
for i := 0; i < d.config.numWorkers; i++ {
tritone (Contributor): Presumably we could optimize this by spinning up workers as needed when there are objects enqueued? Doesn't have to be in this PR, though.

BrennaEpp (Author): Sure, though I'm not sure how much that would actually optimize things... I guess it depends on the number of workers.

tritone (Contributor): Yeah, something we can test out later.
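A rough sketch of the lazy alternative (illustrative only; the mu, running, and downloadWorker names are assumed here, not part of the prototype): start a worker per enqueued object until the configured cap is reached, rather than launching numWorkers goroutines up front:

// addWork enqueues an input and lazily spins up a worker if we're still under
// the cap, so an idle Downloader holds no goroutines. mu and running are
// assumed fields guarding the worker count; downloadWorker stands in for the
// existing worker loop.
func (d *Downloader) addWork(in *DownloadObjectInput) {
	d.mu.Lock()
	if d.running < d.config.numWorkers {
		d.running++
		go d.downloadWorker()
	}
	d.mu.Unlock()
	d.work <- in
}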
