
CosmosDB: Bulk support (or guidance on throughput improvement via batching) #22504

Open
zarenner opened this issue Mar 1, 2024 · 1 comment
Assignees
Labels

Client: This issue points to a problem in the data-plane of the library.
Cosmos
customer-reported: Issues that are reported by GitHub users external to the Azure organization.
needs-team-attention: This issue needs attention from the Azure service team or SDK team.
question: The issue doesn't require a change to the product in order to be resolved. Most issues start as that.

Comments

zarenner commented Mar 1, 2024

Feature Request

Bulk support doesn't exist today in the Go SDK. My understanding is that it would be complex to implement and is unlikely to be done anytime soon, if at all. Still, I'm curious whether the Transactional Batch support added in #17795 allows an application developer to get at least some of the throughput benefits of bulk, or whether there are fundamental differences between them.

Based on this, this, and the .NET implementation, my understanding is that Bulk is essentially TransactionalBatch with the enhancements that it:

  1. Operates on physical partitions rather than on logical keys by automatically mapping keys to their ranges/partitions (including detecting and handling partition splits)
  2. Automatically dispatches filled batches when they hit size/count limits
  3. Automatically dispatches partially filled batches on a timer
  4. Automatically handles congestion / retry (especially around same-partition writes)
  5. Sets x-ms-cosmos-batch-atomic to false, x-ms-cosmos-batch-continue-on-error to true
  6. Others I'm sure I've missed 😄

Some of these are relatively easy for an application to implement, but (1) isn't possible because the Go SDK lacks any concept of physical partitions / ranges today, and TransactionalBatch (like all operations) accepts only a single partitionKey.

So I guess my question is: is NewTransactionalBatch(partitionKey) at all useful for bulk operations even without knowledge of physical partitions, assuming you can fill batches per partitionKey? With sequential requests it still seems useful to batch them, but I imagine that as soon as you try to parallelize the batches you're going to hit severe throttling, even if you implement some sort of congestion control at the partitionKey level (edit: although perhaps the x-ms-documentdb-partitionkeyrangeid response header could help).
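For the "fill batches per partitionKey" part, the grouping itself is straightforward application-side logic. A sketch, assuming a 100-operation-per-batch service limit and a hypothetical `op` type standing in for whatever the application would feed into NewTransactionalBatch (no SDK calls here, just the bucketing):

```go
package main

import "fmt"

// op is a hypothetical stand-in for a single item operation. A real
// application would also carry the serialized item and operation type.
type op struct {
	PartitionKey string
}

// maxBatchOps assumes the 100-operations-per-transactional-batch
// service limit mentioned in this thread.
const maxBatchOps = 100

// groupIntoBatches buckets operations by logical partition key, then
// splits each bucket into chunks of at most maxBatchOps, since each
// transactional batch is scoped to a single partition key.
func groupIntoBatches(ops []op) map[string][][]op {
	byKey := make(map[string][]op)
	for _, o := range ops {
		byKey[o.PartitionKey] = append(byKey[o.PartitionKey], o)
	}
	batches := make(map[string][][]op)
	for key, bucket := range byKey {
		for start := 0; start < len(bucket); start += maxBatchOps {
			end := start + maxBatchOps
			if end > len(bucket) {
				end = len(bucket)
			}
			batches[key] = append(batches[key], bucket[start:end])
		}
	}
	return batches
}

func main() {
	// 250 operations against one logical partition key become three
	// batches: 100 + 100 + 50.
	var ops []op
	for i := 0; i < 250; i++ {
		ops = append(ops, op{PartitionKey: "tenant-1"})
	}
	batches := groupIntoBatches(ops)
	fmt.Println(len(batches["tenant-1"]))    // 3
	fmt.Println(len(batches["tenant-1"][2])) // 50
}
```

The hard parts the question raises (mapping logical keys to physical partitions, and congestion control across them) are exactly what this sketch does not attempt.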

More generally, what are the recommendations for optimizing throughput given the limitations of the Go SDK?

@github-actions github-actions bot added Client This issue points to a problem in the data-plane of the library. Cosmos customer-reported Issues that are reported by GitHub users external to the Azure organization. needs-team-triage This issue needs the team to triage. question The issue doesn't require a change to the product in order to be resolved. Most issues start as that labels Mar 1, 2024
@jhendrixMSFT jhendrixMSFT removed the needs-team-triage This issue needs the team to triage. label Mar 1, 2024
@github-actions github-actions bot added the needs-team-attention This issue needs attention from Azure service team or SDK team label Mar 1, 2024
ealsur (Member) commented Mar 1, 2024

On the wire they are very similar, but they have one fundamental difference: the atomic header.

With TransactionalBatch, yes, you could create concurrent executions of batches of 100 operations each, where the flow and grouping would be the application's responsibility.

The problem is that in Batch, if one operation in the batch fails, all fail. In Bulk, each operation can fail independently, because that is the intent. TransactionalBatch is designed to define a transaction scope where everything should commit together. Bulk is designed mainly as a client network optimization, so if one operation fails and the others succeed, that's fine; the operations are not related to each other.

Right now there are no client-side alternatives in the Go SDK to optimize (reduce) the volume of Cosmos DB-specific requests. You can still execute the operations concurrently; each one would just be its own independent network request. As for general Go optimizations at the transport level, I'm not knowledgeable enough to recommend any.
