Azure Upload Block Size and Concurrency #40

Open
phillebaba opened this issue Dec 6, 2022 · 1 comment

phillebaba commented Dec 6, 2022

I have had some time to look deeper into the Azure code while refactoring it. One thing that has bothered me for a while is the hard-coded block size and concurrency. I don't really know where these values come from or why they were chosen. One explanation may be that the old Azure SDK shipped with suboptimal defaults which have since changed. Either way, I do not think these are optimal values.

Currently these values are set to 3 MiB and 4 concurrent buffers:

opt := azblob.UploadStreamOptions{
    BufferSize: 3 * 1024 * 1024,
    MaxBuffers: 4,
}

The defaults in the current Azure SDK are 1 MiB and a single concurrent buffer:
https://github.com/Azure/azure-sdk-for-go/blob/c8c9838f7dc383a0bc2ad7b6cc09d51eb619d8f6/sdk/storage/azblob/blockblob/models.go#L256-L261

To understand why block size matters, it is useful to know how Thanos uses Azure Storage Accounts: Thanos stores its metrics as block blobs.

Block blobs are optimized for uploading large amounts of data efficiently. Block blobs are composed of blocks, each of which is identified by a block ID. A block blob can include up to 50,000 blocks. Each block in a block blob can be a different size, up to the maximum size permitted for the service version in use.

https://learn.microsoft.com/en-us/rest/api/storageservices/understanding-block-blobs--append-blobs--and-page-blobs#about-block-blobs
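As a rough illustration (my own arithmetic, not from the docs) of what the 50,000-block limit and the block size imply together, the snippet below computes the largest possible blob and the number of blocks needed for a 512 MiB chunk file at a few block sizes:

package main

import "fmt"

func main() {
    const maxBlocks = 50_000 // block limit for a single block blob, per the docs above

    for _, blockSizeMiB := range []int64{1, 3, 8} {
        sizeBytes := blockSizeMiB * 1024 * 1024
        // Largest blob that can be written before running out of block IDs.
        maxBlobGiB := float64(sizeBytes) * maxBlocks / (1024 * 1024 * 1024)
        // Blocks needed for a 512 MiB chunk file (rounded up).
        blocksPerChunk := (512*1024*1024 + sizeBytes - 1) / sizeBytes
        fmt.Printf("block size %d MiB: max blob ~%.0f GiB, 512 MiB file -> %d blocks\n",
            blockSizeMiB, maxBlobGiB, blocksPerChunk)
    }
}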

This means that every file Thanos uploads is split into blocks of the configured block size. Once all blocks have been uploaded they are committed, and only then does the blob become available. During upload the block size matters because the maximum memory used is roughly block size * concurrency, which is useful for limiting the memory the receiver consumes while uploading. Microsoft seems to plan on simplifying this by automatically calculating an optimal value from a maximum memory budget, but it is not clear when or if this will be implemented.
https://github.com/Azure/azure-sdk-for-go/blob/c8c9838f7dc383a0bc2ad7b6cc09d51eb619d8f6/sdk/messaging/azeventhubs/internal/blob/chunkwriting.go#L32-L39
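Until the SDK can do that automatically, one way to reason about it is to fix a memory budget and derive the buffer count from the block size. A minimal sketch; uploadBufferOptions is a hypothetical helper of mine, not an existing SDK function:

package main

import "fmt"

// uploadBufferOptions is a hypothetical helper (not part of the Azure SDK):
// given a per-upload memory budget, it derives how many concurrent buffers of
// the chosen block size fit inside the budget, since the maximum memory used
// is roughly blockSize * concurrency.
func uploadBufferOptions(memoryBudget, blockSize int64) (bufferSize int64, maxBuffers int) {
    maxBuffers = int(memoryBudget / blockSize)
    if maxBuffers < 1 {
        maxBuffers = 1 // always keep at least one block in flight
    }
    return blockSize, maxBuffers
}

func main() {
    // Example: an 8 MiB block size under a 32 MiB memory budget -> 4 buffers.
    bufSize, buffers := uploadBufferOptions(32<<20, 8<<20)
    fmt.Printf("BufferSize: %d, MaxBuffers: %d\n", bufSize, buffers)
}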

Things get more interesting when you consider how the blocks are stored and fetched on read. Sadly, Microsoft provides very little documentation about what effect block size has here. There are some blog posts and Stack Overflow answers claiming it has an effect, but without data to back up the claim. There is an issue MicrosoftDocs/azure-docs#100416 requesting more information, but these things can take a while to get a response.

If there is any correlation between block size and download duration it should be easy to prove: upload the same amount of data multiple times with different block sizes and measure the download duration. So that is what I have done. I have written phillebaba/azblob-benchmark to test this. It uploads and downloads files multiple times for each block size and takes the average duration per block size. One assumption I have made, which should be corrected if wrong, is that most of the files Thanos downloads are 512 MiB; this comes from the fact that chunk files are capped at 512 MiB. I measured both upload and download speed; concurrency does not matter for download duration, which is why it is set to an arbitrary value. The tests were run from a Standard D2s v5 (2 vCPUs, 8 GiB memory) in the same region as the Storage Account.
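The core of the benchmark is just a timed upload/download loop over a range of block sizes. A simplified sketch of the idea; uploadBlob and downloadBlob are hypothetical stand-ins for the azblob calls, and the block sizes are illustrative (the real code is in phillebaba/azblob-benchmark):

package main

import (
    "fmt"
    "time"
)

// uploadBlob and downloadBlob are hypothetical stand-ins for the azblob calls
// the benchmark wraps; phillebaba/azblob-benchmark contains the real code.
func uploadBlob(data []byte, blockSize int64) error { return nil }

func downloadBlob(name string) ([]byte, error) { return nil, nil }

func main() {
    data := make([]byte, 512<<20) // 512 MiB, the assumed Thanos chunk file size
    const runs = 5

    for _, blockSizeMiB := range []int64{1, 2, 4, 8, 16, 32} {
        var total time.Duration
        for i := 0; i < runs; i++ {
            // Upload with the block size under test, then time the download.
            if err := uploadBlob(data, blockSizeMiB<<20); err != nil {
                panic(err)
            }
            start := time.Now()
            if _, err := downloadBlob("benchmark-blob"); err != nil {
                panic(err)
            }
            total += time.Since(start)
        }
        fmt.Printf("block size %2d MiB: avg download %v\n", blockSizeMiB, total/runs)
    }
}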

[Chart: average upload and download duration per block size]

These results show that there is a correlation between block size and download duration, with diminishing returns eventually kicking in as the block size grows. At first I feared that the result I was seeing was somehow linked to slow startup or some sort of caching that eventually kicks in, so I ran the same test in reverse order, decreasing the block size this time, and got very much the same results.

[Chart: the same benchmark run with block sizes in decreasing order]

My conclusion from these results is that the current block size is not optimal for reads. Some more research is needed before a proper conclusion can be drawn, but I think it is safe to say that increasing the block size from the current 3 MiB to somewhere around 8 MiB would have a positive impact on download speed. In the end we should find a good middle ground between fast read speeds and memory usage during upload.
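Concretely, the kind of change that would mean, with the caveat that the exact value still needs discussion:

opt := azblob.UploadStreamOptions{
    BufferSize: 8 * 1024 * 1024, // up from 3 MiB; ~32 MiB memory per in-flight upload with 4 buffers
    MaxBuffers: 4,
}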

phillebaba commented

Here is another good doc to read about upload performance. It explains the drop in upload duration that occurs when the block size is larger than 4 MB.
https://azure.microsoft.com/en-us/blog/high-throughput-with-azure-blob-storage/

Basically, the block size has to be larger than 4 MB to enable high-throughput block uploads to Azure.
