Azure Blob: Unable to access Metadata when listing blobs #16679

Closed
jkowalski opened this issue Dec 17, 2021 · 16 comments
Labels
bug This issue requires a change to an existing behavior in the product in order to be resolved. Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. Storage Storage Service (Queues, Blobs, Files)

@jkowalski

Bug Report

  • import path of package in question, e.g. github.com/Azure/azure-sdk-for-go/sdk/storage/azblob

require github.com/Azure/azure-sdk-for-go/sdk/storage/azblob v0.2.0

require (
github.com/Azure/azure-sdk-for-go/sdk/azcore v0.20.0 // indirect
github.com/Azure/azure-sdk-for-go/sdk/internal v0.8.1 // indirect
github.com/stretchr/objx v0.2.0 // indirect
)

  • output of go version
    go version go1.17.2 darwin/arm64

  • What happened?

I'm trying to list metadata associated with blobs, but it appears to be incorrectly handled in the SDK code so it always comes back empty.

To reproduce, upload some blobs with custom metadata like so:

	resp, err := bc.Upload(ctx, data.Reader(), &azblob.UploadBlockBlobOptions{
		Metadata: map[string]string{"Kopiamtime": "1577968240000000000"},
	})

Now when listing the blobs there are 2 sub-issues, one blocking the other:

  1. Incorrect handling of ContainerListBlobFlatSegmentOptions.Include:

My first attempt was to try:

bucket.ListBlobsFlat(&azblob.ContainerListBlobFlatSegmentOptions{
	Prefix:  &prefixStr,
	Include: []azblob.ListBlobsIncludeItem{azblob.ListBlobsIncludeItemMetadata},
})

but the server never returns any metadata. Reading through the code, I think there's a bug in how the include query parameter is generated, and the server silently ignores the invalid value (a bug in itself?). To work around it I have to add artificial square brackets, which makes it work partially, as shown here:

bucket.ListBlobsFlat(&azblob.ContainerListBlobFlatSegmentOptions{
	Prefix:  &prefixStr,
	Include: []azblob.ListBlobsIncludeItem{"[" + azblob.ListBlobsIncludeItemMetadata + "]"},
})
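
For reference, the List Blobs REST operation documents `include` as a comma-separated list of dataset names in the query string. The sketch below only illustrates that expected wire format using the standard library; it is not the SDK's request-building code, and `listBlobsQuery` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// listBlobsQuery builds the query string for a List Blobs request.
// Hypothetical helper for illustration only; the SDK generates its own.
func listBlobsQuery(prefix string, include []string) string {
	q := url.Values{}
	q.Set("restype", "container")
	q.Set("comp", "list")
	if prefix != "" {
		q.Set("prefix", prefix)
	}
	if len(include) > 0 {
		// Multiple datasets are joined with a comma; Encode escapes it as %2C.
		q.Set("include", strings.Join(include, ","))
	}
	// Encode sorts parameters by key.
	return q.Encode()
}

func main() {
	fmt.Println(listBlobsQuery("sastest-", []string{"metadata"}))
}
```

Comparing a captured request URL against this shape is one way to check whether the `include` value reaches the server without extra characters.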
  2. Incorrect mapping of Metadata

After applying the workaround above, I get the following response from the server (there are 4 files in the bucket that match the prefix) - the response was captured using io.Copy(os.Stderr, resp.RawResponse.Body):

<?xml version="1.0" encoding="utf-8"?>
<EnumerationResults ServiceEndpoint="https://kopiatesting.blob.core.windows.net/" ContainerName="kopia-testing">
    <Prefix>sastest-1639703094-ed1304df8b99da34-</Prefix>
    <Blobs>
        <Blob>
            <Name>sastest-1639703094-ed1304df8b99da34-abcdbbf4f0507d054ed5a80a5b65086f602b</Name>
            <Properties>
                <Creation-Time>Fri, 17 Dec 2021 01:04:55 GMT</Creation-Time>
                <Last-Modified>Fri, 17 Dec 2021 01:04:55 GMT</Last-Modified>
                <Etag>0x8D9C0F93CEC42AE</Etag>
                <Content-Length>0</Content-Length>
                <Content-Type>application/octet-stream</Content-Type>
                <Content-Encoding />
                <Content-Language />
                <Content-CRC64 />
                <Content-MD5>1B2M2Y8AsgTpgAmY7PhCfg==</Content-MD5>
                <Cache-Control />
                <Content-Disposition />
                <BlobType>BlockBlob</BlobType>
                <AccessTier>Hot</AccessTier>
                <AccessTierInferred>true</AccessTierInferred>
                <LeaseStatus>unlocked</LeaseStatus>
                <LeaseState>available</LeaseState>
                <ServerEncrypted>true</ServerEncrypted>
            </Properties>
            <Metadata>
                <Kopiamtime>1577968240000000000</Kopiamtime>
            </Metadata>
            <OrMetadata />
        </Blob>
        <Blob>
            <Name>sastest-1639703094-ed1304df8b99da34-abff4585856ebf0748fd989e1dd623a8963d</Name>
            <Properties>
                <Creation-Time>Fri, 17 Dec 2021 01:04:55 GMT</Creation-Time>
                <Last-Modified>Fri, 17 Dec 2021 01:04:55 GMT</Last-Modified>
                <Etag>0x8D9C0F93D28CE3A</Etag>
                <Content-Length>1000</Content-Length>
                <Content-Type>application/octet-stream</Content-Type>
                <Content-Encoding />
                <Content-Language />
                <Content-CRC64 />
                <Content-MD5>8INdvman6QsODoZHNU0MCw==</Content-MD5>
                <Cache-Control />
                <Content-Disposition />
                <BlobType>BlockBlob</BlobType>
                <AccessTier>Hot</AccessTier>
                <AccessTierInferred>true</AccessTierInferred>
                <LeaseStatus>unlocked</LeaseStatus>
                <LeaseState>available</LeaseState>
                <ServerEncrypted>true</ServerEncrypted>
            </Properties>
            <Metadata>
                <Kopiamtime>1577968240000000000</Kopiamtime>
            </Metadata>
            <OrMetadata />
        </Blob>
        <Blob>
            <Name>sastest-1639703094-ed1304df8b99da34-abgc3dca496d510f492c858a2df1eb824e62</Name>
            <Properties>
                <Creation-Time>Fri, 17 Dec 2021 01:04:55 GMT</Creation-Time>
                <Last-Modified>Fri, 17 Dec 2021 01:04:55 GMT</Last-Modified>
                <Etag>0x8D9C0F93D3F6062</Etag>
                <Content-Length>10000</Content-Length>
                <Content-Type>application/octet-stream</Content-Type>
                <Content-Encoding />
                <Content-Language />
                <Content-CRC64 />
                <Content-MD5>Gvp7eHd5/MlEV1brkD9viQ==</Content-MD5>
                <Cache-Control />
                <Content-Disposition />
                <BlobType>BlockBlob</BlobType>
                <AccessTier>Hot</AccessTier>
                <AccessTierInferred>true</AccessTierInferred>
                <LeaseStatus>unlocked</LeaseStatus>
                <LeaseState>available</LeaseState>
                <ServerEncrypted>true</ServerEncrypted>
            </Properties>
            <Metadata>
                <Kopiamtime>1577968240000000000</Kopiamtime>
            </Metadata>
            <OrMetadata />
        </Blob>
        <Blob>
            <Name>sastest-1639703094-ed1304df8b99da34-kopia.repository</Name>
            <Properties>
                <Creation-Time>Fri, 17 Dec 2021 01:04:55 GMT</Creation-Time>
                <Last-Modified>Fri, 17 Dec 2021 01:04:55 GMT</Last-Modified>
                <Etag>0x8D9C0F93D5640A2</Etag>
                <Content-Length>100</Content-Length>
                <Content-Type>application/octet-stream</Content-Type>
                <Content-Encoding />
                <Content-Language />
                <Content-CRC64 />
                <Content-MD5>S3ixhZH4sRD+KIqBPwIHFw==</Content-MD5>
                <Cache-Control />
                <Content-Disposition />
                <BlobType>BlockBlob</BlobType>
                <AccessTier>Hot</AccessTier>
                <AccessTierInferred>true</AccessTierInferred>
                <LeaseStatus>unlocked</LeaseStatus>
                <LeaseState>available</LeaseState>
                <ServerEncrypted>true</ServerEncrypted>
            </Properties>
            <Metadata>
                <Kopiamtime>1577968240000000000</Kopiamtime>
            </Metadata>
            <OrMetadata />
        </Blob>
        <Blob>
            <Name>sastest-1639703094-ed1304df8b99da34-zxce0e35630770c54668a8cfb4e414c6bf8f</Name>
            <Properties>
                <Creation-Time>Fri, 17 Dec 2021 01:04:55 GMT</Creation-Time>
                <Last-Modified>Fri, 17 Dec 2021 01:04:55 GMT</Last-Modified>
                <Etag>0x8D9C0F93D11C6F2</Etag>
                <Content-Length>1</Content-Length>
                <Content-Type>application/octet-stream</Content-Type>
                <Content-Encoding />
                <Content-Language />
                <Content-CRC64 />
                <Content-MD5>VaVACK0bpYmqIQ0mKcHfQQ==</Content-MD5>
                <Cache-Control />
                <Content-Disposition />
                <BlobType>BlockBlob</BlobType>
                <AccessTier>Hot</AccessTier>
                <AccessTierInferred>true</AccessTierInferred>
                <LeaseStatus>unlocked</LeaseStatus>
                <LeaseState>available</LeaseState>
                <ServerEncrypted>true</ServerEncrypted>
            </Properties>
            <Metadata>
                <Kopiamtime>1577968240000000000</Kopiamtime>
            </Metadata>
            <OrMetadata />
        </Blob>
    </Blobs>
    <NextMarker />
</EnumerationResults>

I was looking to retrieve metadata named Kopiamtime which shows up in the server response, but unfortunately cannot be read in Go code, because it.Metadata is always an empty map:

for _, it := range resp.Segment.BlobItems {
	fmt.Println(it.Metadata.AdditionalProperties)
}

This always prints map[].

I'm not an expert in Golang XML handling, but I think this mapping is to blame, probably because of extra nesting:

Metadata *BlobMetadata `xml:"Metadata"`

Also notice how the server returns OrMetadata but the code is mapping ObjectReplicationMetadata which is also incorrect.
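
On the nesting point: `encoding/xml` has no built-in way to decode arbitrary child elements into a map, so one option is a custom `UnmarshalXML` on the metadata type. The following is a minimal sketch of that idea, not the SDK's implementation; `blobMetadata`, `listResult`, `decodeBlobMetadata` and `sample` are hypothetical names, and the sample document is a trimmed version of the response above.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// blobMetadata collects every child element of <Metadata> into a map.
// Hypothetical type for illustration; it is not the SDK's BlobMetadata.
type blobMetadata map[string]string

// UnmarshalXML is called after the opening <Metadata> tag has been read.
func (m *blobMetadata) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error {
	out := blobMetadata{}
	for {
		tok, err := d.Token()
		if err != nil {
			return err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			// A child such as <Kopiamtime>...</Kopiamtime>: read its text.
			var value string
			if err := d.DecodeElement(&value, &t); err != nil {
				return err
			}
			out[t.Name.Local] = value
		case xml.EndElement:
			// The closing </Metadata> tag ends this element.
			*m = out
			return nil
		}
	}
}

// listResult mirrors only the parts of the listing needed here.
type listResult struct {
	Blobs []struct {
		Name     string       `xml:"Name"`
		Metadata blobMetadata `xml:"Metadata"`
	} `xml:"Blobs>Blob"`
}

// decodeBlobMetadata returns the metadata of each blob keyed by blob name.
func decodeBlobMetadata(body string) (map[string]blobMetadata, error) {
	var res listResult
	if err := xml.Unmarshal([]byte(body), &res); err != nil {
		return nil, err
	}
	byName := make(map[string]blobMetadata, len(res.Blobs))
	for _, b := range res.Blobs {
		byName[b.Name] = b.Metadata
	}
	return byName, nil
}

const sample = `<EnumerationResults>
  <Blobs>
    <Blob>
      <Name>blob-a</Name>
      <Metadata>
        <Kopiamtime>1577968240000000000</Kopiamtime>
      </Metadata>
      <OrMetadata />
    </Blob>
  </Blobs>
</EnumerationResults>`

func main() {
	byName, err := decodeBlobMetadata(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(byName["blob-a"]["Kopiamtime"])
}
```

Unlike a struct with one field per key, this approach does not need to know the metadata names in advance.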

BTW, thanks for the library. I'm trying to use it in https://github.com/kopia/kopia and everything else works fine; this one issue is currently blocking.

Is there a workaround that can be applied here? I was thinking of manually parsing the XML for the time being until this issue is resolved.

@ghost ghost added needs-triage This is a new issue that needs to be triaged to the appropriate team. customer-reported Issues that are reported by GitHub users external to the Azure organization. question The issue doesn't require a change to the product in order to be resolved. Most issues start as that labels Dec 17, 2021
@jkowalski
Author

jkowalski commented Dec 17, 2021

In case this helps anybody, here's a quick-and-dirty XML parsing workaround that works and allows me to read the one metadata I'm interested in:

		var enumerationResults struct {
			Blobs struct {
				Blob []struct {
					Name       string
					Properties struct {
						ContentLength int64  `xml:"Content-Length"`
						LastModified  string `xml:"Last-Modified"`
					}
					Metadata struct {
						Kopiamtime string
					}
				}
			}
		}

		if err := xml.NewDecoder(resp.RawResponse.Body).Decode(&enumerationResults); err != nil {
			return errors.Wrap(err, "unable to decode response")
		}

@jhendrixMSFT jhendrixMSFT added the Storage Storage Service (Queues, Blobs, Files) label Dec 17, 2021
@ghost ghost removed the needs-triage This is a new issue that needs to be triaged to the appropriate team. label Dec 17, 2021
@RickWinter RickWinter added the Client This issue points to a problem in the data-plane of the library. label Dec 17, 2021
@ghost ghost added the needs-team-attention This issue needs attention from Azure service team or SDK team label Dec 17, 2021
@RickWinter RickWinter added bug This issue requires a change to an existing behavior in the product in order to be resolved. and removed question The issue doesn't require a change to the product in order to be resolved. Most issues start as that labels Dec 17, 2021
@RickWinter RickWinter added this to the [2022] January milestone Dec 17, 2021
@berndverst
Contributor

I just came here to report the same issue. For example, the ContentMD5 property isn't returned in the List response.

Here is a complete way to reproduce specifically for the ContentMD5 property:

package main

import (
	"context"
	"crypto/md5"
	"encoding/base64"
	"fmt"
	"os"

	"github.com/Azure/azure-sdk-for-go/sdk/storage/azblob"
)

func main() {
	cred, err := azblob.NewSharedKeyCredential(os.Getenv("AzureBlobStorageAccount"), os.Getenv("AzureBlobStorageAccessKey"))
	if err != nil {
		fmt.Printf("failed to create credential %v\n", err)
		return // do not continue with a nil credential
	}
	url := fmt.Sprintf("https://%s.blob.core.windows.net/%s", os.Getenv("AzureBlobStorageAccount"), os.Getenv("AzureBlobStorageContainer"))
	container, err := azblob.NewContainerClientWithSharedKey(url, cred, nil)
	if err != nil {
		fmt.Printf("failed to create container client %v\n", err)
		return
	}
	blockBlob := container.NewBlockBlobClient("randomfile.txt")
	content := []byte("Hello World!")
	md5hash := md5.New()
	md5hash.Write(content)
	b64md5hash := base64.StdEncoding.EncodeToString(md5hash.Sum(nil))

	blockBlob.UploadBufferToBlockBlob(context.Background(), content, azblob.HighLevelUploadToBlockBlobOption{
		HTTPHeaders: &azblob.BlobHTTPHeaders{
			BlobContentMD5: []byte(b64md5hash),
		},
	})
	props, _ := blockBlob.GetProperties(context.Background(), nil)

	receivedB64md5hash := base64.StdEncoding.EncodeToString(props.ContentMD5)
	if b64md5hash != receivedB64md5hash {
		panic("MD5 hash mismatch")
	}
	fmt.Println("MD5 hash from GetProperties response verified")

	pager := container.ListBlobsFlat(nil)

	blobs := []*azblob.BlobItemInternal{}
	for pager.NextPage(context.Background()) {
		response := pager.PageResponse()
		blobs = append(blobs, response.ContainerListBlobFlatSegmentResult.Segment.BlobItems...)
	}

	for _, blob := range blobs {
		if *blob.Name == "randomfile.txt" {
			receivedListb64md5Hash := base64.StdEncoding.EncodeToString(blob.Properties.ContentMD5)

			// blob.Properties.ContentMD5 is nil
			// this seems to be an issue marshalling the response from the server, but in the List request only
			if b64md5hash != receivedListb64md5Hash {
				fmt.Println("MD5 hash from ListBlobsFlat response does not match")
				panic("MD5 hash mismatch")
			}
			break
		}
	}
}

@berndverst
Contributor

@RickWinter @jhendrixMSFT FYI I am currently rewriting the Dapr Blob Storage Output Binding component to use the Track 2 SDK and I encountered this issue in my tests.

Until the List response contains all the properties I will not be able to certify this component as Stable as far as Dapr is concerned.

@berndverst
Contributor

berndverst commented Dec 18, 2021

Thanks for the tip @jkowalski. I am not sure why the original code doesn't decode the XML correctly, but the following seems to work for me with regard to the ContentMD5 property.

The following code depends on my code snippet from above.

	// Requires "encoding/xml" in the import block.
	var enumerationResults struct {
		Blobs struct {
			Blob []struct {
				Name       string `xml:"Name"`
				Properties struct {
					// encoding/xml stores the element text as-is in a []byte,
					// so this holds the base64 string, not the decoded digest.
					ContentMD5 []byte `xml:"Content-MD5"`
				}
			}
		}
	}

	for pager.NextPage(context.Background()) {
		response := pager.PageResponse()

		if err := xml.NewDecoder(response.RawResponse.Body).Decode(&enumerationResults); err != nil {
			panic(err)
		}

		for _, blob := range enumerationResults.Blobs.Blob {
			if blob.Name == "randomfile.txt" {
				receivedListb64md5Hash := blob.Properties.ContentMD5
				if b64md5hash != string(receivedListb64md5Hash) {
					fmt.Println("MD5 hash from ListBlobsFlat response does not match")
					panic("MD5 hash mismatch")
				}
				fmt.Println("MD5 hash from ListBlobsFlat response verified")
				break
			}
		}
	}

@deiter

deiter commented Jan 14, 2022

Hello,

I have the same issue. Could you please review my fix:

--- azure-sdk-for-go/sdk/storage/azblob/zz_generated_models.go 2022-01-14 22:44:54.382333817 +0000
+++ azure-sdk-for-go/sdk/storage/azblob/zz_generated_models.go 2022-01-14 22:44:29.330518663 +0000
@@ -512,6 +512,9 @@
        b.DeletedTime = (*time.Time)(aux.DeletedTime)
        b.ExpiresOn = (*time.Time)(aux.ExpiresOn)
        b.LastModified = (*time.Time)(aux.LastModified)
+       if aux.ContentMD5 != nil {
+               b.ContentMD5 = *aux.ContentMD5
+       }
        return nil
 }

With this change, the blob properties are:

Properties &{Etag:0xc000365250 LastModified:2022-01-14 22:23:04 +0000 GMT AccessTier:0xc0003652c0 AccessTierChangeTime:<nil> AccessTierInferred:0xc00021bacd ArchiveStatus:<nil> BlobSequenceNumber:<nil> BlobType:0xc0003652b0 CacheControl:0xc000365290 ContentDisposition:0xc0003652a0 ContentEncoding:0xc000365270 ContentLanguage:0xc000365280 ContentLength:0xc00021b9a8 ContentMD5:[106 88 100 47 79 70 48 57 47 115 105 66 88 83 68 51 83 87 65 109 51 65 61 61] ContentType:0xc000365260 CopyCompletionTime:<nil> CopyID:<nil> CopyProgress:<nil> CopySource:<nil> CopyStatus:<nil> CopyStatusDescription:<nil> CreationTime:2022-01-14 21:24:17 +0000 GMT CustomerProvidedKeySHA256:<nil> DeletedTime:<nil> DestinationSnapshot:<nil> EncryptionScope:<nil> ExpiresOn:<nil> IncrementalCopy:<nil> IsSealed:<nil> LeaseDuration:<nil> LeaseState:0xc0003652e0 LeaseStatus:0xc0003652d0 RehydratePriority:<nil> RemainingRetentionDays:<nil> ServerEncrypted:0xc00021bb59 TagCount:<nil>}

Thank you!

@deiter

deiter commented Jan 15, 2022

And of course we can decode base64 to []byte:

        if aux.ContentMD5 != nil {
                contentMD5, err := base64.StdEncoding.DecodeString(*aux.ContentMD5)
                if err != nil {
                        return err
                }
                b.ContentMD5 = contentMD5
        }
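
The same aux-struct pattern can be tried outside the generated code. This is a minimal standalone sketch, not the SDK's unmarshaller: `blobProperties`, `parseProperties`, `matchesContent` and `sampleProps` are hypothetical names. The sample digest is the Content-MD5 value of the zero-length blob in the listing above, which is the MD5 of empty content.

```go
package main

import (
	"bytes"
	"crypto/md5"
	"encoding/base64"
	"encoding/xml"
	"fmt"
)

// blobProperties keeps only the field discussed here.
// Hypothetical type for illustration; it is not the SDK's model.
type blobProperties struct {
	ContentMD5 []byte // raw 16-byte digest after base64 decoding
}

// UnmarshalXML reads <Content-MD5> as text first, then base64-decodes it.
func (p *blobProperties) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error {
	var aux struct {
		ContentMD5 *string `xml:"Content-MD5"`
	}
	if err := d.DecodeElement(&aux, &start); err != nil {
		return err
	}
	// An absent or empty element leaves ContentMD5 nil.
	if aux.ContentMD5 != nil && *aux.ContentMD5 != "" {
		raw, err := base64.StdEncoding.DecodeString(*aux.ContentMD5)
		if err != nil {
			return err
		}
		p.ContentMD5 = raw
	}
	return nil
}

// parseProperties decodes a <Properties> fragment.
func parseProperties(doc string) (*blobProperties, error) {
	var p blobProperties
	if err := xml.Unmarshal([]byte(doc), &p); err != nil {
		return nil, err
	}
	return &p, nil
}

// matchesContent reports whether digest is the MD5 of content.
func matchesContent(digest, content []byte) bool {
	sum := md5.Sum(content)
	return bytes.Equal(digest, sum[:])
}

const sampleProps = `<Properties>
  <Content-Length>0</Content-Length>
  <Content-MD5>1B2M2Y8AsgTpgAmY7PhCfg==</Content-MD5>
</Properties>`

func main() {
	p, err := parseProperties(sampleProps)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(p.ContentMD5), matchesContent(p.ContentMD5, nil))
}
```

Decoding to raw bytes means callers can compare the field directly against `md5.Sum` output instead of against a base64 string.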

@evenh

evenh commented Jan 27, 2022

I'm encountering this issue as well. Had to apply @deiter's fix to a fork of mine in order to get Content-MD5 in properties response. How can we in the community best help in order to get this bug fixed?

evenh added a commit to evenh/azure-sdk-for-go that referenced this issue Jan 28, 2022
@jhendrixMSFT
Member

Fixed in sdk/storage/azblob/v0.3.0

@evenh

evenh commented Feb 10, 2022

To me, 0.3.0 doesn't seem any different from 0.2.0 in this regard.

@mohsha-msft
Copy link
Contributor

Are you able to reproduce this issue again @evenh ?

Here's a test which I added to make sure this issue isn't happening again.

func (s *azblobTestSuite) TestListBlobIncludeMetadata() {

@evenh

evenh commented Feb 10, 2022

I see that I wasn't specific enough about my use case.

We use AzCopy to upload a bunch of files (which puts MD5 in the metadata) and we'd like to access that property via this SDK. My testing suggests that the Content-MD5 property is always []. I've fixed it temporarily using @deiter's fix in evenh@5d716a1 for my own purposes.

I couldn't figure out how to run the tests locally, so I haven't tried them.

@jhendrixMSFT
Member

This is a codegen issue, I've opened Azure/autorest.go#774 to track it.

For now, @mohsha-msft, can you fix this up manually? I don't know in how many locations we improperly unmarshal base-64 strings.

@evenh

evenh commented Feb 10, 2022

Thanks 👏🏻

@mohsha-msft
Copy link
Contributor

Thanks Joel. I'll keep track of this. I'll fix it manually for now. Do we have any transform to fix this?

@mohsha-msft mohsha-msft reopened this Feb 10, 2022
@jhendrixMSFT
Member

I don't think it can be fixed via a transform.

@mohsha-msft
Contributor

Hey @jkowalski , @berndverst , @evenh , and @deiter,

azblob v0.4.0 is now publicly available. I have fixed the issue here and here.

Please reach out and reopen the issue if it still persists.

Thanks a lot for your feedback!

@RickWinter RickWinter removed the needs-team-attention This issue needs attention from Azure service team or SDK team label Apr 20, 2022
@github-actions github-actions bot locked and limited conversation to collaborators Apr 11, 2023