Support XML API (#331) #1164

Open
wants to merge 4 commits into
base: main

Conversation

tustvold

@tustvold tustvold commented May 12, 2023

Closes #331

It has been years since I last wrote any Go, so please let me know if I've made any silly mistakes. Most of the necessary functionality already existed; the only required change was adding support for the XML PUT API.

@@ -596,23 +592,6 @@ func TestServerClientSignedUploadBucketCNAME(t *testing.T) {
	if resp.StatusCode != http.StatusOK {
		t.Errorf("wrong status returned\nwant %d\ngot %d", http.StatusOK, resp.StatusCode)
	}
	data, err := io.ReadAll(resp.Body)
Author

This was incorrect: the XML APIs should not return a body, and certainly shouldn't return JSON.
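A minimal sketch of the behaviour described in this comment, assuming an XML-style object PUT replies with a status and headers only. `xmlPutHandler` and `putAndRead` are hypothetical names used for illustration; they are not functions from fake-gcs-server.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// xmlPutHandler is a hypothetical stand-in for an XML API object PUT.
// It drains the uploaded bytes and replies 200 without writing a body.
func xmlPutHandler(w http.ResponseWriter, r *http.Request) {
	if _, err := io.Copy(io.Discard, r.Body); err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	w.WriteHeader(http.StatusOK)
}

// putAndRead sends an in-process PUT to xmlPutHandler and returns the
// response status code and the response body as a string.
func putAndRead(payload string) (int, string) {
	req := httptest.NewRequest(http.MethodPut, "/test-bucket/test.txt", strings.NewReader(payload))
	rec := httptest.NewRecorder()
	xmlPutHandler(rec, req)

	resp := rec.Result()
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	return resp.StatusCode, string(body)
}

func main() {
	status, body := putAndRead("Hello, world!")
	fmt.Printf("status=%d bodyLen=%d\n", status, len(body))
}
```

A test written against this shape would assert on the status code and an empty body rather than decoding JSON from the response.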

Owner

🤔 Oh are signed uploads powered by the XML API? Interesting!

@@ -296,12 +299,6 @@ func (s *Server) buildMuxer() {
	handler.Host(s.publicHost).Path("/{bucketName}").MatcherFunc(matchFormData).Methods(http.MethodPost, http.MethodPut).HandlerFunc(xmlToHTTPHandler(s.insertFormObject))
	handler.Host(bucketHost).MatcherFunc(matchFormData).Methods(http.MethodPost, http.MethodPut).HandlerFunc(xmlToHTTPHandler(s.insertFormObject))

	// Signed URLs (upload and download)
	handler.MatcherFunc(s.publicHostMatcher).Path("/{bucketName}/{objectName:.+}").Methods(http.MethodPost, http.MethodPut).HandlerFunc(jsonToHTTPHandler(s.insertObject))
Author

This can't ever have worked, as insertObject expects query parameters encoding the object name, etc.

Owner

insertObject does not necessarily read the object name from query parameters:

default:
	// Support Signed URL Uploads
	if r.URL.Query().Get("X-Goog-Algorithm") != "" {
		switch r.Method {
		case http.MethodPost:
			return s.resumableUpload(bucketName, r)
		case http.MethodPut:
			return s.signedUpload(bucketName, r)
		}
	}
	return jsonResponse{errorMessage: "invalid uploadType", status: http.StatusBadRequest}

I need to take a deeper look at this, but I'm curious whether you're basing this comment on the docs. There are many undocumented things in the GCS API that we run into when someone tries to integrate with a different SDK, and I've ironically done a poor job of documenting those in fake-gcs-server 🙈

Author

@tustvold tustvold May 12, 2023

Indeed, insertObject has a peculiar special case that uses X-Goog-Algorithm to detect a signed URL and then falls back to implementing something that looks like the XML API, because that is what signed URLs are. With this PR that can probably be removed 🤔
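The special case being discussed can be isolated as a small, self-contained sketch. It mirrors the `default` branch quoted earlier in this thread: a request counts as a signed URL upload only when the `X-Goog-Algorithm` query parameter is present, and the HTTP method then picks the path. `classifySignedUpload` and the `uploadKind` values are hypothetical names for illustration, not part of fake-gcs-server.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// uploadKind is a hypothetical label for the code path a request would take.
type uploadKind string

const (
	kindResumable uploadKind = "resumable"
	kindSigned    uploadKind = "signed"
	kindInvalid   uploadKind = "invalid"
)

// classifySignedUpload mirrors the quoted default branch: without an
// X-Goog-Algorithm query parameter the request is rejected; with it,
// POST maps to a resumable upload and PUT to a signed upload.
func classifySignedUpload(method, rawQuery string) uploadKind {
	query, err := url.ParseQuery(rawQuery)
	if err != nil || query.Get("X-Goog-Algorithm") == "" {
		return kindInvalid
	}
	switch method {
	case http.MethodPost:
		return kindResumable
	case http.MethodPut:
		return kindSigned
	}
	return kindInvalid
}

func main() {
	fmt.Println(classifySignedUpload(http.MethodPut, "X-Goog-Algorithm=GOOG4-RSA-SHA256"))
	fmt.Println(classifySignedUpload(http.MethodPost, "X-Goog-Algorithm=GOOG4-RSA-SHA256"))
	fmt.Println(classifySignedUpload(http.MethodPut, "uploadType=unknown"))
}
```

Keeping the detection in one small function like this makes it easier to see what would move out of insertObject if signed URLs were routed to dedicated XML handlers instead.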

Owner

Yeah, absolutely. insertObject definitely looks like a little monster doing too many things. Thank you very much! I'll go over the PR in detail later today.

@raz-amir
Contributor

Hi, bumping this one - any plans to fix and merge it soon? Thanks

@fsouza
Owner

fsouza commented Jun 14, 2023

@ramir-savvy someone needs to take a look at the test failures and fix them. I can do it, but not in the near future. Maybe in a couple of weeks.

@raz-amir
Contributor

Thank you @fsouza. I have opened #1215 which better suits my needs.

@tustvold
Author

Just wondering what the status of this is. The lack of support for the XML APIs is causing some upstream confusion: apache/arrow-rs#5263

@Slach

Slach commented Feb 28, 2024

=( Ohh, this could be great functionality that would allow us to test GCS over S3 in Altinity/clickhouse-backup#695

Slach added a commit to Altinity/clickhouse-backup that referenced this pull request Feb 28, 2024
@nickpresta

@tustvold Is this something still on your radar?

@tustvold
Author

We've been using this branch to run integration tests for the last year. I'd be happy to see this incorporated but that's out of my hands 😅

@fsouza
Owner

fsouza commented Apr 26, 2024

Yeah I think the code overall looks good, but I'd need conflicts resolved and CI to be green before merging. I can give it a shot myself, but I don't expect to have any time to work on this in the next 2-3 weeks.

@Slach

Slach commented Apr 26, 2024

@fsouza if I resolve the conflicts and create a new PR, will it be merged?

@adriangb

It would be great to get this functionality included!

@adriangb

adriangb commented Apr 27, 2024

I tried using this with a mix of Rust's object_store (via deltalake) and google-cloud-storage in Python and can't get them to both work:

import os

os.environ['STORAGE_EMULATOR_HOST'] = 'http://localhost:4443'
os.environ['GOOGLE_SERVICE_ACCOUNT_KEY'] = '{"gcs_base_url": "http://localhost:4443", "disable_oauth": true, "client_email": "", "private_key": "", "private_key_id": ""}'

import pyarrow
from deltalake import DeltaTable
from google.cloud import storage

client = storage.Client()

bucket = client.bucket('test-bucket')  # type: ignore

print(list(bucket.list_blobs()))

bucket.blob('test.txt').upload_from_string(b'Hello, world!')

print(list(bucket.list_blobs()))

blob = bucket.blob('test.txt')

print(blob.download_as_string())

DeltaTable.create('gs://test-bucket', schema=pyarrow.schema([('id', pyarrow.int64())]), mode='overwrite')

Running docker run -it -p 4443:4443 -v ./gcs_data:/data tustvold/fake-gcs-server -scheme http -public-host localhost:4443 -log-level debug -filesystem-root=/data makes the deltalake / object_store part work but the google sdk part fails at blob.download_as_string() with 'Request failed with status code', 404, 'Expected one of', <HTTPStatus.OK: 200>, <HTTPStatus.PARTIAL_CONTENT: 206> and I see "GET /download/storage/v1/b/test-bucket/o/test.txt?alt=media HTTP/1.1" 404 10 in the logs.

Switching to running docker run -it -p 4443:4443 -v ./gcs_data:/data tustvold/fake-gcs-server -scheme http -public-host http://localhost:4443 -log-level debug -filesystem-root=/data (so adding http:// to -public-host) makes the google sdk part work but the object_store part fail with OSError: Generic GCS error: Error performing list request: Client error with status 404 Not Found: Not Found and I see "GET /test%2Dbucket?list-type=2&prefix=_delta_log%2F HTTP/1.1" 404 10 in the logs.

I'm guessing this has to do with virtual bucket name parsing?
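One way to reason about that guess, as a sketch only: an HTTP `Host` header carries just `host[:port]`, so if a server compared it to the configured `-public-host` value by plain equality, a value that includes `http://` could never match. Whether fake-gcs-server actually compares hosts this way is an assumption here; `hostMatches` and `stripScheme` are hypothetical helpers, not functions from the project.

```go
package main

import (
	"fmt"
	"strings"
)

// hostMatches is a hypothetical matcher that compares a configured public
// host with a request's Host header using case-insensitive equality.
func hostMatches(configured, hostHeader string) bool {
	return strings.EqualFold(configured, hostHeader)
}

// stripScheme removes a leading "http://" or "https://" and a trailing "/"
// so that a URL-like setting becomes comparable with a Host header.
func stripScheme(host string) string {
	host = strings.TrimPrefix(host, "http://")
	host = strings.TrimPrefix(host, "https://")
	return strings.TrimSuffix(host, "/")
}

func main() {
	hostHeader := "localhost:4443" // what a client sends for http://localhost:4443/...

	for _, configured := range []string{"localhost:4443", "http://localhost:4443"} {
		fmt.Printf("configured=%q raw match=%v normalized match=%v\n",
			configured,
			hostMatches(configured, hostHeader),
			hostMatches(stripScheme(configured), hostHeader))
	}
}
```

Under this assumption, the two `-public-host` values above would send the same request down different routes, which is consistent with one client working per setting but does not confirm the cause.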


Successfully merging this pull request may close these issues.

Consider XML support for object PUT/GET
6 participants