Amazon S3: Mismatch when reading HTTP header from GCS #8790

Open
gouyelliot opened this issue May 3, 2024 · 0 comments · May be fixed by #8791

Bug Report

Describe the bug
While configuring our Fluent Bit instance, I ran into a situation where the Amazon S3 output plugin blocks for several minutes when trying to upload files.

I'm using Google Cloud Storage with an HMAC key (set via environment variables), with the endpoint option set to https://storage.googleapis.com.

Here is the log when the plugin is trying to send the data:

fluent-bit[1343]: [2024/05/03 03:23:21] [debug] [upstream] KA connection #28 to storage.googleapis.com:443 is connected
fluent-bit[1343]: [2024/05/03 03:23:21] [debug] [http_client] not using http_proxy for header
fluent-bit[1343]: [2024/05/03 03:23:21] [debug] [aws_credentials] Requesting credentials from the env provider..
---
Here FluentBit blocks for 4 minutes...
---
fluent-bit[1343]: [2024/05/03 03:27:21] [error] [http_client] broken connection to storage.googleapis.com:443 ?
fluent-bit[1343]: [2024/05/03 03:27:21] [debug] [upstream] KA connection #28 to storage.googleapis.com:443 is now available
fluent-bit[1343]: [2024/05/03 03:27:21] [debug] [output:s3:s3-bids] PutObject http status=200
fluent-bit[1343]: [2024/05/03 03:27:21] [ info] [output:s3:s3-bids] Successfully uploaded object /source=sspengine/type=improvedigital_bids/year=2024/month=05/day=03/fluentd-aggregator-1-ams-testing-03-N3Fki4QI.json.gz

After digging into the source code, I found that the problem comes from the header_lookup function, which gets the value of a header from the HTTP response.

It turns out that Google sends a custom HTTP header named x-goog-stored-content-length, which header_lookup matches instead of the real Content-Length header. In the example response below, that would give a length of 1973 instead of 0, so the client tries to read from the socket again and only gives up when the read times out after 4 minutes.

Here is an example of an HTTP response from GCS:

HTTP/1.1 200 OK
ETag: "f75bc68bd2645e669b5208da00ea3e02"
x-goog-generation: 1714721454001939
x-goog-metageneration: 1
x-goog-hash: crc32c=HBNrRA==
x-goog-hash: md5=91vGi9JkXmabUgjaAOo+Ag==
x-amz-checksum-crc32c: HBNrRA==
x-goog-stored-content-length: 1973
x-goog-stored-content-encoding: gzip
Vary: Origin
X-GUploader-UploadID: ABPtcPrl0G_stANkY8LXdqEaWL9nGpZjkCYHNFAyBZYlpvHDqJ0gfRAEkKsEM79BWkfhnoMC56g
Content-Length: 0
Date: Fri, 03 May 2024 07:30:54 GMT
Server: UploadServer
Content-Type: text/html; charset=UTF-8
Alt-Svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000
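
To illustrate the mismatch with the response above, here is a minimal sketch in C. It is not Fluent Bit's actual code: it assumes the lookup is a case-insensitive substring search over the raw response buffer, and the names find_ci, naive_lookup, anchored_lookup and header_as_long are hypothetical. The anchored variant shows one possible fix, which only accepts a match at the start of a header line.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h> /* strncasecmp (POSIX) */

/* Case-insensitive substring search (hypothetical helper). */
static const char *find_ci(const char *haystack, const char *needle)
{
    size_t n = strlen(needle);

    for (; *haystack != '\0'; haystack++) {
        if (strncasecmp(haystack, needle, n) == 0) {
            return haystack;
        }
    }
    return NULL;
}

/* Assumed buggy behavior: take the first substring match anywhere in
 * the response, even in the middle of another header's name. */
const char *naive_lookup(const char *resp, const char *name)
{
    const char *p = find_ci(resp, name);

    if (p == NULL) {
        return NULL;
    }
    p += strlen(name);
    while (*p == ' ') {
        p++;
    }
    return p; /* points at the header value */
}

/* Possible fix: accept a match only at the start of a line. */
const char *anchored_lookup(const char *resp, const char *name)
{
    size_t n = strlen(name);
    const char *p = resp;

    while ((p = find_ci(p, name)) != NULL) {
        if (p == resp || p[-1] == '\n') {
            p += n;
            while (*p == ' ') {
                p++;
            }
            return p;
        }
        p++; /* match was inside another header name; keep searching */
    }
    return NULL;
}

/* Parse a header value as a number; -1 if the header was not found. */
long header_as_long(const char *value)
{
    return value != NULL ? strtol(value, NULL, 10) : -1;
}
```

With the GCS response above and the name "Content-Length:", naive_lookup lands on the value of x-goog-stored-content-length (1973), while anchored_lookup skips that match and returns the value of the real Content-Length header (0).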

To Reproduce
Here is my current config:

[SERVICE]
    flush        5
    grace        30
    daemon       Off
    log_level    debug
    parsers_file /path/to/parsers.conf

[INPUT]
    Name   tail
    Parser test
    Path   /path/to/test.log

[OUTPUT]
    Name            s3
    Alias           s3-bids
    Match           *
    bucket          my-bucket
    compression     gzip
    upload_timeout  1m
    store_dir       /tmp/fluentbit/log
    use_put_object  On
    retry_limit     3
    total_file_size 250M
    content_type    application/json
    region          auto
    storage_class   STANDARD
    endpoint        https://storage.googleapis.com
    s3_key_format   /source=test/year=%Y/month=%m/day=%d/test-log-%H-$UUID.json.gz

Expected behavior
The HTTP client should not use the x-goog-stored-content-length header as the content length of the response.

I'll try to create a PR next week; the bug is actually not hard to fix!

Your Environment

  • Version used: 3.0.3
  • Configuration: See above
  • Environment name and version (e.g. Kubernetes? What version?):
  • Server type and version: x64 Intel CPU
  • Operating System and version: AlmaLinux 9
  • Filters and plugins: No filters; Amazon S3 output plugin
@gouyelliot gouyelliot changed the title Amazon S3: Mismatch when reading HTTP header when using GCS Amazon S3: Mismatch when reading HTTP header using GCS May 3, 2024
@gouyelliot gouyelliot changed the title Amazon S3: Mismatch when reading HTTP header using GCS Amazon S3: Mismatch when reading HTTP header from GCS May 3, 2024
@gouyelliot gouyelliot linked a pull request May 3, 2024 that will close this issue