
awslogs log driver: occasionally attempts to upload batch that is too large #37747

Closed
burdakovd opened this issue Sep 1, 2018 · 6 comments · Fixed by #37986
Labels
area/logging kind/bug

Comments


burdakovd commented Sep 1, 2018

Description

With the awslogs log driver, Docker occasionally drops logs due to errors like "InvalidParameterException: Upload too large: 1188330 bytes exceeds limit of 1048576".

It seems the driver fails to estimate the request size correctly and does not split the request to AWS into smaller batches.

Steps to reproduce the issue:

  1. Configure Docker to send logs to awslogs (see config below)
  2. Run the following command to produce demo logs: docker run ubuntu bash -c 'for x in 1 2 3 4 5 6 7 8; do cat /dev/urandom | tr -d "\n" | head -c 131050; echo; done' | gzip | wc -c

Describe the results you received:
The logs did not reach CloudWatch; instead, /var/log/messages contains InvalidParameterException: Upload too large: 1188330 bytes exceeds limit of 1048576.

Describe the results you expected:
Logs should reach CloudWatch Logs without errors.

Additional information you deem important (e.g. issue happens only occasionally):

Furthermore, it appears Docker somehow doubles the payload size. Initially I planned to use the following as the repro: docker run ubuntu bash -c 'for x in 1 2 3 4; do cat /dev/urandom | tr -d "\n" | head -c 262140; echo; done' | gzip | wc -c, but it failed with "InvalidParameterException: Log event too large: 476138 bytes exceeds limit of 262144", indicating that Docker is sending a log event roughly twice the size of the original log line.

The repro steps use random data, of course, but I see this issue once or twice a day with logs from the ipfs daemon (running in debug mode, so it emits a significant volume of logs).

Apart from the seemingly incorrect strategy for splitting data sent to AWS, it would also be great if Docker dumped the failed blobs somewhere. That would help diagnose this particular issue, and it would also be better from an audit perspective: currently, if someone gets into a container monitored via awslogs, they can flood it to cause log loss and thereby hide their activity.

Config that I'm using to send data to awslogs:

cat /etc/docker/daemon.json
{
  "log-driver": "awslogs",
  "log-opts": {
    "awslogs-region": "us-east-1",
    "awslogs-group": "docker.i-019b04c8046828f2d",
    "awslogs-create-group": "true",
    "tag": "{{.Name}}-{{.ID}}"
  }
}

This is similar to #35725, but I decided to open a new issue instead of reopening that one, as it is quite old.

Output of docker version:

Client:
 Version:      18.03.1-ce
 API version:  1.37
 Go version:   go1.9.6
 Git commit:   3dfb8343b139d6342acfd9975d7f1068b5b1c3d3
 Built:        Wed Jul 25 00:48:56 2018
 OS/Arch:      linux/amd64
 Experimental: false
 Orchestrator: swarm

Server:
 Engine:
  Version:      18.03.1-ce
  API version:  1.37 (minimum version 1.12)
  Go version:   go1.9.6
  Git commit:   7390fc6/18.03.1-ce
  Built:        Wed Jul 25 00:51:07 2018
  OS/Arch:      linux/amd64
  Experimental: false

Output of docker info:

Containers: 26
 Running: 4
 Paused: 0
 Stopped: 22
Images: 68
Server Version: 18.03.1-ce
Storage Driver: overlay2
 Backing Filesystem: xfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: awslogs
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 773c489c9c1b21a6d78b5c538cd395416ec50f88
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 4.14.62-70.117.amzn2.x86_64
Operating System: Amazon Linux 2
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 3.853GiB
Name: ip-172-31-9-51.ec2.internal
ID: XCDZ:TO2M:DBAZ:BNT2:KKTY:3IMX:IYHT:ZOLU:FCP6:QZRK:3UVJ:7CHK
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

Additional environment details (AWS, VirtualBox, physical, etc.):
AWS, Amazon Linux 2 AMI

cpuguy83 (Member) commented Sep 4, 2018

ping @jahkeup @samuelkarp

thaJeztah (Member) commented:

/cc @anusha-ragunathan

thaJeztah added the kind/bug label Sep 10, 2018
samuelkarp (Member) commented:

Apologies for the delay; this came in while I was on vacation and then I missed following up when I got back. I'll take a look here, but I don't expect to be able to send a PR until sometime next week at the earliest.

samuelkarp (Member) commented:

Hey @burdakovd, just wanted to give you an update: I've been looking at this and have reproduced this with a bit more fidelity, outside of Docker. I believe I understand where the inflated number of bytes is coming from, but I'm still attempting to narrow down where exactly the bug is.

samuelkarp (Member) commented:

@burdakovd See #37986 for a fix.

The problem here is that I misread the CloudWatch Logs documentation when writing the initial version of the driver, and did not account for UTF-8 encoding changing the length of the bytes that I was sending. The pull request above should fix the counting and stop the driver itself from splitting valid UTF-8 byte sequences into invalid byte sequences, though it appears that there may be another bug in Docker causing splits prior to the driver being invoked.
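To make that size mismatch concrete, here is a minimal Go sketch of the counting idea, assuming the normalization behavior described above (invalid UTF-8 bytes become the 3-byte replacement character U+FFFD). effectiveLen is a hypothetical helper name for illustration, not the actual code from #37986.

package main

import (
	"fmt"
	"unicode/utf8"
)

// effectiveLen counts a message the way the service does after normalization:
// bytes that are not part of a valid UTF-8 sequence become U+FFFD, which
// occupies 3 bytes in UTF-8, so the effective size can exceed the raw size.
func effectiveLen(msg string) int {
	n := 0
	for _, r := range msg {
		// Ranging over a string decodes invalid bytes as utf8.RuneError, and
		// utf8.RuneLen(utf8.RuneError) == 3, matching the replacement character.
		n += utf8.RuneLen(r)
	}
	return n
}

func main() {
	raw := string([]byte{'A', 0xFF, 0xFE})  // one ASCII byte plus two invalid bytes
	fmt.Println(len(raw), effectiveLen(raw)) // prints "3 7": 3 raw bytes become 1+3+3 effective bytes
}

A driver that counts only len(raw) can build a batch that the API rejects, which matches the "Upload too large" errors reported above.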

samuelkarp added a commit to samuelkarp/docker that referenced this issue Oct 10, 2018
The CloudWatch Logs API defines its limits in terms of bytes, but its
inputs in terms of UTF-8 encoded strings.  Byte-sequences which are not
valid UTF-8 encodings are normalized to the Unicode replacement
character U+FFFD, which is a 3-byte sequence in UTF-8.  This replacement
can cause the input to grow, exceeding the API limit and causing failed
API calls.

This commit adds logic for counting the effective byte length after
normalization and splitting input without splitting valid UTF-8
byte-sequences into two invalid byte-sequences.

Fixes moby#37747

Signed-off-by: Samuel Karp <skarp@amazon.com>
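A minimal sketch of the splitting half of that description, under the same normalization assumptions; splitAtBoundary is a hypothetical name for illustration and is not the actual driver code from #37986.

package main

import (
	"fmt"
	"unicode/utf8"
)

// splitAtBoundary returns the largest prefix of msg whose effective
// (post-normalization) length fits within limit, never cutting a valid
// UTF-8 sequence in two.
func splitAtBoundary(msg string, limit int) (head, rest string) {
	n := 0
	for i, r := range msg {
		// Invalid bytes decode to utf8.RuneError and cost 3 bytes, like U+FFFD.
		size := utf8.RuneLen(r)
		if n+size > limit {
			// i always falls on a rune (or single invalid byte) boundary, so
			// neither half ends up with a torn multi-byte sequence.
			return msg[:i], msg[i:]
		}
		n += size
	}
	return msg, ""
}

func main() {
	head, rest := splitAtBoundary("héllo", 3) // 'é' is 2 bytes in UTF-8
	fmt.Printf("%q %q\n", head, rest)         // "hé" "llo": the cut lands after 'é', not inside it
}

Splitting on decoded-rune boundaries like this keeps both halves valid (or leaves already-invalid bytes intact), so the split itself never manufactures new replacement characters.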
samuelkarp (Member) commented:

@burdakovd Now that the fix has been merged, we're planning to get it backported into the Amazon Linux / Amazon Linux 2 Docker package.

docker-jenkins pushed a commit to docker/docker-ce that referenced this issue Oct 18, 2018
thaJeztah pushed a commit to thaJeztah/docker that referenced this issue Nov 8, 2018
docker-jenkins pushed a commit to docker/docker-ce that referenced this issue Nov 27, 2018