
awslogs log driver: occasionally attempts to upload batch that is too large #37747

Closed
burdakovd opened this issue Sep 1, 2018 · 6 comments · Fixed by #37986
Labels
area/logging kind/bug

Comments


burdakovd commented Sep 1, 2018

Description

With the awslogs log driver, Docker occasionally drops logs due to errors like "InvalidParameterException: Upload too large: 1188330 bytes exceeds limit of 1048576".

It seems the driver fails to estimate the request size correctly and does not split the request to AWS into smaller batches.

Steps to reproduce the issue:

  1. Configure Docker to send logs to awslogs (see config below)
  2. Run the following command to produce demo logs: docker run ubuntu bash -c 'for x in 1 2 3 4 5 6 7 8; do cat /dev/urandom | tr -d "\n" | head -c 131050; echo; done' | gzip | wc -c

Describe the results you received:
The logs did not reach CloudWatch; instead, /var/log/messages contains InvalidParameterException: Upload too large: 1188330 bytes exceeds limit of 1048576.

Describe the results you expected:
Logs should reach CloudWatch Logs without errors.

Additional information you deem important (e.g. issue happens only occasionally):

Furthermore, it appears Docker somehow doubles the payload size. Initially I planned to use the following as the repro: docker run ubuntu bash -c 'for x in 1 2 3 4; do cat /dev/urandom | tr -d "\n" | head -c 262140; echo; done' | gzip | wc -c, but it failed with "InvalidParameterException: Log event too large: 476138 bytes exceeds limit of 262144", indicating that Docker is sending a log event roughly twice the size of the original log line.

The repro steps use random data, of course, but I see this issue once or twice a day with logs from the ipfs daemon (running in debug mode, so it emits a significant volume of logs).

Apart from the seemingly incorrect strategy for splitting data sent to AWS, it would also be great if Docker dumped the failed blobs somewhere. That would help diagnose this particular issue, and it would also be better from an audit perspective: currently, if someone gets into a container monitored via awslogs, they can flood it to cause log loss and thereby hide their activity.

Config that I'm using to send data to awslogs:

cat /etc/docker/daemon.json
{
  "log-driver": "awslogs",
  "log-opts": {
    "awslogs-region": "us-east-1",
    "awslogs-group": "docker.i-019b04c8046828f2d",
    "awslogs-create-group": "true",
    "tag": "{{.Name}}-{{.ID}}"
  }
}

This is similar to #35725, but I decided to open a new issue instead of reopening that one, as it is quite old.

Output of docker version:

Client:
 Version:      18.03.1-ce
 API version:  1.37
 Go version:   go1.9.6
 Git commit:   3dfb8343b139d6342acfd9975d7f1068b5b1c3d3
 Built:        Wed Jul 25 00:48:56 2018
 OS/Arch:      linux/amd64
 Experimental: false
 Orchestrator: swarm

Server:
 Engine:
  Version:      18.03.1-ce
  API version:  1.37 (minimum version 1.12)
  Go version:   go1.9.6
  Git commit:   7390fc6/18.03.1-ce
  Built:        Wed Jul 25 00:51:07 2018
  OS/Arch:      linux/amd64
  Experimental: false

Output of docker info:

Containers: 26
 Running: 4
 Paused: 0
 Stopped: 22
Images: 68
Server Version: 18.03.1-ce
Storage Driver: overlay2
 Backing Filesystem: xfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: awslogs
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 773c489c9c1b21a6d78b5c538cd395416ec50f88
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 4.14.62-70.117.amzn2.x86_64
Operating System: Amazon Linux 2
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 3.853GiB
Name: ip-172-31-9-51.ec2.internal
ID: XCDZ:TO2M:DBAZ:BNT2:KKTY:3IMX:IYHT:ZOLU:FCP6:QZRK:3UVJ:7CHK
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

Additional environment details (AWS, VirtualBox, physical, etc.):
AWS, Amazon Linux 2 AMI

cpuguy83 (Member) commented Sep 4, 2018

ping @jahkeup @samuelkarp

thaJeztah (Member) commented:

/cc @anusha-ragunathan

thaJeztah added the kind/bug label Sep 10, 2018
samuelkarp (Member) commented:

Apologies for the delay; this came in while I was on vacation and then I missed following up when I got back. I'll take a look here, but I don't expect to be able to send a PR until sometime next week at the earliest.

samuelkarp (Member) commented:

Hey @burdakovd, just wanted to give you an update: I've been looking at this and have reproduced this with a bit more fidelity, outside of Docker. I believe I understand where the inflated number of bytes is coming from, but I'm still attempting to narrow down where exactly the bug is.

samuelkarp (Member) commented:

@burdakovd See #37986 for a fix.

The problem here is that I misread the CloudWatch Logs documentation when writing the initial version of the driver, and did not account for UTF-8 encoding changing the length of the bytes that I was sending. The pull request above should fix the counting and stop the driver itself from splitting valid UTF-8 byte sequences into invalid byte sequences, though it appears that there may be another bug in Docker causing splits prior to the driver being invoked.
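To make that size mismatch concrete, here is a minimal Go sketch of the counting idea, assuming the normalization behavior described above (invalid UTF-8 bytes become the 3-byte replacement character U+FFFD). effectiveLen is a hypothetical helper name for illustration, not the actual code from #37986.

package main

import (
	"fmt"
	"unicode/utf8"
)

// effectiveLen counts a message the way the service does after normalization:
// bytes that are not part of a valid UTF-8 sequence become U+FFFD, which
// occupies 3 bytes in UTF-8, so the effective size can exceed the raw size.
func effectiveLen(msg string) int {
	n := 0
	for _, r := range msg {
		// Ranging over a string decodes invalid bytes as utf8.RuneError, and
		// utf8.RuneLen(utf8.RuneError) == 3, matching the replacement character.
		n += utf8.RuneLen(r)
	}
	return n
}

func main() {
	raw := string([]byte{'A', 0xFF, 0xFE})  // one ASCII byte plus two invalid bytes
	fmt.Println(len(raw), effectiveLen(raw)) // prints "3 7": 3 raw bytes become 1+3+3 effective bytes
}

A driver that counts only len(raw) can build a batch that the API rejects, which matches the "Upload too large" errors reported above.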

samuelkarp added a commit to samuelkarp/docker that referenced this issue Oct 10, 2018
The CloudWatch Logs API defines its limits in terms of bytes, but its
inputs in terms of UTF-8 encoded strings.  Byte-sequences which are not
valid UTF-8 encodings are normalized to the Unicode replacement
character U+FFFD, which is a 3-byte sequence in UTF-8.  This replacement
can cause the input to grow, exceeding the API limit and causing failed
API calls.

This commit adds logic for counting the effective byte length after
normalization and splitting input without splitting valid UTF-8
byte-sequences into two invalid byte-sequences.

Fixes moby#37747

Signed-off-by: Samuel Karp <skarp@amazon.com>
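A minimal sketch of the splitting half of that description, under the same normalization assumptions; splitAtBoundary is a hypothetical name for illustration and is not the actual driver code from #37986.

package main

import (
	"fmt"
	"unicode/utf8"
)

// splitAtBoundary returns the largest prefix of msg whose effective
// (post-normalization) length fits within limit, never cutting a valid
// UTF-8 sequence in two.
func splitAtBoundary(msg string, limit int) (head, rest string) {
	n := 0
	for i, r := range msg {
		// Invalid bytes decode to utf8.RuneError and cost 3 bytes, like U+FFFD.
		size := utf8.RuneLen(r)
		if n+size > limit {
			// i always falls on a rune (or single invalid byte) boundary, so
			// neither half ends up with a torn multi-byte sequence.
			return msg[:i], msg[i:]
		}
		n += size
	}
	return msg, ""
}

func main() {
	head, rest := splitAtBoundary("héllo", 3) // 'é' is 2 bytes in UTF-8
	fmt.Printf("%q %q\n", head, rest)         // "hé" "llo": the cut lands after 'é', not inside it
}

Splitting on decoded-rune boundaries like this keeps both halves valid (or leaves already-invalid bytes intact), so the split itself never manufactures new replacement characters.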
samuelkarp (Member) commented:

@burdakovd Now that the fix has been merged, we're planning to get it backported into the Amazon Linux / Amazon Linux 2 Docker package.

docker-jenkins pushed a commit to docker/docker-ce that referenced this issue Oct 18, 2018
thaJeztah pushed a commit to thaJeztah/docker that referenced this issue Nov 8, 2018
docker-jenkins pushed a commit to docker/docker-ce that referenced this issue Nov 27, 2018