Put with unexpected EOF? and how to debug it? #739

Open
zhucan opened this issue Mar 25, 2024 · 15 comments


zhucan commented Mar 25, 2024

2024/03/22 02:22:20 PUT /cas/e815d929826ce2f092e3b611233b41d175500c54b7c3f9ca015515df26cba6aa: unexpected EOF

Collaborator

mostynb commented Mar 25, 2024

That log comes from somewhere in server/http.go. Only a few places in that file log without a specific error message. Are you using an HTTP proxy backend? I.e. I wonder if it's this line (connection lost while uploading to the backend?):
https://github.com/buchgr/bazel-remote/blob/master/server/http.go#L422

Author

zhucan commented Mar 26, 2024

kubectl get secret -n cicd secret-env -o yaml
apiVersion: v1
data:
  BAZEL_REMOTE_S3_ACCESS_KEY_ID: xxxx
  BAZEL_REMOTE_S3_AUTH_METHOD: YWNjZXNzX2tleQ==
  BAZEL_REMOTE_S3_BUCKET: bmV3LWJhemVsLXJlbW90ZS1jYWNoZQ==
  BAZEL_REMOTE_S3_DISABLE_SSL: dHJ1ZQ==
  BAZEL_REMOTE_S3_ENDPOINT: czMtY2xvdWQtc3NkMDEuZGVlcHJvdXRlLmNuOjgw
  BAZEL_REMOTE_S3_SECRET_ACCESS_KEY:  xxxx
kind: Secret
metadata:
  annotations:
    meta.helm.sh/release-name: bazel-cache-service
    meta.helm.sh/release-namespace: cicd
  creationTimestamp: "2024-03-06T03:35:43Z"
  labels:
    app.kubernetes.io/managed-by: Helm
  name: secret-env
  namespace: cicd
  resourceVersion: "38197933"
  uid: 51b974aa-001e-4d66-a735-331a56464b82
type: Opaque

The backend storage is S3, and we connect to it by domain name. @mostynb
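The values under `data:` in a Kubernetes Secret are base64-encoded, so the S3 settings above are not readable at a glance. A minimal Go sketch for decoding the non-redacted values follows; the helper name `decodeSecretValue` is made up for illustration and is not part of bazel-remote or kubectl.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// decodeSecretValue (hypothetical helper) decodes one base64 value taken
// from the data map of a Kubernetes Secret manifest.
func decodeSecretValue(encoded string) (string, error) {
	raw, err := base64.StdEncoding.DecodeString(encoded)
	if err != nil {
		return "", fmt.Errorf("decoding %q: %w", encoded, err)
	}
	return string(raw), nil
}

func main() {
	// Non-redacted values copied from the Secret shown in this comment.
	entries := []struct{ key, value string }{
		{"BAZEL_REMOTE_S3_AUTH_METHOD", "YWNjZXNzX2tleQ=="},
		{"BAZEL_REMOTE_S3_BUCKET", "bmV3LWJhemVsLXJlbW90ZS1jYWNoZQ=="},
		{"BAZEL_REMOTE_S3_DISABLE_SSL", "dHJ1ZQ=="},
		{"BAZEL_REMOTE_S3_ENDPOINT", "czMtY2xvdWQtc3NkMDEuZGVlcHJvdXRlLmNuOjgw"},
	}
	for _, e := range entries {
		decoded, err := decodeSecretValue(e.value)
		if err != nil {
			fmt.Println(e.key, "error:", err)
			continue
		}
		fmt.Printf("%s=%s\n", e.key, decoded)
	}
}
```

For example, the auth method decodes to `access_key` and the disable-SSL flag decodes to `true`.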

Contributor

ulrfa commented Mar 26, 2024

I believe (but I'm not 100% sure, and it could be related to my internal patches) that "PUT ... unexpected EOF" is logged by bazel-remote when bazel aborts an upload (e.g. by closing the TCP connection) before the upload has completed.

Edited: In other words, AFAIK, that log is nothing to worry about and bazel-remote is working as expected.

I don't know if it is also related to the http proxy backend, but I experience it without using the http proxy backend.
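To illustrate the aborted-upload theory, here is a small Go sketch that is independent of bazel-remote's code. It parses a raw PUT request whose body is shorter than the declared Content-Length, which is roughly what a server sees when a client closes the connection mid-upload. The path `/cas/example`, the host name, and the helper names are made up for the example.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"io"
	"net/http"
	"strings"
)

// readTruncatedPut (hypothetical helper) parses a raw HTTP PUT with the
// given declared Content-Length and actual body, then reads the body to
// the end. It returns the number of body bytes read and the read error.
func readTruncatedPut(declared int, body string) (int, error) {
	raw := fmt.Sprintf(
		"PUT /cas/example HTTP/1.1\r\nHost: cache.example\r\nContent-Length: %d\r\n\r\n%s",
		declared, body)
	req, err := http.ReadRequest(bufio.NewReader(strings.NewReader(raw)))
	if err != nil {
		return 0, err
	}
	defer req.Body.Close()
	data, err := io.ReadAll(req.Body)
	return len(data), err
}

// endedEarly reports whether err means the stream stopped before the
// declared length was reached.
func endedEarly(err error) bool {
	return errors.Is(err, io.ErrUnexpectedEOF)
}

func main() {
	n, err := readTruncatedPut(100, "short")
	fmt.Printf("truncated upload: read %d bytes, err=%v, endedEarly=%v\n", n, err, endedEarly(err))

	n, err = readTruncatedPut(5, "short")
	fmt.Printf("complete upload:  read %d bytes, err=%v\n", n, err)
}
```

In the truncated case the body read fails with `io.ErrUnexpectedEOF`, whose message is "unexpected EOF", the same text that appears in the log line.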

Author

zhucan commented Mar 27, 2024

So which backend storage do you suggest? @ulrfa

Contributor

ulrfa commented Mar 27, 2024

bazel-remote always uses local storage. Even if you also enable one of the proxies (http, grpc, s3, gcs, ...), the blobs are still stored/buffered in local storage as well.

When bazel-remote downloads blobs via proxies, the blobs are completely written (and flushed, AFAIK) to local storage before being propagated to the client requesting them; they are not just streamed through a buffer in RAM. Therefore I suggest using proxies only if your use case requires them, and otherwise using local storage without proxies.
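The "write to local storage first, then serve" behaviour described above can be sketched in Go roughly as follows. This is an illustration of the general pattern only, not bazel-remote's implementation; `serveViaDisk` and `roundTrip` are hypothetical names, and a real cache would keep the file and index it rather than use a throwaway directory.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
	"strings"
)

// serveViaDisk (hypothetical) copies the whole blob from the backend into
// a local file, flushes it, and only then streams it to the client.
func serveViaDisk(backend io.Reader, dir string, client io.Writer) (int64, error) {
	f, err := os.CreateTemp(dir, "blob-*")
	if err != nil {
		return 0, fmt.Errorf("creating local file: %w", err)
	}
	defer f.Close()

	// Step 1: download the complete blob to local storage.
	if _, err := io.Copy(f, backend); err != nil {
		return 0, fmt.Errorf("downloading to local storage: %w", err)
	}
	// Step 2: flush the file contents to disk.
	if err := f.Sync(); err != nil {
		return 0, fmt.Errorf("flushing local file: %w", err)
	}
	// Step 3: rewind and serve the client from the local copy.
	if _, err := f.Seek(0, io.SeekStart); err != nil {
		return 0, fmt.Errorf("rewinding local file: %w", err)
	}
	return io.Copy(client, f)
}

// roundTrip runs serveViaDisk with in-memory stand-ins for the backend
// and the client, using a temporary directory as the "local storage".
func roundTrip(payload string) (string, error) {
	dir, err := os.MkdirTemp("", "cache-sketch-")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	var client bytes.Buffer
	if _, err := serveViaDisk(strings.NewReader(payload), dir, &client); err != nil {
		return "", err
	}
	return client.String(), nil
}

func main() {
	out, err := roundTrip("example blob contents")
	fmt.Printf("client received %q, err=%v\n", out, err)
}
```

The trade-off of this pattern is extra latency and disk I/O on a proxy hit, in exchange for every served blob also being present in the local cache afterwards.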

I personally don't have much experience with the different proxies (http, grpc, s3, gcs, ...). I guess they all work, and that there might be differences among them, e.g. in how efficiently they can implement findMissingDigest requests, but I don't know.

Author

zhucan commented Mar 27, 2024

Our use case is GitLab CI; I don't know whether S3 storage is suitable for it. @ulrfa

Contributor

ulrfa commented Mar 27, 2024

I have no experience with GitLab CI.

mostynb added a commit to mostynb/bazel-remote that referenced this issue Apr 7, 2024
This might help diagnose problems like buchgr#739.
Collaborator

mostynb commented Apr 7, 2024

I pushed an update which will give a bit more context, so we can at least see which location in server/http.go gives this error for you. Could you try a build from the tip of the master branch, and report back the updated error message?

Author

zhucan commented Apr 8, 2024

@mostynb ok, I will try it later

Author

zhucan commented Apr 8, 2024

2024/04/08 02:31:52 bazel-remote built with go1.22.0 X:nocoverageredesign from git commit 0b3da3f6ccdf71a19a0e5c919996eeca24b9354b.
2024/04/08 02:31:52 Initial RLIMIT_NOFILE cur: 1048576 max: 1048576
2024/04/08 02:31:52 Setting RLIMIT_NOFILE cur: 1048576 max: 1048576
2024/04/08 02:31:52 Storage mode: zstd
2024/04/08 02:31:52 Zstandard implementation: go
2024/04/08 02:31:52 Limiting concurrent file removals to 5000
2024/04/08 02:31:52 Loading existing files in /data/remote_cache.
2024/04/08 02:31:52 Scanning cache directory with 16 goroutines
2024/04/08 02:31:52 Sorting cache files by atime.
2024/04/08 02:31:52 Building LRU index.
2024/04/08 02:31:52 Finished loading disk cache files.
2024/04/08 02:31:52 Authentication: disabled
2024/04/08 02:31:52 Mangling non-empty instance names with AC keys: disabled
2024/04/08 02:31:52 Starting HTTP server for profiling on address :6060
2024/04/08 02:31:52 Loaded 0 existing disk cache items.
2024/04/08 02:31:52 Endpoint metrics: enabled
2024/04/08 02:31:52 gRPC AC dependency checks: enabled
2024/04/08 02:31:52 experimental gRPC remote asset API: disabled
2024/04/08 02:31:52 Starting gRPC server on address :9092
2024/04/08 02:31:52 HTTP AC validation: enabled
2024/04/08 02:31:52 Starting HTTP server on address :8080
2024/04/08 02:33:47 PUT /cas/8e79150c0103a94fba9f847ca2585b407220d446b6184f5f99b09d7906b4f665: unexpected EOF
2024/04/08 02:33:49 PUT /cas/3ede32c9eb75dcd7aa8fba723a344b530c21b2d08a8b16988aa85ac0d7caf0fa: unexpected EOF

I compiled from the master branch and tried again, but there is still no detailed error message. @mostynb

Collaborator

mostynb commented Apr 9, 2024

I think all the PUT errors in server/http.go add some extra information, except for cache.Error errors which are returned from the disk cache layer on this line: err := h.cache.Put(r.Context(), kind, hash, contentLength, rdr). So I pushed another commit which adds more error annotations to that path. Could you try the tip of master again, and see if it gives more useful error messages?
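For readers unfamiliar with the idea of error annotations, the general Go technique is to wrap the error at each layer with `fmt.Errorf` and `%w`, so the final message shows the path the error took while the root cause stays detectable. The sketch below uses invented function names and messages; it is not the code from the commit mentioned above.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"strings"
)

// readExactly (hypothetical) reads exactly n bytes and annotates any
// failure with the number of bytes it was trying to read.
func readExactly(r io.Reader, n int) ([]byte, error) {
	buf := make([]byte, n)
	if _, err := io.ReadFull(r, buf); err != nil {
		return nil, fmt.Errorf("failed to read %d bytes: %w", n, err)
	}
	return buf, nil
}

// storeBlob (hypothetical) stands in for a storage layer that adds its
// own annotation on top of the lower-level error.
func storeBlob(r io.Reader, n int) error {
	if _, err := readExactly(r, n); err != nil {
		return fmt.Errorf("failed to store blob: %w", err)
	}
	return nil
}

// describeUpload returns "ok" or the fully annotated error message for an
// upload that declares `declared` bytes but delivers `payload`.
func describeUpload(declared int, payload string) string {
	if err := storeBlob(strings.NewReader(payload), declared); err != nil {
		return err.Error()
	}
	return "ok"
}

// causeIsUnexpectedEOF shows that the root cause survives the wrapping.
func causeIsUnexpectedEOF(declared int, payload string) bool {
	return errors.Is(storeBlob(strings.NewReader(payload), declared), io.ErrUnexpectedEOF)
}

func main() {
	fmt.Println(describeUpload(10, "short"))
	fmt.Println(describeUpload(5, "short"))
	fmt.Println(causeIsUnexpectedEOF(10, "short"))
}
```

Each layer only needs to know its own context; the log line at the top then reads as a chain from the outermost operation down to the underlying `unexpected EOF`.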

Author

zhucan commented Apr 10, 2024

OK, I will try it again.

Author

zhucan commented Apr 11, 2024

2024/04/11 03:04:55 PUT /cas/fa1471c5331ede2600b2ebb9ce48225a4b32dbc92b827e27f615eb0221f5c412: Failed to write compressed CAS blob to disk: Failed to read 1048577 bytes: unexpected EOF
2024/04/11 03:04:55 PUT /cas/87c291acb827a1ad0a3150dced0d441909814fde2c453a145990e77ba57d6a84: Failed to write compressed CAS blob to disk: Failed to read 1048577 bytes: unexpected EOF
2024/04/11 03:04:55 PUT /cas/71f3554fce90a28e3838da637a3821b2aa7fbded32cc9d08a90a619da580d0d7: Failed to write compressed CAS blob to disk: Failed to read 1048577 bytes: unexpected EOF

@mostynb

Collaborator

mostynb commented Apr 11, 2024

Thanks, now we know that the error is coming from an io.ReadFull call in casblob.WriteAndClose.

Is the client uploading an uncompressed blob? I guess the most likely scenario is one of these:

  1. The client cancelled the upload before it finished.
  2. The upload timed out before it finished.

I pushed another update, which tries to identify these causes. Do you have any client side logs for these failed uploads?
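One possible way to tell these two causes apart on the server side is sketched below. It is an assumption about how such a check could look, not what the pushed update actually does. It relies on the documented net/http behaviour that an incoming request's context is cancelled when the client's connection closes, and on timeout errors implementing `net.Error`. The names `classifyUploadError` and `classifyScenario` are hypothetical.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"io"
	"net"
	"os"
	"time"
)

// classifyUploadError (hypothetical) guesses why an upload failed, given
// the request context and the error returned while reading the body.
func classifyUploadError(ctx context.Context, err error) string {
	var netErr net.Error
	switch {
	case err == nil:
		return "ok"
	case errors.Is(ctx.Err(), context.Canceled):
		// The request context was cancelled, e.g. the client went away.
		return "cancelled"
	case errors.Is(ctx.Err(), context.DeadlineExceeded),
		errors.As(err, &netErr) && netErr.Timeout():
		// Either the context deadline or an I/O deadline expired.
		return "timeout"
	case errors.Is(err, io.ErrUnexpectedEOF):
		// The stream ended early and no more specific cause is known.
		return "truncated"
	default:
		return "other"
	}
}

// classifyScenario builds a few synthetic situations for demonstration.
func classifyScenario(name string) string {
	switch name {
	case "cancelled":
		ctx, cancel := context.WithCancel(context.Background())
		cancel()
		return classifyUploadError(ctx, io.ErrUnexpectedEOF)
	case "deadline":
		ctx, cancel := context.WithDeadline(context.Background(), time.Now().Add(-time.Second))
		defer cancel()
		return classifyUploadError(ctx, io.ErrUnexpectedEOF)
	case "io-timeout":
		err := fmt.Errorf("reading body: %w", os.ErrDeadlineExceeded)
		return classifyUploadError(context.Background(), err)
	default:
		return classifyUploadError(context.Background(), io.ErrUnexpectedEOF)
	}
}

func main() {
	for _, name := range []string{"cancelled", "deadline", "io-timeout", "plain-eof"} {
		fmt.Printf("%-10s -> %s\n", name, classifyScenario(name))
	}
}
```

Client-side logs are still the more reliable source, since the server can only infer the cause from how the connection ended.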

Author

zhucan commented Apr 12, 2024

It's difficult to get the logs, but I will try. @mostynb
