s2 snappy compression panic #902

Open
jnyi opened this issue Dec 18, 2023 · 9 comments

Comments

@jnyi

jnyi commented Dec 18, 2023

Hi @klauspost,

I just want to report a weird panic from a project that consumes this fancy repo (stack traces are included in thanos-io/thanos#6942). We've seen s2/writer panic with the latest Go runtime (v1.21.x) and tried several versions of this Go module:

  • v1.17.4
  • v1.17.1
  • v1.16.7

I'd appreciate any insights you might have.

@klauspost
Owner

Interesting. We (in MinIO) have sporadically seen similar crashes, but without a clear trace back to this library.

We have only experienced it in Go 1.20 and later, so we have been forced to stay on Go 1.19 wherever we see it.

This looks like an interesting lead. I will investigate and see if I can create semi-reliable reproducers.

@mhoffm-aiven

FWIW: we have been seeing it in our Thanos build compiled with Go 1.21.3 too.

@klauspost
Owner

@mhoffm-aiven Yeah, Go versions from 1.20 onward. It seems like something in these versions can cause runtime issues.

We have only had a few crashes, so we have been unable to correlate them with any code. But the overlap with your issue seems to narrow it down considerably.

@klauspost
Owner

There may be a second common factor, perhaps something like an incoming signal, since it doesn't generally reproduce.

@mhoffm-aiven

My random observation was that it happened far more often (in my Thanos storage gateway) when we were using the in-memory cache, that is, when the Thanos process was using tons of memory. I moved to an external Redis cache, and that reduced the crashes in the storage gateway (we still see some in the router component). It might be related to memory pressure.

@klauspost
Owner

@mhoffm-aiven Yes. It does seem related to GC events.

My hunch is that it is a combination of goroutine preemption and the stack not being in the expected state during or after assembly calls.

@klauspost
Owner

I will open a golang issue and see if people with some more internal knowledge can assist.

@klauspost
Owner

Submitted golang/go#64781 which also includes some of the issues we've seen at MinIO.

@klauspost
Owner

@jnyi Could you try https://github.com/klauspost/compress/releases/tag/v1.17.7 or the latest?

It could be that #930 fixed this issue.
