random machine reboots whilst using io_uring #42
Comments
that's strange, which kernel version? it's not a VM, right? it could be a kernel panic, so it won't be written to a log. what you can do is redirect the console to the serial port, with a second machine connected to that serial port to capture the output |
how can I reproduce it? |
I'm actually not entirely sure, it happens to us a few hours into a constant ~500-connection load (it's a reverse proxy, so 1000 total, one in and one out). it happens a lot faster when the connection load is higher |
Ok, it's probably a kernel panic, please could you try that? |
happened again last night on a separate machine, kernel 5.9 with the latest Netty io_uring. the general rule of thumb we've noticed is that long, sustained high throughput with constant reconnections seems to cause kernel panics. I think there's a leak somewhere. I haven't been able to do the serial port part yet as these are unmanaged remote servers, but I've sent in a ticket to get KVM access |
we've also noticed that restarting the Java process every few hours prevents the machine deaths |
That's interesting... can you gather memory usage (direct, heap and kernel) while the app is running so we can see if it grows? Also, do you create and shut down the event loop group frequently? |
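A minimal sketch of the kind of periodic sampling being asked for, assuming Netty's own direct-memory counter is sufficient on the Java side; the class name and interval are illustrative, and kernel-side usage (e.g. the Slab line in /proc/meminfo) would have to be watched separately on the host:

```java
import io.netty.util.internal.PlatformDependent;

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public final class MemorySampler {

    public static void start() {
        ScheduledExecutorService sampler = Executors.newSingleThreadScheduledExecutor();
        sampler.scheduleAtFixedRate(() -> {
            Runtime rt = Runtime.getRuntime();
            // Heap usage as seen by the JVM.
            long heapUsed = rt.totalMemory() - rt.freeMemory();
            // Direct memory as tracked by Netty; returns -1 if it cannot be determined.
            long nettyDirect = PlatformDependent.usedDirectMemory();
            System.out.printf("heapUsed=%dMiB nettyDirect=%dMiB%n",
                    heapUsed >> 20, nettyDirect >> 20);
        }, 0, 60, TimeUnit.SECONDS);
    }
}
```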
I'll grab numbers next time we run the process under similar circumstances. We don't create or shut down the event loop group frequently - we create it once at application startup and reuse it throughout. |
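For reference, that lifecycle would look roughly like the sketch below: one io_uring event loop group created at startup and shared by the bootstrap until the application exits. The thread count, port and handler are placeholders; the class names are the ones the incubator transport ships under io.netty.incubator.channel.uring.

```java
import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.Channel;
import io.netty.channel.ChannelInitializer;
import io.netty.incubator.channel.uring.IOUringEventLoopGroup;
import io.netty.incubator.channel.uring.IOUringServerSocketChannel;

public final class ProxyServer {

    // Created once at application startup and reused for every connection.
    private static final IOUringEventLoopGroup GROUP = new IOUringEventLoopGroup(4);

    public static void main(String[] args) throws InterruptedException {
        try {
            ServerBootstrap bootstrap = new ServerBootstrap()
                    .group(GROUP)
                    .channel(IOUringServerSocketChannel.class)
                    .childHandler(new ChannelInitializer<Channel>() {
                        @Override
                        protected void initChannel(Channel ch) {
                            // proxy pipeline goes here
                        }
                    });
            bootstrap.bind(8080).sync().channel().closeFuture().sync();
        } finally {
            // Shut down exactly once, when the whole application exits.
            GROUP.shutdownGracefully();
        }
    }
}
```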
This happens in my application too, and I've had to fall back to another transport. I'm running on an AWS AMI and using gRPC. My heap remains stable at around 9 GB while sustaining a throughput of around 10k requests per second, and my JVM process resident set also stays around 9 GB as reported by the OS.
To resolve it, I changed my initialization to use a different transport, and it no longer leaks memory while sustaining the load. When the process is killed, the memory is returned, so this isn't a case of the kernel holding on to permanently leaked memory. I can also restart my service periodically to keep it alive, but I'd prefer it to stay alive without hourly restarts. I'd love to use io_uring because it delivers really nice, consistent low latency! I hope this information is helpful. |
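The exact initialization change was lost above; as an assumption about what such a fallback typically looks like, it amounts to swapping the io_uring classes for their epoll (or NIO) equivalents at bootstrap time. The flag name and thread count below are hypothetical:

```java
import io.netty.channel.EventLoopGroup;
import io.netty.channel.ServerChannel;
import io.netty.channel.epoll.EpollEventLoopGroup;
import io.netty.channel.epoll.EpollServerSocketChannel;
import io.netty.incubator.channel.uring.IOUringEventLoopGroup;
import io.netty.incubator.channel.uring.IOUringServerSocketChannel;

public final class TransportChoice {

    // Hypothetical switch: true while the io_uring leak is being investigated.
    private static final boolean FALL_BACK_TO_EPOLL = true;

    static EventLoopGroup newEventLoopGroup() {
        return FALL_BACK_TO_EPOLL
                ? new EpollEventLoopGroup()
                : new IOUringEventLoopGroup(4);
    }

    static Class<? extends ServerChannel> serverChannelType() {
        if (FALL_BACK_TO_EPOLL) {
            return EpollServerSocketChannel.class;
        }
        return IOUringServerSocketChannel.class;
    }
}
```

With grpc-netty, these values would presumably then be handed to the Netty server builder's channel-type and event-loop-group setters, though the poster's actual change may have differed.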
we deployed the io_uring transport to some end-user-facing services (essentially giant reverse proxies) that run Netty.
we usually average around 800 concurrent connections on each, scaling up to 2000 on each during busy periods. when we run this using the io_uring transport, after a few hours the machines simply die (nothing in the logs relating to a reboot, etc.). we've tried various Debian versions, different machines and different kernel versions; the only constant is that when we run with io_uring, the machines die. willing to provide any logs requested, but the issue is that it's almost as if someone unplugged the power cable on these servers (no reboot feedback in syslog, events, etc.).
could this be related to some form of leak in either kernel io_uring or the C implementation in the Netty transport?