Node.js SIGABRT (abort) due to EBADF from epoll_ctl() call in libuv uv__io_poll() #4684

rhansen · 2021-01-28T05:01:42Z

Lately I've noticed frequent node (v14.15.4) crashes when running backend tests. The crashes are caused by a call to abort() in libuv due to errno of EBADF ("bad file descriptor") set by a call to epoll_ctl() in uv__io_poll(). The crashes always seem to happen after the mocha tests finish running (I think) but before nyc prints out coverage stats. It doesn't always happen, so it's probably a race condition somewhere. I poked at the core file a bit but didn't see a smoking gun. Here's the stack trace:

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ffff783c859 in __GI_abort () at abort.c:79
#2  0x0000000000962331 in uv__io_poll (loop=loop@entry=0x7fffdeffcad8, timeout=<optimized out>) at ../deps/uv/src/unix/linux-core.c:254
#3  0x000000000137c418 in uv_run (loop=0x7fffdeffcad8, mode=UV_RUN_ONCE) at ../deps/uv/src/unix/core.c:385
#4  0x0000000000abce1d in node::worker::Worker::Run() ()
#5  0x0000000000abcf28 in node::worker::Worker::StartThread(v8::FunctionCallbackInfo<v8::Value> const&)::{lambda(void*)#1}::_FUN(void*) ()
#6  0x00007ffff7a14609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#7  0x00007ffff7939293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

If anyone out there is familiar with libuv and can help us debug, that would be greatly appreciated.


webzwo0i · 2021-01-29T11:31:08Z

AFAIK there are only two worker threads involved at the moment: MinifyWorker and maybe the one handling soffice.

Looking at some of the failing backend runs, I get the impression that it's mocha, not nyc, that's having problems, because the core dump message is printed before mocha's test summary.

From mocha's changelog: #4465: Worker processes guaranteed (as opposed to "very likely") to exit before Mocha does; fixes a problem when using nyc with Mocha in parallel mode (@boneskull)
Also, from the latest workerpool changelog:
Fixes and more robustness in terminating workers. Thanks @boneskull.
Fix #32, #175: the promise returned by Pool.terminate() now waits until subprocesses are dead before resolving. Thanks @boneskull.

However, we're on an older mocha version (one that doesn't depend on workerpool yet) and we don't use parallel mode, but my best guess would still be to update mocha first. We'll probably need to deal with mochajs/mocha#4175 and mochajs/mocha#4315, because we use that functionality, iirc.

rhansen · 2021-02-01T20:05:45Z

I haven't seen any aborts lately. Did we fix it?

JohnMcLear · 2021-02-01T20:23:33Z

I saw one today, AFAIK.

rhansen · 2021-02-07T09:49:08Z

I saw one today. They're not happening as often I think.

rhansen · 2021-02-23T09:34:38Z

I finally figured out the problem: The upstream dirty package (not ueberDB) doesn't wait until it finishes writing queued data before closing the file descriptor. I'll send a PR upstream tomorrow.
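To make the failure mode concrete, here is a minimal sketch of the ordering that avoids it: every queued write must finish before the file descriptor is closed. The `SafeAppender` class and its methods are hypothetical, written only for illustration; this is not the dirty package's code and not the actual upstream fix.

```javascript
const fs = require('fs');

// Hypothetical helper for illustration (not the dirty package's API):
// appends strings to a file one write at a time and defers closing the
// file descriptor until every queued write has completed.
class SafeAppender {
  constructor(filePath) {
    this.fd = fs.openSync(filePath, 'a');
    this.queue = [];       // Buffers waiting to be written, in order.
    this.writing = false;  // True while an fs.write() is in flight.
    this.error = null;     // First write error, if any.
    this.closed = null;    // Promise returned by close(), once requested.
    this._onIdle = null;   // Closes the fd once nothing is pending.
  }

  append(str) {
    if (this.closed) throw new Error('append() called after close()');
    this.queue.push(Buffer.from(str));
    this._drain();
  }

  _drain() {
    if (this.writing) return; // The in-flight write's callback will call us again.
    if (this.queue.length === 0) {
      // Nothing queued or in flight: only now is it safe to release the fd.
      if (this._onIdle) {
        const onIdle = this._onIdle;
        this._onIdle = null;
        onIdle();
      }
      return;
    }
    const buf = this.queue.shift();
    this.writing = true;
    fs.write(this.fd, buf, 0, buf.length, null, (err, bytesWritten) => {
      this.writing = false;
      if (err) {
        // Record the error and drop the remaining data so close() can proceed.
        this.error = this.error || err;
        this.queue.length = 0;
      } else if (bytesWritten < buf.length) {
        // Partial write: requeue the unwritten tail at the front.
        this.queue.unshift(buf.subarray(bytesWritten));
      }
      this._drain();
    });
  }

  // Returns a Promise that settles after all queued writes finish and the fd is closed.
  close() {
    if (!this.closed) {
      this.closed = new Promise((resolve, reject) => {
        this._onIdle = () => {
          fs.close(this.fd, (err) => {
            const failure = err || this.error;
            if (failure) reject(failure);
            else resolve();
          });
        };
      });
      this._drain();
    }
    return this.closed;
  }
}
```

Calling `append()` several times and then `close()` immediately is safe in this sketch: `close()` only records the request, and `fs.close()` runs once the queue has drained. Closing the descriptor while an `fs.write()` is still pending is the write-after-close ordering described above.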

The SIGABRTs started happening after commit edbe6d5 because that is when we started properly closing the database on exit.

We can probably work around the bug in ueberDB by sleeping a bit between flushing and closing the database.
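A minimal sketch of that sleep-based workaround, assuming a database object with promise-returning `flush()` and `close()` methods. Those method names, the `closeWithGracePeriod` helper, and the 100 ms default are hypothetical, chosen for illustration; they are not ueberDB's or dirty's actual API.

```javascript
// Hypothetical shape for illustration: `db` exposes async flush() and close().
// These method names are assumptions, not the real dirty or ueberDB API.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Workaround sketch: flush, then wait briefly so that any writes the library
// started without tracking get a chance to finish before the fd is closed.
// A fixed delay only makes the race less likely; it does not eliminate it.
async function closeWithGracePeriod(db, graceMs = 100) {
  await db.flush();
  await sleep(graceMs);
  await db.close();
}
```

The delay is a heuristic: it narrows the window for the race rather than removing it, which is why the proper fix belongs in the upstream package.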

JohnMcLear · 2021-02-23T09:35:41Z

Hah wow good spot :D

JohnMcLear · 2021-02-25T00:02:27Z

@rhansen does the latest merge close this?

rhansen · 2021-02-25T07:30:13Z

I want to keep this bug open until the upstream bugfix is published, ueberdb is updated to use it, and Etherpad is updated to use the updated ueberdb.

rhansen added the Bug label Jan 28, 2021

rhansen mentioned this issue Jan 29, 2021 in tests: Stop using nyc #4688 (merged)

rhansen mentioned this issue Jan 30, 2021 in Rework server shutdown #4690 (merged)

This was referenced Feb 24, 2021 in:
Fix write-after-close and other bugs felixge/node-dirty#61 (merged)
dirty: Work around write-after-close bug in dirty package ether/ueberDB#186 (merged)
deps: Update ueberdb2 to work around dirty DB bug #4856 (merged)

rhansen mentioned this issue Feb 28, 2021 in deps: Update ueberdb2 to fix dirty DB bug #4887 (merged)

JohnMcLear closed this as completed in #4887 Feb 28, 2021
