Skip to content

Notes on building alternatives

Eugene Lazutkin edited this page Mar 17, 2024 · 6 revisions

The current implementation of RE2 bindings uses C++ addons technique. The reason is purely historical: when I started we had no other alternatives. At this moment (March 2024) we have two possible alternatives:

These notes were recorded to document thoughts on alternative implementations, experiments, and attempts made.

The current implementation

The current implementation uses custom C++ bindings written with NaN and compiled by node-gyp.

The problems with this approach in no particular order:

  • Compiling by users:
    • Users should install all necessary software (Python, a suitable C++ compiler).
      • These dependencies are not described by package.json. They are "extraband".
      • If something goes wrong, it is hard to debug.
      • The hardware requirements for compiling can exceed what is required to run the final software.
    • The compilation can take a lot of time making it unsuitable for building short-lived instances.
  • Precompiled binaries:
    • Requires to maintain a build infrastructure.
    • A separate build is required for every combination of ABI (usually a major Node version), OS, and processor architecture.
      • It takes some time to build all combinations.
      • Not all permutations are supported by GitHub Actions runners.
        • Some builds have to be produced manually.
    • Requires storage to keep all rebuild binaries.
      • The storage should be secured.
    • Requires a way to retrieve a required binary.
      • If the binary is missing or cannot function properly, it should be compiled locally.
  • Dependencies:
    • Updating dependencies can lead to updating the code of C++ bindings.
    • New Node versions can trigger updates in NaN, which may require extensive changes in the code.
    • Debugging of C++ addons is still in the Stone Age.
  • Support of alternative platforms:

Possible alternatives

  • WASM
    • Pros:
      • An addon can be compiled once and shipped for all supported platforms.
        • No special build infrastructure.
        • No special retrieval of precompiled binaries.
      • Supported by web browsers, Deno, and Bun.
    • Cons:
      • The compilation target is interpreted and can be slow when running.
      • Support varies. For example, browsers expect an asynchronous loading of WASM files.
  • Node-API:
    • Pros:
      • It does not depend on ABI. Instead, it depends on a Node-API version, which is more conservative and changes less frequently. See Node-API version matrix.
        • It used to be that higher versions supported older versions being strict supersets. It is changing now and rules of support become less clear.
    • Supported by Bun.
    • Cons:
      • It still requires node-gyp and the associated setup locally.
      • It still requires a build infrastructure.

WASM

re2-wasm

There is an alternative RE2 WASM project based on an old version of node-re2: re2-wasm. Theoretically, it can be used as a benchmarking target for a possible WASM-based implementation of RE2. As of now (March 2024) it has not been updated for 2 years.

I tried to benchmark it against node-re2 using nano-bench. The results:

$ node bench.js 

re: (\w+)+$ s: aaaaaaaaaaaaaaaaaaaaaa!

measuring (confidence interval: 95%, series: 100, bootstrap samples: 1,000) ...

"re2": median 581.3ns +12.6ns -9.7ns for 50k iterations in 100 series
"rew": median 10.4μs +8.4μs -1.2μs for 500 iterations in 100 series

The difference is STATISTICALLY SIGNIFICANT!

re: (.*a){30} s: aaaaaaaaaaaaaaaaaaaaaaaaaaaX

measuring (confidence interval: 95%, series: 100, bootstrap samples: 1,000) ...

"re2": median 659ns +83.7ns -8ns for 50k iterations in 100 series
"rew": median 91μs +48μs -20μs for 1 iterations in 100 series

The difference is STATISTICALLY SIGNIFICANT!

As we can see the difference is more than an order of magnitude. For these particular cases, it is 20-150 times slower.

Attempting to increase measurement intervals unexpectedly breaks WASM:

$ node bench2.js 

measuring (confidence interval: 95%, series: 1,000, bootstrap samples: 10,000) ...

"re2": median 574.1ns +12.9ns -5.9ns for 50k iterations in 1,000 series
"rew": median 1.94ms +1.42ms -0.67ms for 1 iterations in 1,000 series

The difference is STATISTICALLY SIGNIFICANT!

measuring (confidence interval: 95%, series: 1,000, bootstrap samples: 10,000) ...

Cannot enlarge memory arrays to size 16785408 bytes (OOM). Either (1) compile with  -s INITIAL_MEMORY=X  with X higher than the current value 16777216, (2) compile with  -s ALLOW_MEMORY_GROWTH=1  which allows increasing the size at runtime, or (3) if you want malloc to return NULL (0) instead of this abort, compile with  -s ABORTING_MALLOC=0 
/home/eugene/Open/temp-re2/node_modules/re2-wasm/build/wasm/re2.js:1393
  var e = new WebAssembly.RuntimeError(what);
          ^

RuntimeError: abort(Cannot enlarge memory arrays to size 16785408 bytes (OOM). Either (1) compile with  -s INITIAL_MEMORY=X  with X higher than the current value 16777216, (2) compile with  -s ALLOW_MEMORY_GROWTH=1  which allows increasing the size at runtime, or (3) if you want malloc to return NULL (0) instead of this abort, compile with  -s ABORTING_MALLOC=0 ) at Error
    at jsStackTrace (/home/eugene/Open/temp-re2/node_modules/re2-wasm/build/wasm/re2.js:1675:19)
    at stackTrace (/home/eugene/Open/temp-re2/node_modules/re2-wasm/build/wasm/re2.js:1692:16)
    at abort (/home/eugene/Open/temp-re2/node_modules/re2-wasm/build/wasm/re2.js:1387:44)
    at abortOnCannotGrowMemory (/home/eugene/Open/temp-re2/node_modules/re2-wasm/build/wasm/re2.js:3521:7)
    at _emscripten_resize_heap (/home/eugene/Open/temp-re2/node_modules/re2-wasm/build/wasm/re2.js:3525:7)
    at wasm://wasm/003466ba:wasm-function[5377]:0xbfea0
    at wasm://wasm/003466ba:wasm-function[5372]:0xbdcde
    at wasm://wasm/003466ba:wasm-function[5259]:0xbb288
    at wasm://wasm/003466ba:wasm-function[301]:0x8719
    at wasm://wasm/003466ba:wasm-function[3163]:0x7ec85
    at abort (/home/eugene/Open/temp-re2/node_modules/re2-wasm/build/wasm/re2.js:1393:11)
    at abortOnCannotGrowMemory (/home/eugene/Open/temp-re2/node_modules/re2-wasm/build/wasm/re2.js:3521:7)
    at _emscripten_resize_heap (/home/eugene/Open/temp-re2/node_modules/re2-wasm/build/wasm/re2.js:3525:7)
    at wasm://wasm/003466ba:wasm-function[5377]:0xbfea0
    at wasm://wasm/003466ba:wasm-function[5372]:0xbdcde
    at wasm://wasm/003466ba:wasm-function[5259]:0xbb288
    at wasm://wasm/003466ba:wasm-function[301]:0x8719
    at wasm://wasm/003466ba:wasm-function[3163]:0x7ec85
    at wasm://wasm/003466ba:wasm-function[3147]:0x7e6d1
    at wasm://wasm/003466ba:wasm-function[3138]:0x7e268

Node.js v21.5.0

We can see that boosting the number of series from 100 to 1,000 significantly slows down re2-wasm for some unknown reason and breaks it later. This alone makes WASM an unsuitable target for mission-critical applications for now (March 2024).