Add benchmarks #165

Open
lppedd opened this issue Jan 29, 2024 · 46 comments · Fixed by #166

@lppedd
Contributor

lppedd commented Jan 29, 2024

I'd like to set up a couple of benchmarks using kotlinx-benchmark.
The benchmarks should measure something we can compare with other ANTLR implementations, e.g., antlr4ng.
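
For context, a minimal kotlinx-benchmark sketch of the shape I have in mind (loadTestFile and parseMySql are placeholders, not existing code):

import kotlinx.benchmark.Benchmark
import kotlinx.benchmark.Scope
import kotlinx.benchmark.Setup
import kotlinx.benchmark.State

@State(Scope.Benchmark)
open class MySqlParserBenchmark {
  private var input: String = ""

  @Setup
  fun setup() {
    // Placeholder: read the sample file once, outside the measurement.
    input = loadTestFile("statements.txt")
  }

  @Benchmark
  fun parseStatements(): Any {
    // Placeholder: lex + parse the whole input; returning the tree
    // keeps the work from being optimized away.
    return parseMySql(input)
  }
}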

@mike-lischke hey! Would you be OK with us reproducing the same benchmarks for antlr-kotlin?

@mike-lischke

Absolutely! It's all open source 😄

@lppedd
Contributor Author

lppedd commented Jan 29, 2024

Thank you!

To recap, it seems you:

  1. read a sample file
  2. split it based on a delimiter (though it's not a real split: you seem to use only the range to take a substring from the original file)
  3. for each range, create a new parser instance, so the single time measurement is for that single range
    Edit: nope, looks like the measurement is for the entire file

Is that correct?

@mike-lischke

mike-lischke commented Jan 29, 2024

Top down it's rather like this:

  1. The splitter test -> not relevant for you, as it only measures how long it takes to determine statement ranges in a file. This has nothing to do with parsing and is just an additional number I wanted.
  2. Run the parser test once (cold run) -> collect the execution times and print them.
  3. Run the parser 5 more times (warm runs) -> collect the execution times.
  4. Remove the 2 slowest runs from step 3.
  5. Compute the average of the remaining 3 and print it.

A parser run consists of n loops, determined by the testFiles list. For each file:

  1. Determine statement ranges (not counted towards the overall time).
  2. Go over each range and check if it matches a certain version number. I copied the tests from my VS Code unit tests, where this matters. You could settle on a single version and remove all entries that don't match it from the statements file. If the version matches, take a copy of the string part and parse it using a "parsing service", which is created on app start and kept in a separate class. The parser is thus reused (otherwise you couldn't measure warm run times).
  3. Print any error found during the parse.
  4. Compute the time needed and push it to the result list. So the length of the list correlates with the number of test files.
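
In Kotlin terms, the timing scheme boils down to something like this (a sketch; parseFile stands in for lexing and parsing one test file):

import kotlin.time.Duration
import kotlin.time.Duration.Companion.microseconds
import kotlin.time.measureTime

fun benchmark(testFiles: List<String>, parseFile: (String) -> Unit): Duration {
  fun oneRun(): Long =
    testFiles.sumOf { measureTime { parseFile(it) }.inWholeMicroseconds }

  println("Cold run: ${oneRun()} us")      // step 2: single cold run

  val warmRuns = List(5) { oneRun() }      // step 3: five warm runs
  val fastest = warmRuns.sorted().take(3)  // step 4: drop the 2 slowest
  return fastest.average().microseconds    // step 5: average the remaining 3
}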

@lppedd
Contributor Author

lppedd commented Jan 29, 2024

Thanks!

I've managed to emulate the same workflow in Kotlin, and now I find myself dealing with a stack overflow on the JVM with this statement:

SELECT
((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((
((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((
((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((
((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((
((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((
1
))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
java.lang.StackOverflowError
	at com.strumenta.antlrkotlin.benchmarks.generated.MySQLParser.simpleExpr(MySQLParser.kt:45124)
	at com.strumenta.antlrkotlin.benchmarks.generated.MySQLParser.bitExpr(MySQLParser.kt:44062)
	at com.strumenta.antlrkotlin.benchmarks.generated.MySQLParser.predicate(MySQLParser.kt:43685)
	at com.strumenta.antlrkotlin.benchmarks.generated.MySQLParser.boolPri(MySQLParser.kt:43457)
	at com.strumenta.antlrkotlin.benchmarks.generated.MySQLParser.expr(MySQLParser.kt:43137)
	at com.strumenta.antlrkotlin.benchmarks.generated.MySQLParser.exprList(MySQLParser.kt:50557)
	at com.strumenta.antlrkotlin.benchmarks.generated.MySQLParser.simpleExpr(MySQLParser.kt:45278)

Removing it lets the run complete successfully, though.

@mike-lischke

mike-lischke commented Jan 29, 2024

Yes, that's a tough one. When parsing this in a thread in C++, it crashes that thread, unless you are on a platform that allows setting the thread stack size. This is a big problem in ANTLR, as crashing a thread usually crashes the entire app (a nice DDoS attack)!

The problem is the unlimited lookahead and the way prediction is implemented (as recursion instead of iteration). Additionally, expressions in MySQL (and many other languages) are highly nested, so a single opening paren alone can result in a dozen or more subrule invocations. That's why I love this one as a stress test.

Can you increase the stack size during execution?

@mike-lischke

mike-lischke commented Jan 29, 2024

Btw., please share numbers once you've got it all running! And while you're at it, can you also run it against plain Java? I'm really interested in comparing everything.

@KvanTTT

KvanTTT commented Jan 29, 2024

This is a big problem in ANTLR, as crashing a thread usually crashes the entire app (a nice DDoS attack)!

Some runtimes have methods that can check whether there is enough free space on the thread stack for one more recursive call, for example EnsureSufficientExecutionStack in C#. In theory, such a check could be integrated into the targets that support this feature. For other runtimes, it's possible to use a manual recursive-call counter and determine its limit empirically (the maximum number of recursive calls); a sketch follows below.

Another way is to emulate recursive calls using a separate stack, but that requires much more implementation effort.
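
A minimal sketch of the counter idea (the limit is a placeholder that would have to be calibrated per platform):

private const val MAX_RECURSION_DEPTH = 3000 // placeholder, tune empirically
private var recursionDepth = 0

fun enterRecursiveRule() {
  if (++recursionDepth > MAX_RECURSION_DEPTH) {
    recursionDepth = 0
    error("Recursion limit exceeded: input is nested too deeply")
  }
}

fun exitRecursiveRule() {
  recursionDepth--
}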

Can you increase the stack size during execution?

Yes, it should be possible, but it's not a reliable resolution of the problem: a stack overflow should not occur in any case.
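
On the JVM, for example, the parse can run on a dedicated thread with a larger requested stack size. A minimal sketch (parseInput is a placeholder; the JVM treats the requested size as a hint):

fun parseWithLargeStack(input: String) {
  // Thread(group, target, name, stackSize): request a 512 MB stack.
  val thread = Thread(null, { parseInput(input) }, "parser", 512L * 1024 * 1024)
  thread.start()
  thread.join()
}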

@mike-lischke

The root cause should be removed instead of adding all these workarounds. In my next-generation ANTLR project I plan to convert the recursive prediction code to become iterative and add an iteration limit. This has been requested before, to limit lookahead to some max value.

@lppedd
Contributor Author

lppedd commented Jan 29, 2024

Preliminary benchmark results on an i7-7820HQ - 32GB.
The JS results are taken with measureTime, as kotlinx-benchmark for JS is blocked by Kotlin/kotlinx-benchmark#185

File                    JS             WASM           JVM          mingwX64        ANTLR4NG
statements.txt          523.2868 ms    226.27428 ms   102.962 ms   1481.9423 ms    668.6744333306 ms
bitrix_queries_cut.sql  282.2195 ms    123.57261 ms   56.586 ms    742.65691 ms    388.6220333178 ms
sakila-data.sql         21531.5333 ms  8683.92242 ms  3652.251 ms  58584.12632 ms  28444.441866676 ms

@lppedd
Contributor Author

lppedd commented Jan 29, 2024

I think on K/JS the major performance issue is the use of a string for the generated ATN, which must first be converted to a char array and then decoded to an int array. The JS and TS runtimes use an array of integers directly.
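
Roughly, the extra decoding step amounts to this (a sketch, not the actual runtime code):

// The serialized ATN string is walked character by character and
// widened into an int array before the ATN can be deserialized.
fun decodeSerializedAtn(serialized: String): IntArray =
  IntArray(serialized.length) { i -> serialized[i].code }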

@KvanTTT not sure if we can do something to differentiate based on the platform.

@ftomassetti
Member

The root cause should be removed instead of adding all these workarounds. In my next-generation ANTLR project I plan to convert the recursive prediction code to become iterative and add an iteration limit. This has been requested before, to limit lookahead to some max value.

@mike-lischke About your NG project, do you think that some ideas could be contributed to the just-started effort on ANTLR 5?
I am just asking out of curiosity, as both your project and ANTLR 5 seem to aim to build the next generation of ANTLR, and there is surely ground to cross-pollinate ideas.

@KvanTTT

KvanTTT commented Jan 29, 2024

The root cause should be removed instead of adding all these workarounds. In my next-generation ANTLR project I plan to convert the recursive prediction code to become iterative and add an iteration limit.

Yes, that's a good solution; I mentioned it above as well.

I think on K/JS the major performance issue is the use of a string for the generated ATN, which must first be converted to a char array and then decoded to an int array.

I'm not sure it's a hot path, since this String is deserialized only once, on app startup. But anyway, I think using an array of Int instead of a String is a better option. It requires some effort on the JVM side, because it crashes on big arrays, but it's possible to split one huge int array into several smaller ones, as sketched below.
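
A sketch of the splitting idea (hypothetical generated code; the values are placeholders):

// Each chunk stays small enough to keep its initializer within the
// JVM's 64 KB bytecode-per-method limit; the chunks are concatenated
// once, at class-load time.
private val ATN_CHUNK_0 = intArrayOf(4, 1, 282 /* ... */)
private val ATN_CHUNK_1 = intArrayOf(/* ... */)

internal val SERIALIZED_ATN: IntArray = ATN_CHUNK_0 + ATN_CHUNK_1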

@mike-lischke About your NG project, do you think that some ideas could be contributed to the just-started effort on ANTLR 5?
I am just asking out of curiosity, as both your project and ANTLR 5 seem to aim to build the next generation of ANTLR, and there is surely ground to cross-pollinate ideas.

It's a great idea, but I'll email you about the current situation (not sure it's public info).

@lppedd
Contributor Author

lppedd commented Jan 29, 2024

@KvanTTT Terence once wrote

Java does not properly handle static initialized integer arrays

So what he meant was that the JVM simply crashes?

There is a big difference between the 150/200 ms of antlr4ng and the 300/600 ms of K/JS, so I wonder what's so heavy on our side.

@KvanTTT

KvanTTT commented Jan 29, 2024

So what he meant was that the JVM simply crashes?

True; unfortunately, the JVM can't handle huge arrays out of the box, unlike other runtimes. But it's probably possible to use direct memory or to split into several smaller arrays. Actually, this array is required only once, during ATN deserialization, and there is no need to keep it in memory.

There is a big difference between the 150/200 ms of antlr4ng and the 300/600 ms of K/JS, so I wonder what's so heavy on our side.

What about the other parsing stages? They probably consume much more time.

@mike-lischke

Preliminary benchmark results on an i7-7820HQ - 32GB. The JS results are taken with measureTime, as kotlinx-benchmark for JS is blocked by Kotlin/kotlinx-benchmark#185

File                    JS     JVM
bitrix_queries_cut.sql  300ms  58ms
statements.txt          600ms  107ms

The different machines might have an impact here. Have you tried running the pure JS benchmarks? My box is a Mac Studio M1 Max - 32 GB.

@mike-lischke

I think on K/JS the major performance issue is the use of a string for the generated ATN, which must first be converted to a char array and then decoded to an int array. The JS and TS runtimes use an array of integers directly.

I don't think this is a real issue, given that this happens only once per parser/lexer class load. That would impact only the cold run time. Once the parser is warm, the ATN is in memory.

@mike-lischke

The root cause should be removed instead of adding all these workarounds. In my next-generation ANTLR project I plan to convert the recursive prediction code to become iterative and add an iteration limit. This has been requested before, to limit lookahead to some max value.

@mike-lischke About your NG project, do you think that some ideas could be contributed to the just-started effort on ANTLR 5? I am just asking out of curiosity, as both your project and ANTLR 5 seem to aim to build the next generation of ANTLR, and there is surely ground to cross-pollinate ideas.

It was a little surprising to me that ANTLR5 got started. But that's what's so great about open source: everyone can do what they like. I prefer TypeScript and have (mostly) different goals than ANTLR5. I'm on my way to radically simplifying ANTLR (just today I brought the TS runtime tests down to under 10 seconds, just by reorganizing things).

@mike-lischke

True; unfortunately, the JVM can't handle huge arrays out of the box, unlike other runtimes.

Really? You can just allocate an array with up to 2^31 entries (more than enough for the ATN). But I'm not a Java expert, so...

But it's probably possible to use direct memory or to split into several smaller arrays.

ANTLR started out with split arrays :-D A while ago there were sections that got concatenated at load time. Using a string, however, is all but optimal, as it involves charset-related functionality for conversion. Still, it's not a hot path and hence not worth much optimization work.

@lppedd
Contributor Author

lppedd commented Jan 29, 2024

@mike-lischke you were right about the machine differences. I've cloned your repo and run the benchmarks.
I've updated the results, and they're comparable now; see the table above.

@lppedd
Contributor Author

lppedd commented Jan 29, 2024

ANTLR started out with split arrays :-D A while ago there were sections that got concatenated at load time. Using a string, however, is all but optimal, as it involves charset-related functionality for conversion. Still, it's not a hot path and hence not worth much optimization work.

I've also wondered why there was a switch to using a string, when we could just split the arrays.
@KvanTTT am I right that this is something you looked at a year or so ago (maybe in 2022)?

@KvanTTT

KvanTTT commented Jan 29, 2024

I don't remember why it was implemented that way; maybe it was just easier. I've found a related PR: antlr/antlr4#3591

@lppedd
Contributor Author

lppedd commented Jan 29, 2024

I'll open a PR as soon as the others are merged, just to avoid stacking too many changes.

I've removed all parse-error catching, so if parsing fails it's unrecoverable, and I've not encountered any failures (apart from the stack overflow). Benchmark times are consistent, as per the updated table.

@mike-lischke

I've removed all parse-error catching, so if parsing fails it's unrecoverable, and I've not encountered any failures (apart from the stack overflow).

You mean all the catch clauses in the generated code? Interesting idea, but besides the fact that recovery is no longer possible (is this really needed in everyday applications?), you also cannot do the double-parse approach, which uses the SLL and LL prediction modes with the bail-out error strategy.
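
For reference, the double-parse approach looks roughly like this (a sketch against the Java runtime API; MySQLParser and its query() start rule are placeholders):

import org.antlr.v4.runtime.BailErrorStrategy
import org.antlr.v4.runtime.CommonTokenStream
import org.antlr.v4.runtime.DefaultErrorStrategy
import org.antlr.v4.runtime.ParserRuleContext
import org.antlr.v4.runtime.atn.PredictionMode
import org.antlr.v4.runtime.misc.ParseCancellationException

fun parseTwoStage(tokens: CommonTokenStream, parser: MySQLParser): ParserRuleContext {
  // First pass: fast SLL prediction, bailing out on the first error.
  parser.interpreter.predictionMode = PredictionMode.SLL
  parser.errorHandler = BailErrorStrategy()
  return try {
    parser.query()
  } catch (e: ParseCancellationException) {
    // Second pass: rewind and reparse with full LL prediction.
    tokens.seek(0)
    parser.reset()
    parser.interpreter.predictionMode = PredictionMode.LL
    parser.errorHandler = DefaultErrorStrategy()
    parser.query()
  }
}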

@lppedd
Contributor Author

lppedd commented Jan 30, 2024

You mean all the catch clauses in the generated code

Oh no no. Initially I had copied the try-catch you have in the parsing service, which accumulates errors.
I did get it wrong though, and by ignoring the caught exception I didn't realize I had statements that were failing.
That was because of wrong charsets, e.g., I had big5 instead of _big5.

Now I don't catch any exceptions in the benchmarking code.

@lppedd
Contributor Author

lppedd commented Jan 31, 2024

I've just added a benchmark for the WASM (wasmJs) target; see the updated table.

Overall it seems WebAssembly on Node.js cuts the times in half or more compared to JS.

@mike-lischke

mike-lischke commented Feb 1, 2024

Interesting numbers! If you could add C++ and C# (and maybe Go) too, this would be the long-awaited performance comparison chart for the ANTLR runtimes :-)

Can you explain more how the WASM target is structured? Is it just the runtime, used with a Java/Kotlin wrapper, or is everything compiled to a WASM module, including the generated parser/lexer? How is the WASM used then?

And another one: are the JS and WASM outputs both generated from Kotlin code? Btw., I'm surprised you can generate useful WASM code from Kotlin, given that it is marked as being in alpha state.

@lppedd
Contributor Author

lppedd commented Feb 1, 2024

Can you explain more how the WASM target is structured?

The runtime and the benchmark code are both compiled down to optimized WebAssembly binaries.
The only place where I do interop with JavaScript is to access Node's fs module to read files, but that's outside of the time measurements.

Kotlin outputs a small JS wrapper as the entry point, which loads the WASM binary:

if (isNodeJs) {
  // ...
  // Read the compiled WASM binary from disk
  const wasmBuffer = fs.readFileSync(path.resolve(dirpath, wasmFilePath));
  // Compile it, then instantiate it with the JS imports it needs
  const wasmModule = new WebAssembly.Module(wasmBuffer);
  wasmInstance = new WebAssembly.Instance(wasmModule, importObject);
}

And another one: are the JS and WASM outputs both generated from Kotlin code?

Yes. Even though the WebAssembly target is in alpha, it has been around for quite some time, so I think the accumulated experience helped in getting it to work very well right away.

The benchmark results are in line with what has been shown here.

@mike-lischke

OK, thanks, that explains the good performance. Since Kotlin itself compiles to WASM, it makes no difference for a developer who's working with Kotlin code. However, a, say, C++ developer who wants the parser code to be in C++ cannot use this WASM binary, right? The same goes for any other language but Kotlin. Is that the correct understanding of the setup here?

@lppedd
Contributor Author

lppedd commented Feb 1, 2024

However, a, say, C++ developer who wants the parser code to be in C++ cannot use this WASM binary, right?

You can use the binary if you have a WASM runtime, which is language/platform dependent.
In a JavaScript setting, both Node.js and the browser are themselves WASM runtimes, so you don't need any extra dependency.

Now, I'm not sure whether the exported functions can actually be called from all runtimes, but I think the answer is going to be yes in the future.

@mike-lischke

IIUC, the generated parsers, lexers, visitors and listeners are also part of the WASM binary, correct? If so, that would be contrary to the idea of having these files in the target language.

@lppedd
Contributor Author

lppedd commented Feb 1, 2024

IIUC, the generated parsers, lexers, visitors and listeners are also part of the WASM binary, correct?

Yes, that's correct.

If so, that would be contrary to the idea of having these files in the target language.

In the case of this repository, I very much doubt non-Kotlin consumers exist. It exists mainly to allow Kotlin Multiplatform consumers to compile their projects to the 25 available platforms, and to expose a small wrapper API to the outer world if necessary.

This is what I do in my projects: I expose a parseXYZ function to JavaScript or Native, but the generated code, the ANTLR runtime, and the parsing logic are "closed".

ANTLR 5's goal seems similar: to allow ANTLR consumers to generate an all-in-one bundle that can be run by all the platforms/languages that offer a WASM runtime. Currently Kotlin/Wasm offers very limited exportability of definitions to JS, e.g., you cannot export classes, but only functions, and you cannot export types that aren't primitives or JS-native.
See more here to understand how cumbersome interop is right now.
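
A sketch of what such a wrapper looks like today (runParser is a placeholder for the internal lexing/parsing logic, which stays "closed"):

import kotlin.js.JsExport

// Only a plain top-level function with JS-friendly types crosses the
// boundary; the parse tree itself is never exported.
@JsExport
fun parseXYZ(input: String): String {
  val syntaxErrors = runParser(input)
  return syntaxErrors.joinToString("\n")
}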

@mike-lischke

Thanks Edoardo, yes, I know what a pain this is currently. Last summer I tried to wrap the C++ runtime in a WASM binary (and only that), which requires exporting classes, so that the target code that uses this binary can still have its parsers etc. in the preferred language. This approach was pretty slow, slower than pure JS (which made me switch to the JS runtime and create a TS version from it).

The idea of having a single WASM runtime for all ANTLR targets is appealing, but not very realistic, because it requires parser writers to always run Kotlin as part of their build pipeline, a language they are even less familiar with than Java (if at all). You cannot debug your generated files in this scenario, and you cannot use action code in your target language. For the Kotlin target this is no problem at all, but it is for all other ANTLR target languages.

@lppedd
Contributor Author

lppedd commented Feb 1, 2024

I guess in an ideal scenario the Kotlin-runtime-to-WASM compilation process would produce something along the lines of TS type declarations.

Those type declarations would allow other languages compiling down to WASM to generate language-specific bindings.
That would mean you'd have a WASM binary for the runtime built from Kotlin, and a WASM binary for the generated code built from whatever other language. The two binaries would then be deployed together, and would be able to call each other.

Edit: well, that also means ANTLR would provide a way to generate code in multiple languages, as it does now.

@KvanTTT

KvanTTT commented Feb 1, 2024

However, a, say, C++ developer who wants the parser code to be in C++ cannot use this WASM binary, right?

Kotlin also has a Native target, but I'm not sure how convenient it is to bind C++ and Kotlin/Native output. At least it's possible to generate dynamic libraries: https://kotlinlang.org/docs/native-dynamic-libraries.html
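
In Gradle terms that's roughly (a build.gradle.kts sketch following the linked docs; the target and base name are placeholders):

kotlin {
  mingwX64("native") {
    binaries {
      sharedLib {
        baseName = "parser" // yields parser.dll / libparser.so plus a C header
      }
    }
  }
}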

@KvanTTT

KvanTTT commented Feb 1, 2024

You cannot debug your generated files in this scenario

That's correct, but it's unlikely an ordinary user will debug the parsing code, even if it's generated.

and you cannot use action code in your target language.

Yes, but target-language actions are quite painful, since they require grammar duplication for different targets, or other tricks. Also, the different runtimes share a lot of common logic, but it's tough to keep them all up to date. Honestly, I'm not sure what the best way to resolve this problem is, but I would like to try Kotlin. And yes, I'm a bit biased towards Kotlin, since I work on its compiler.

@lppedd
Contributor Author

lppedd commented Feb 1, 2024

Those type declarations would allow other languages compiling down to WASM to generate language-specific bindings.
That would mean you'd have a WASM binary for the runtime built from Kotlin, and a WASM binary for the generated code built from whatever other language. The two binaries would then be deployed together, and would be able to call each other.

@mike-lischke this is what will be needed to implement the quoted part: https://github.com/WebAssembly/component-model

Tracked at https://youtrack.jetbrains.com/issue/KT-64569

@ftomassetti ftomassetti reopened this Feb 2, 2024
@ftomassetti
Member

Reopened as suggested by @lppedd

@lppedd
Contributor Author

lppedd commented Feb 2, 2024

I'm trying to benchmark Native on mingwX64, but it just crashes at some point with no info. I'm trying to figure out where.

@lppedd
Contributor Author

lppedd commented Feb 2, 2024

It's failing on

CREATE PROCEDURE proc_overflow()
BEGIN

  BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN
  BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN
  BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN
  BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN
  BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN
  BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN
  BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN
  BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN
  BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN
  BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN

  BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN
  BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN
  BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN
  BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN
  BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN
  BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN
  BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN
  BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN
  BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN
  BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN

  BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN
  BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN
  BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN
  BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN
  BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN
  BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN
  BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN
  BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN
  BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN
  BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN

  BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN
  BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN
  BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN
  BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN
  BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN
  BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN
  BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN
  BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN
  BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN
  BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN BEGIN

  select 1;
  select 2;
  select 3;

  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;
  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;
  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;
  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;
  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;
  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;
  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;
  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;
  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;
  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;
  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;
  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;
  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;
  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;
  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;
  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;
  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;
  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;
  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;
  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;

  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;
  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;
  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;
  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;
  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;
  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;
  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;
  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;
  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;
  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;
  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;
  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;
  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;
  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;
  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;
  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;
  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;
  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;
  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;
  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;  END;

END

I suppose this requires deep recursion, so maybe Native has some limitations?

@lppedd
Contributor Author

lppedd commented Feb 2, 2024

Well, there is really no point in benchmarking Native.
I get 6000+ ms for statements.txt, and minutes for sakila-data.sql.

Native seems feasible for small grammars, but at the size of the MySQL one, I'd avoid it.

@lppedd
Contributor Author

lppedd commented Feb 2, 2024

OK, partially my fault here: I didn't enable optimizations.

I've updated the results table, and while it got noticeably better, it's still way behind the other platforms.

@lppedd
Contributor Author

lppedd commented Feb 7, 2024

Just as a note, JS and Native are very sensitive to non-null assertions (!!).
I got rid of them in hot paths (after profiling on the JVM) and managed to shave off 5-10% on both platforms, for all benchmarks.
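
For illustration (the names are made up, and the exact codegen differs per backend), the change amounts to:

class Transition(val target: State?)
class State(val id: Int)

// Before: every `!!` is a null check plus a throwing branch,
// paid on each iteration of the hot loop.
fun sumWithAssertion(transitions: List<Transition>): Int {
  var sum = 0
  for (t in transitions) sum += t.target!!.id
  return sum
}

// After: a single explicit branch, no exception machinery involved.
fun sumWithoutAssertion(transitions: List<Transition>): Int {
  var sum = 0
  for (t in transitions) sum += (t.target ?: continue).id
  return sum
}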

@mike-lischke

Just as a note, JS and Native are very sensitive to non-null assertions (!!). I got rid of them in hot paths (after profiling on the JVM) and managed to shave off 5-10% on both platforms, for all benchmarks.

I guess this is relevant only to generated code from Kotlin, right?

@lppedd
Contributor Author

lppedd commented Feb 7, 2024

@mike-lischke yup! You won't have this issue in TS. And on the JVM the cost is pretty much zero.

@lppedd
Contributor Author

lppedd commented Feb 8, 2024

I've benchmarked the difference between deserializing the ATN directly from a String versus first converting it to a CharArray.

Java performs slightly worse when accessing characters via the String instance because of bounds checking, while all the other platforms perform better with it (especially JS). The difference is minimal, however: about 1 ms in favor of the String for the MySQL ATN.

I've also tried going back to the IntArray ATN, as other targets do, and the compilation process never ends for the MySQL grammar. I've let it run for more than an hour. @KvanTTT I guess this is a performance issue of the Kotlin compiler?

@KvanTTT

KvanTTT commented Feb 8, 2024

I've also tried going back to the IntArray ATN, as other targets do, and the compilation process never ends for the MySQL grammar. I've let it run for more than an hour. @KvanTTT I guess this is a performance issue of the Kotlin compiler?

Thanks for detecting the problem. Could you share this generated parser?
