Add a more understandable message than "Serialized ATN data element .... element ... out of range 0..65535" #1863

jbarotin · 2017-05-09T12:55:25Z

Hi,

I tried to build a big grammar file with ANTLR 4. I got the following message when running the ANTLR tool that generates the Java sources:

```
Exception in thread "main" java.lang.UnsupportedOperationException: Serialized ATN data element 101246 element 11 out of range 0..65535
        at org.antlr.v4.runtime.atn.ATNSerializer.serialize(ATNSerializer.java:361)
        at org.antlr.v4.runtime.atn.ATNSerializer.getSerialized(ATNSerializer.java:601)
        at org.antlr.v4.codegen.model.SerializedATN.<init>(SerializedATN.java:22)
        at org.antlr.v4.codegen.model.Recognizer.<init>(Recognizer.java:64)
        at org.antlr.v4.codegen.model.Lexer.<init>(Lexer.java:27)
        at org.antlr.v4.codegen.OutputModelController.lexer(OutputModelController.java:151)
        at org.antlr.v4.codegen.OutputModelController.buildLexerOutputModel(OutputModelController.java:104)
        at org.antlr.v4.codegen.CodeGenerator.generateLexer(CodeGenerator.java:119)
        at org.antlr.v4.codegen.CodeGenPipeline.process(CodeGenPipeline.java:54)
        at org.antlr.v4.Tool.processNonCombinedGrammar(Tool.java:404)
        at org.antlr.v4.Tool.process(Tool.java:354)
        at org.antlr.v4.Tool.processGrammarsOnCommandLine(Tool.java:321)
        at org.antlr.v4.Tool.main(Tool.java:168)
```

From what I understand, my grammar file is too long.

Would it be possible to get a message explaining that a limit of the library has been reached?

For your tests, here is my grammar file (a bit cryptic because I anonymized it):
https://gist.github.com/jbarotin/7f7caf419b9038c2a548969265788392


ericvergnaud · 2017-05-09T15:30:21Z

Hi,

By the looks of it, you are confusing structure and content. No structure or language grammar has thousands of keywords. Instead, they have patterns that consume all those words or sentences generically. You should look at determining those patterns. ANTLR will produce an AST, which you can then traverse to interpret the exact words.

Eric


jbarotin · 2017-05-09T16:04:53Z

Thanks for your answer, you are exactly describing the way I'm implementing my grammar right now.

But I lost time working out why it was not working. I think improving the message (by intercepting the exception) would save time for others who are unaware of this limit and want to build a big grammar file.
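As a rough illustration of the kind of message being requested, here is a minimal sketch of a range check that reports the problem in terms a grammar author can act on. The class `AtnRangeCheck` and the method `checkElement` are hypothetical names made up for this example and are not part of the ANTLR API; only the 0..65535 range and the index/value pair come from the stack trace above.

```java
// Hypothetical sketch, not ANTLR code: a range check with a friendlier message.
final class AtnRangeCheck {
    // Upper bound taken from the error message quoted in this issue.
    static final int MAX_SERIALIZED_VALUE = 0xFFFF; // 65535

    /** Throws a descriptive exception when a serialized element does not fit the range. */
    static void checkElement(int index, int value) {
        if (value < 0 || value > MAX_SERIALIZED_VALUE) {
            throw new UnsupportedOperationException(
                "Grammar is too large to serialize: ATN data element at index " + index
                + " has value " + value
                + ", which is outside the supported range 0.." + MAX_SERIALIZED_VALUE
                + ". Consider replacing long lists of literal alternatives with a generic token rule.");
        }
    }

    public static void main(String[] args) {
        checkElement(0, 42); // in range, no exception
        try {
            checkElement(11, 101246); // the index and value reported in this issue
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The point of the sketch is only the wording: it names the likely cause (grammar size) and a possible remedy, instead of exposing the raw serialization detail alone.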

ericvergnaud · 2017-05-09T16:20:57Z

To save time, maybe you should start by reading the book?
Hard to say with obfuscated keywords, but I bet most of your rules (such as 'end_event_value') could be coded using an ID or something similar instead of thousands of possible values.

jbarotin · 2017-05-09T16:31:07Z

Yes, exactly. It's not a big deal: 'end_event_value' must match one of a strict list of values, so I'll validate this dynamically through the AST.

Right now I don't have enough time to read the book. I'm just building a small tool to automate the validation of expert rules... but I have already used the Boost Spirit library in the past.
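The approach discussed here (match the value with a generic token, then check it against the allowed list after parsing) could look roughly like the sketch below. It is kept free of ANTLR dependencies on purpose: it assumes the matched texts have already been collected from the parse tree, for example with a listener or visitor. The class name `EventValueValidator` and the values `VALUE_A`, `VALUE_B`, `VALUE_C` are placeholders invented for the example; the real list is not shown in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: validate matched token texts after parsing instead of
// listing every allowed value as a literal in the grammar.
final class EventValueValidator {
    // Placeholder values; the real list would be loaded from its actual source.
    static final Set<String> ALLOWED_END_EVENT_VALUES = Set.of("VALUE_A", "VALUE_B", "VALUE_C");

    /** Returns one error message per text that is not in the allowed set. */
    static List<String> validate(List<String> matchedTexts) {
        List<String> errors = new ArrayList<>();
        for (String text : matchedTexts) {
            if (!ALLOWED_END_EVENT_VALUES.contains(text)) {
                errors.add("unknown end_event_value: " + text);
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        // Texts as they might be collected from 'end_event_value' nodes.
        System.out.println(validate(List.of("VALUE_A", "BOGUS")));
    }
}
```

With this split, the grammar stays small regardless of how many values are allowed, and the list can change without regenerating the parser.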

jbarotin · 2017-05-10T07:47:31Z

Anyway, it was just a suggestion. If you think it's a good idea but don't have time to do it, I can try to submit a pull request. Otherwise, you can close the issue.

ericvergnaud · 2017-05-10T11:18:54Z

Yes, please submit a PR.


KvanTTT · 2017-05-10T20:12:01Z

@ericvergnaud the question is not about a badly written grammar but about an informative error message. This error can occur, so we have to handle it properly and show a meaningful message.

KvanTTT · 2022-01-10T20:25:10Z

I understand how it could be fixed. Only one value exceeds the range 0..65535: the number of ATN states (ATNSerializer.java#L85), which is at index 11 in the data. The length could be stored in two words, which would allow up to 2^32 ATN states. This change could be backward compatible if it used a variable-length encoding (one word if the size is in the 0..32768 range, otherwise two words). But backward compatibility does not actually matter in this case, because the runtimes have version checks and generated files are not compatible with a tool of another version.

I think we should increase the size limit because there are similar user bug reports across the repository: #840, #2732, #3338. And 65535 is really not a big limit; some grammars require more states (for natural language processing, for example).
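To make the variable-length idea concrete, here is a minimal sketch of one way to encode a non-negative `int` in one or two 16-bit words. This is an assumption-laden illustration, not the encoding ANTLR adopted: the class `VarWordCodec` and its methods are hypothetical, and the scheme reserves the top bit of the first word as a "two words follow" flag, so single-word values are limited to 0..32767 and two-word values reach up to `Integer.MAX_VALUE`.

```java
// Hypothetical sketch of a variable-length encoding over 16-bit words.
final class VarWordCodec {
    // Top bit of the first word marks a two-word value.
    static final int FLAG = 0x8000;

    /** Encodes a non-negative int as one word (0..32767) or two words (up to 2^31-1). */
    static char[] encode(int value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative value: " + value);
        }
        if (value < FLAG) {
            return new char[] { (char) value };
        }
        // value >>> 16 is at most 0x7FFF for a non-negative int, so the flag bit is free.
        return new char[] { (char) (FLAG | (value >>> 16)), (char) (value & 0xFFFF) };
    }

    /** Decodes the value that starts at the given offset. */
    static int decode(char[] data, int offset) {
        int first = data[offset];
        if ((first & FLAG) == 0) {
            return first; // single-word value
        }
        int low = data[offset + 1];
        return ((first & 0x7FFF) << 16) | low;
    }

    public static void main(String[] args) {
        char[] encoded = encode(101246); // the state count reported in this issue
        System.out.println(encoded.length + " words, decoded: " + decode(encoded, 0));
    }
}
```

A scheme of this shape keeps small grammars at their current size while letting large ones exceed 65535 states, at the cost of one flag bit in the first word.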

Refactor ATN serializer and deserializer, use ATNDataWriter, ATNDataReader

Remove excess data cloning in deserializer

fixes antlr#1863, fixes antlr#2732, fixes antlr#3338


parrt · 2022-03-26T20:34:42Z

Fixed by #3591

KvanTTT mentioned this issue Jan 15, 2022

Normalize line separators for .txt files (fix Windows tests) #3488

Merged

KvanTTT mentioned this issue Jan 15, 2022

Get rid of ATN serialization restrictions (ATN states size can be > 65535, up to 2^31-1), remove excess serialization and allocations #3493

Closed

KvanTTT mentioned this issue Jan 21, 2022

ATN serialization improvements (Java only for demo) #3505

Closed

KvanTTT added a commit to KvanTTT/antlr4 that referenced this issue Feb 20, 2022

Support of full int range in serializer and deserializer (up to Integer.MAX_VALUE) fix antlr#840, fix antlr#1863, fix antlr#2732, fix antlr#3338

a8378d3

KvanTTT mentioned this issue Feb 20, 2022

Increase ATN states size limit, simplify ATN serialization #3546

Closed

parrt mentioned this issue Mar 19, 2022

Use signed ints for ATN serialization not uint16, except for java #3591

Merged

parrt added this to the 4.10 milestone Mar 26, 2022

parrt added atn-analysis type:cleanup labels Mar 26, 2022

parrt closed this as completed Mar 26, 2022
