Add a more understandable message than "Serialized ATN data element .... element ... out of range 0..65535" #1863

Closed
jbarotin opened this issue May 9, 2017 · 9 comments

Comments

@jbarotin

jbarotin commented May 9, 2017

Hi,

I tried to build a big grammar file with ANTLR 4. I get the following message when I run the ANTLR tool to generate the Java sources:

Exception in thread "main" java.lang.UnsupportedOperationException: Serialized ATN data element 101246 element 11 out of range 0..65535
        at org.antlr.v4.runtime.atn.ATNSerializer.serialize(ATNSerializer.java:361)
        at org.antlr.v4.runtime.atn.ATNSerializer.getSerialized(ATNSerializer.java:601)
        at org.antlr.v4.codegen.model.SerializedATN.<init>(SerializedATN.java:22)
        at org.antlr.v4.codegen.model.Recognizer.<init>(Recognizer.java:64)
        at org.antlr.v4.codegen.model.Lexer.<init>(Lexer.java:27)
        at org.antlr.v4.codegen.OutputModelController.lexer(OutputModelController.java:151)
        at org.antlr.v4.codegen.OutputModelController.buildLexerOutputModel(OutputModelController.java:104)
        at org.antlr.v4.codegen.CodeGenerator.generateLexer(CodeGenerator.java:119)
        at org.antlr.v4.codegen.CodeGenPipeline.process(CodeGenPipeline.java:54)
        at org.antlr.v4.Tool.processNonCombinedGrammar(Tool.java:404)
        at org.antlr.v4.Tool.process(Tool.java:354)
        at org.antlr.v4.Tool.processGrammarsOnCommandLine(Tool.java:321)
        at org.antlr.v4.Tool.main(Tool.java:168)

From what I understand, my grammar file is too long.

Is it possible to get a message explaining that a limit of the library has been reached?

For testing, here is my grammar file (a bit cryptic because I anonymized it):
https://gist.github.com/jbarotin/7f7caf419b9038c2a548969265788392

@ericvergnaud
Contributor

ericvergnaud commented May 9, 2017 via email

@jbarotin
Author

jbarotin commented May 9, 2017

Thanks for your answer, you are exactly describing the way I'm implementing my grammar right now.

But I lost time figuring out why it wasn't working. I think improving the message (by intercepting the exception) would save time for anyone unaware of this limit who wants to build a big grammar file.

@ericvergnaud
Contributor

To save time, maybe you should start by reading the book?
Hard to say with obfuscated keywords, but I bet most of your rules (such as 'end_event_value') could be coded using an ID or something similar instead of thousands of possible values.

@jbarotin
Author

jbarotin commented May 9, 2017

Yes, exactly. It's not a big deal: 'end_event_value' must be equal to one of a strict list of values. I'll validate this dynamically through the AST.
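For readers following the thread, here is a minimal sketch of that kind of post-parse check, assuming the grammar matches a generic ID-like token and the allowed values are kept in a set. The class name, the `validate` helper, and the `VALUE_*` strings are hypothetical placeholders, not taken from the grammar in the gist, and the sketch deliberately leaves out the ANTLR tree-walking code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class AllowedValueCheck {
    // Hypothetical allowed values; a real tool would load these from its rule set.
    static final Set<String> END_EVENT_VALUES = Set.of("VALUE_A", "VALUE_B", "VALUE_C");

    /** Returns one error message per value that is not in the allowed set. */
    static List<String> validate(List<String> parsedValues) {
        List<String> errors = new ArrayList<>();
        for (String value : parsedValues) {
            if (!END_EVENT_VALUES.contains(value)) {
                errors.add("unknown end_event_value: " + value);
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        // Values as they might be collected from the parse tree after parsing.
        System.out.println(validate(List.of("VALUE_A", "OTHER")));
    }
}
```

This keeps the grammar small, since the list of legal values no longer becomes thousands of lexer alternatives, at the cost of reporting bad values after parsing rather than during it.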

Right now, I don't have enough time to read the book. I'm just building a small tool to automate the validation of expert rules... but I have already used the Boost Spirit library in the past.

@jbarotin
Author

jbarotin commented May 10, 2017

Anyway, it was just a suggestion. If you think it's a good idea but have no time for it, I can try to submit a pull request. Otherwise, you can close the issue.

@ericvergnaud
Contributor

ericvergnaud commented May 10, 2017 via email

@KvanTTT
Member

KvanTTT commented May 10, 2017

@ericvergnaud the question is not about a badly written grammar but about an informative error message. Such an error can occur, so we have to handle it properly and show a clear message.
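As an illustration only, a range check with a more explanatory message could look roughly like the sketch below. `AtnRangeCheck` and `checkElement` are hypothetical names and the wording of the message is invented here; this is not the code in `ATNSerializer`, just one way such a message could be phrased.

```java
public class AtnRangeCheck {
    // 65535, the upper bound quoted in the reported exception.
    static final int MAX_SERIALIZED_VALUE = 0xFFFF;

    /** Hypothetical helper: checks one serialized element and explains the limit if exceeded. */
    static void checkElement(int index, int value) {
        if (value < 0 || value > MAX_SERIALIZED_VALUE) {
            throw new UnsupportedOperationException(
                "Serialized ATN data element " + value + " at index " + index
                + " is out of range 0.." + MAX_SERIALIZED_VALUE
                + "; the grammar is probably too large to serialize."
                + " Consider replacing long lists of literal alternatives with a generic token.");
        }
    }

    public static void main(String[] args) {
        checkElement(3, 120); // within range, no exception
        try {
            checkElement(11, 101246); // the value and index from the report
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```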

@KvanTTT
Member

KvanTTT commented Jan 10, 2022

I understand how it could be fixed. Only one value exceeds the range 0..65535: the number of ATN states (ATNSerializer.java#L85), which is at index 11 in the data. It's possible to store the length in two words, which allows up to 2^32 ATN states. This change can be backward compatible if a variable-length encoding is used (if the size is in the 0..32768 range, use 1 word; otherwise use 2 words for the size). But actually, backward compatibility does not matter in this case, because runtimes have version checks and generated files are not compatible with a tool of another version.
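A rough sketch of such a variable-length size encoding follows, assuming 16-bit words and using the high bit of the first word as a "two words follow" flag. The class and method names are hypothetical, and this is not necessarily the scheme the eventual fix adopted. Because one bit is spent on the flag, this variant covers sizes up to 2^31 - 1 rather than 2^32.

```java
public class VarLengthSize {
    /** Encodes a size (0..2^31-1) as one or two 16-bit words. */
    static char[] encode(int value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative size: " + value);
        }
        if (value < 0x8000) {
            // High bit clear: the value fits in a single word.
            return new char[] { (char) value };
        }
        return new char[] {
            (char) (0x8000 | (value >>> 16)), // flag bit plus the upper 15 bits
            (char) (value & 0xFFFF)           // lower 16 bits
        };
    }

    /** Decodes a size produced by encode(). */
    static int decode(char[] words) {
        int first = words[0];
        if ((first & 0x8000) == 0) {
            return first;
        }
        return ((first & 0x7FFF) << 16) | words[1];
    }

    public static void main(String[] args) {
        char[] words = encode(101246); // the state count from the report
        System.out.println(words.length + " " + decode(words)); // prints "2 101246"
    }
}
```

Sizes below 32768 keep their existing one-word form, which is what would make this kind of change readable by an older decoder for small grammars.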

I think we should increase the size limit because there are similar user bug reports across the repository: #840, #2732, #3338. And 65535 is really not a big limit; some grammars require more states (for natural language processing or something like that).

KvanTTT added a commit to KvanTTT/antlr4 that referenced this issue Jan 15, 2022
Refactor ATN serializer and deserializer, use ATNDataWriter, ATNDataReader

Remove excess data cloning in deserializer

fixes antlr#1863, fixes antlr#2732, fixes antlr#3338
KvanTTT added a commit to KvanTTT/antlr4 that referenced this issue Jan 16, 2022
KvanTTT added a commit to KvanTTT/antlr4 that referenced this issue Jan 17, 2022
KvanTTT added a commit to KvanTTT/antlr4 that referenced this issue Feb 20, 2022
@parrt parrt added this to the 4.10 milestone Mar 26, 2022
@parrt
Member

parrt commented Mar 26, 2022

Fixed by #3591

@parrt parrt closed this as completed Mar 26, 2022