Getting this error: Exception in thread "main" java.lang.UnsupportedOperationException: Serialized ATN data element out of range #840

Closed
agiovacchini opened this issue Mar 19, 2015 · 25 comments


@agiovacchini

I'm using the latest version pulled from git (4.5.1), built under OS X Yosemite. I tried with both JDK 6 and JDK 7, but with my file, which is https://gist.github.com/agiovacchini/52cc6dc9210a65dfc659#file-example-g4, I get this error:

Exception in thread "main" java.lang.UnsupportedOperationException: Serialized ATN data element out of range
at org.antlr.v4.runtime.atn.ATNSerializer.serialize(ATNSerializer.java:384)
at org.antlr.v4.runtime.atn.ATNSerializer.getSerialized(ATNSerializer.java:563)
at org.antlr.v4.codegen.model.SerializedATN.<init>(SerializedATN.java:46)
at org.antlr.v4.codegen.model.Recognizer.<init>(Recognizer.java:87)
at org.antlr.v4.codegen.model.Lexer.<init>(Lexer.java:51)
at org.antlr.v4.codegen.OutputModelController.lexer(OutputModelController.java:176)
at org.antlr.v4.codegen.OutputModelController.buildLexerOutputModel(OutputModelController.java:129)
at org.antlr.v4.codegen.CodeGenerator.generateLexer(CodeGenerator.java:142)
at org.antlr.v4.codegen.CodeGenPipeline.process(CodeGenPipeline.java:73)
at org.antlr.v4.Tool.processNonCombinedGrammar(Tool.java:428)
at org.antlr.v4.Tool.process(Tool.java:378)
at org.antlr.v4.Tool.processGrammarsOnCommandLine(Tool.java:345)
at org.antlr.v4.Tool.main(Tool.java:192)

The launch script is like this:

#!/bin/zsh

export CLASSPATH=".:./antlr4/dist/antlr4-4.5.1-complete.jar"
alias antlr4='java -jar ./antlr4/dist/antlr4-4.5.1-complete.jar'
alias grun='java org.antlr.v4.runtime.misc.TestRig'
antlr4 example.g4

@parrt
Member

parrt commented Mar 19, 2015

likely a classpath issue. remove any prior version from the classpath.

@agiovacchini
Author

I've already tried that, but I get the same result:

#!/bin/zsh
export CLASSPATH="./antlr4/dist/antlr4-4.5.1-complete.jar"
alias antlr4='/System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home/bin/java -jar ./antlr4/dist/antlr4-4.5.1-complete.jar'
alias grun='/System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home/bin/java org.antlr.v4.runtime.misc.TestRig'

And I'm sure I'm using the right ANTLR jar: if I add my debug statements and rebuild the jar, I see them in the output:

nsets: 29
atn.lexerActions.length: 0
data.size: 743398
<<i:1 ><data[i]:1070>>
<<i:2 ><data[i]:54991>>
<<i:3 ><data[i]:33284>>
<<i:4 ><data[i]:44331>>
<<i:5 ><data[i]:17429>>
<<i:6 ><data[i]:44783>>
<<i:7 ><data[i]:36222>>
<<i:8 ><data[i]:43739>>
<<i:9 ><data[i]:0>>
<<i:10 ><data[i]:4408>>
<<i:11 ><data[i]:91799>>
Exception in thread "main" java.lang.UnsupportedOperationException: Serialized ATN data element out of range
at org.antlr.v4.runtime.atn.ATNSerializer.serialize(ATNSerializer.java:384)
at org.antlr.v4.runtime.atn.ATNSerializer.getSerialized(ATNSerializer.java:563)
at org.antlr.v4.codegen.model.SerializedATN.<init>(SerializedATN.java:46)
at org.antlr.v4.codegen.model.Recognizer.<init>(Recognizer.java:87)
at org.antlr.v4.codegen.model.Lexer.<init>(Lexer.java:51)
at org.antlr.v4.codegen.OutputModelController.lexer(OutputModelController.java:176)
at org.antlr.v4.codegen.OutputModelController.buildLexerOutputModel(OutputModelController.java:129)
at org.antlr.v4.codegen.CodeGenerator.generateLexer(CodeGenerator.java:142)
at org.antlr.v4.codegen.CodeGenPipeline.process(CodeGenPipeline.java:73)
at org.antlr.v4.Tool.processNonCombinedGrammar(Tool.java:428)
at org.antlr.v4.Tool.process(Tool.java:378)
at org.antlr.v4.Tool.processGrammarsOnCommandLine(Tool.java:345)
at org.antlr.v4.Tool.main(Tool.java:192)

These are the debug statements I've added to ATNSerializer.java:

    ...
    //
    // LEXER ACTIONS
    //
    if (atn.grammarType == ATNType.LEXER) {
        data.add(atn.lexerActions.length);
        System.out.println("atn.lexerActions.length: " + atn.lexerActions.length);
        for (LexerAction action : atn.lexerActions) {
            data.add(action.getActionType().ordinal());
            System.out.println("action.getActionType().ordinal(): " + action.getActionType().ordinal());
            System.out.println("action.getActionType(): " + action.getActionType());
            switch (action.getActionType()) {
            case CHANNEL:
                int channel = ((LexerChannelAction)action).getChannel();
                data.add(channel != -1 ? channel : 0xFFFF);

                System.out.println("channel != -1 ? channel : 0xFFFF: " + (channel != -1 ? channel : 0xFFFF));
                data.add(0);
                break;

            case CUSTOM:
                int ruleIndex = ((LexerCustomAction)action).getRuleIndex();
                int actionIndex = ((LexerCustomAction)action).getActionIndex();
                data.add(ruleIndex != -1 ? ruleIndex : 0xFFFF);
                System.out.println("ruleIndex != -1 ? ruleIndex : 0xFFFF: " + (ruleIndex != -1 ? ruleIndex : 0xFFFF));
                data.add(actionIndex != -1 ? actionIndex : 0xFFFF);
                System.out.println("actionIndex != -1 ? actionIndex : 0xFFFF: " + (actionIndex != -1 ? actionIndex : 0xFFFF));
                break;

            case MODE:
                int mode = ((LexerModeAction)action).getMode();
                data.add(mode != -1 ? mode : 0xFFFF);
                System.out.println("mode != -1 ? mode : 0xFFFF: " + (mode != -1 ? mode : 0xFFFF));
                data.add(0);
                break;

            case MORE:
                data.add(0);
                data.add(0);
                break;

            case POP_MODE:
                data.add(0);
                data.add(0);
                break;

            case PUSH_MODE:
                mode = ((LexerPushModeAction)action).getMode();
                data.add(mode != -1 ? mode : 0xFFFF);
                System.out.println("mode != -1 ? mode : 0xFFFF: " + (mode != -1 ? mode : 0xFFFF));
                data.add(0);
                break;

            case SKIP:
                data.add(0);
                data.add(0);
                break;

            case TYPE:
                int type = ((LexerTypeAction)action).getType();
                data.add(type != -1 ? type : 0xFFFF);
                System.out.println("type != -1 ? type : 0xFFFF: " + (type != -1 ? type : 0xFFFF));
                data.add(0);
                break;

            default:
                String message = String.format(Locale.getDefault(), "The specified lexer action type %s is not valid.", action.getActionType());
                throw new IllegalArgumentException(message);
            }
        }
    }
    System.out.println("data.size: " + data.size());
    // don't adjust the first value since that's the version number
    for (int i = 1; i < data.size(); i++) {

        System.out.println("<<i:" + i + " ><data[i]:" + data.get(i) + ">>");
        if (data.get(i) < Character.MIN_VALUE || data.get(i) > Character.MAX_VALUE) {
            throw new UnsupportedOperationException("Serialized ATN data element out of range");
            //System.out.println("Serialized ATN data element out of range");
        }


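        // offset every element by +2 (mod 0x10000); ATNDeserializer undoes this by subtracting 2 when reading the data back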
        int value = (data.get(i) + 2) & 0xFFFF;
        data.set(i, value);
    }
    ...

I've seen #76, which seems very similar to my problem but is now fixed; the file attached to issue 76 (https://gist.github.com/sharwell/3955844#file-example-g4) succeeds in my env, so it really seems I don't have a classpath issue.

Can you please try the file attached to this issue (https://gist.github.com/agiovacchini/52cc6dc9210a65dfc659#file-example-g4) in your env and see if it succeeds? If I make it smaller by cutting part of the "ingr:" entries it succeeds in my env, but not with the full list; maybe I've hit some other data-size limitation in the JDK/ANTLR?

@parrt
Member

parrt commented Mar 20, 2015

have you completely regenerated all your grammars? are you sure 4.5.1 is built correctly?

@agiovacchini
Author

Yes, I have regenerated everything; before running again I did this:
rm *.class
rm *.java
rm *.tokens

This is the content of bild.log:

[03/20/15 09:39:20 ./bild.py:42 bilder.py:124] platform=darwin
[03/20/15 09:39:20 ./bild.py:42 bilder.py:125] jdk={'1.6': '/System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home', '1.7': '/Library/Java/JavaVirtualMachines/jdk1.7.0_21.jdk/Contents/Home', '1.8': '/Library/Java/JavaVirtualMachines/jdk1.8.0_31.jdk/Contents/Home'}
[03/20/15 09:39:20] require compile
[03/20/15 09:39:20 require ./bild.py:87 bilder.py:422] require compile
[03/20/15 09:39:20] require parsers
[03/20/15 09:39:20 require ./bild.py:73 bilder.py:422] require parsers
[03/20/15 09:39:20] build compile
[03/20/15 09:39:20 require ./bild.py:73 bilder.py:430] build compile
[03/20/15 09:39:20] build mkjar_complete
[03/20/15 09:39:20 require ./bild.py:87 bilder.py:430] build mkjar_complete
[03/20/15 09:39:21 print_and_log ./bild.py:114 bilder.py:761] Generated dist/antlr4-4.5.1-complete.jar

But I only tried building it myself after getting the same error with the prebuilt jar downloaded from the ANTLR site.

Does my grammar work in your environment?

@parrt
Member

parrt commented Mar 20, 2015

with 4.5 I get the error, and with 4.5.1 as well. weird!

terence:~/tmp $ java -jar ~/antlr/code/antlr4/dist/antlr4-4.5.1-complete.jar example.g4 
Exception in thread "main" java.lang.UnsupportedOperationException: Serialized ATN data element out of range.
    at org.antlr.v4.runtime.atn.ATNSerializer.serialize(ATNSerializer.java:370)

I'll check at work.

@sharwell
Member

The ATN for the grammar (most likely the number of states and/or the number of transitions) is larger than the current representation is capable of encoding.
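
For context, a minimal sketch of the constraint: the serializer stores each element as a 16-bit char, so the range check quoted earlier in the thread is what throws:

```java
public final class AtnRangeCheck {
    // Mirrors the range test quoted earlier in the thread: every serialized
    // ATN element must fit in an unsigned 16-bit char, i.e. 0..0xFFFF.
    static void checkElement(int element) {
        if (element < Character.MIN_VALUE || element > Character.MAX_VALUE) {
            throw new UnsupportedOperationException(
                "Serialized ATN data element out of range");
        }
    }

    public static void main(String[] args) {
        checkElement(0xFFFF);   // fine: fits in a char
        checkElement(0x10000);  // throws: out of range
    }
}
```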

@agiovacchini
Copy link
Author

Thank you for support. Is there some kind of workaround to this problem?

@sharwell
Member

@agiovacchini There isn't one that I know of right now. The serialization logic could be rewritten to use compressed integers like the ones used in ECMA-335 (the bytecode format for .NET), but it wouldn't be a small undertaking. It's arguably a good idea in the long run, though.
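
For illustration, a sketch of what ECMA-335-style compressed unsigned integers look like (1, 2, or 4 bytes depending on magnitude); this is one possible encoding offered as an assumption, not ANTLR's actual serialization format:

```java
import java.io.ByteArrayOutputStream;

// Illustration only (not ANTLR's format): ECMA-335-style compressed
// unsigned integers use 1, 2, or 4 bytes depending on magnitude, so
// serialized elements would no longer be capped at 0xFFFF.
public final class CompressedUInt {
    static void write(ByteArrayOutputStream out, int value) {
        if (value < 0 || value > 0x1FFFFFFF) {
            throw new IllegalArgumentException("value out of range: " + value);
        }
        if (value <= 0x7F) {
            out.write(value);                    // 0xxxxxxx
        } else if (value <= 0x3FFF) {
            out.write(0x80 | (value >> 8));      // 10xxxxxx xxxxxxxx
            out.write(value & 0xFF);
        } else {
            out.write(0xC0 | (value >> 24));     // 110xxxxx + 3 more bytes
            out.write((value >> 16) & 0xFF);
            out.write((value >> 8) & 0xFF);
            out.write(value & 0xFF);
        }
    }
}
```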

@agiovacchini
Author

Ok, so the only option is to have a smaller grammar, right?

@joson

joson commented Mar 25, 2015

I've fixed the same issue, which was caused by a grammar rule like this in my .g4 file:

```
fragment
YearDigit
: '1900'..'3000'
;
```

I modified it as below and then it compiled successfully:

```
fragment
YearDigit
: '19' ('0'..'9') ('0'..'9')
| ('2'|'3') ('0'..'9') ('0'..'9') ('0'..'9')
;
```

@agiovacchini
Author

In my case I have many different literal strings used as keys in the grammar, but I may have found a workaround, done this way:

  1. Make a list of all possible keys and assign each one an id
  2. Encode the id in some base (e.g. base64) to reduce the characters an id uses; take care that the base's alphabet contains no characters already used in the grammar
  3. Use the encoded ids in the .g4 grammar
  4. Preprocess the data to be interpreted with the grammar, replacing the string literals with the base-encoded ids
  5. Feed the grammar with the file

It takes a bit of work to set up, but it then allows for many more possible keys, given the smaller space each one uses (see the sketch below).
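
A minimal sketch of this preprocessing idea, assuming a hypothetical KeyPreprocessor helper, a base-36 id alphabet, and naive string replacement (all illustration choices, not part of the original comment):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the workaround described above: map each
// literal key to a short base-36 id, then rewrite the input so the
// grammar only has to match the short ids.
public final class KeyPreprocessor {
    private final Map<String, String> keyToId = new LinkedHashMap<>();

    public KeyPreprocessor(Iterable<String> keys) {
        int next = 0;
        for (String key : keys) {
            // "K" prefix + base-36 keeps ids short; choose an alphabet
            // that doesn't collide with characters the grammar uses
            keyToId.put(key, "K" + Integer.toString(next++, 36));
        }
    }

    // Naive replacement for illustration; real code would have to handle
    // overlapping or substring keys (e.g. replace longest keys first).
    public String rewrite(String input) {
        String out = input;
        for (Map.Entry<String, String> e : keyToId.entrySet()) {
            out = out.replace(e.getKey(), e.getValue());
        }
        return out;
    }
}
```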

@parrt
Member

parrt commented May 19, 2015

Hiya. I guess I'll close this, and I'll likely not fix it. thanks for documenting though!!

@ftomassetti

I had the same issue with the grammar of a language I was writing. One thing you can do to avoid this is to use one token type for several operators with the same precedence (e.g., relational operators) instead of separate tokens. It seems to help.

It would be nice if this was fixed eventually...

@adsmith

adsmith commented Feb 16, 2016

I am seeing this error; I guess it still exists.

tmpPath = C:\Users\ADAM~1.SMI\AppData\Local\Temp/jhades

Deleting temporary directory
Unzipping WAR
Exception in thread "main" java.lang.UnsupportedOperationException
at com.sun.nio.zipfs.ZipFileSystemProvider.ensureFile(ZipFileSystemProvider.java:96)
at com.sun.nio.zipfs.ZipFileSystemProvider.newFileSystem(ZipFileSystemProvider.java:110)
at java.nio.file.FileSystems.newFileSystem(FileSystems.java:326)
at java.nio.file.FileSystems.newFileSystem(FileSystems.java:276)
at org.jhades.utils.ZipUtils.createZipFileSystem(ZipUtils.java:110)
at org.jhades.utils.ZipUtils.unzip(ZipUtils.java:57)
at org.jhades.standalone.JHadesStandaloneReport.scan(JHadesStandaloneReport.java:89)
at org.jhades.standalone.JHadesStandaloneReport.main(JHadesStandaloneReport.java:78)

@parrt
Member

parrt commented Feb 16, 2016

yep, size limitation. try to simplify your lexer or parser.

@cy6erGn0m

shouldn't antlr produce some kind of user-friendly message in this case?

@parrt
Member

parrt commented Apr 15, 2016

yup.

@DocPlaid

I had a similar problem, so I began commenting out lines in my grammar until the error went away.
In my case, the problem was a typo in my rules for numeric literals:
fragment DIGIT : '0'..'9 ' ;
Changing that to either of the following made the error go away:
fragment DIGIT : '0'..'9' ;
fragment DIGIT : [0123456789] ;
It seems likely that the character range from '0' to '9 ' is interpreted as Unicode values, and it produces an unexpectedly large range. This is consistent with the discussion above.

@sharwell
Member

@DocPlaid you can also use [0-9] for that range in ANTLR 4. 😄

@mullekay

mullekay commented Feb 9, 2020

Sorry to raise this question again! I am trying to build a large dictionary of about 200,000 words (which use simple regular expressions to cover edge cases). Unfortunately, I am running into the mentioned issue as well. From what I understand, this is because the number of internal ATN states is limited to MAX_VALUE = '\uFFFF'. Is there a recommended way of using the ANTLR lexer as a tagger for large dictionaries? Furthermore, I should mention that the code generates the grammar on the fly.

```java
// create lexer
LexerGrammar lg = new LexerGrammar(grammarString);
// create interpreter
ByteArrayInputStream stream = new ByteArrayInputStream("test input".getBytes());
// exception happens here
LexerInterpreter lexEngine = lg.createLexerInterpreter(CharStreams.fromStream(stream));

CommonTokenStream tokens = new CommonTokenStream(lexEngine);

// parse text
tokens.fill();
for (Token token : tokens.getTokens()) {
    // go through tokens
}
```

This is an example of what the rules look like (but with many more elements):

```
lexer grammar TestLexer;
LETTER_A : ('a'|'A');
LETTER_B : ('b'|'B');
LETTER_C : ('c'|'C');
LETTER_D : ('d'|'D');
LETTER_E : ('e'|'E');
LETTER_F : ('f'|'F');
LETTER_G : ('g'|'G');
LETTER_H : ('h'|'H');
LETTER_I : ('i'|'I');
LETTER_J : ('j'|'J');
LETTER_K : ('k'|'K');
LETTER_L : ('l'|'L');
LETTER_M : ('m'|'M');
LETTER_N : ('n'|'N');
LETTER_O : ('o'|'O');
LETTER_P : ('p'|'P');
LETTER_Q : ('q'|'Q');
LETTER_R : ('r'|'R');
LETTER_S : ('s'|'S');
LETTER_T : ('t'|'T');
LETTER_U : ('u'|'U');
LETTER_V : ('v'|'V');
LETTER_W : ('w'|'W');
LETTER_X : ('x'|'X');
LETTER_Y : ('y'|'Y');
LETTER_Z : ('z'|'Z');
WS_TOKEN : [ \t\n\r];
FILL_TOKEN : (
WS_TOKEN
|'-'
|','
|';'
|'_'
|'|'
|':'
|'#'
|'.'
|'<'
|'>'
|'('
|')'
|'{'
|'}'
|'['
|']'
|'/'
|'\''
|'"'
);

T_0 : LETTER_A FILL_TOKEN? FILL_TOKEN? FILL_TOKEN? LETTER_B FILL_TOKEN? FILL_TOKEN? FILL_TOKEN? LETTER_F FILL_TOKEN? FILL_TOKEN? FILL_TOKEN? '1000' FILL_TOKEN? FILL_TOKEN? FILL_TOKEN? LETTER_C ;
T_1 : LETTER_A FILL_TOKEN? FILL_TOKEN? FILL_TOKEN? LETTER_B FILL_TOKEN? FILL_TOKEN? FILL_TOKEN? LETTER_C FILL_TOKEN? FILL_TOKEN? FILL_TOKEN? '1990' FILL_TOKEN? FILL_TOKEN? FILL_TOKEN? LETTER_D ;
T_2 : LETTER_A FILL_TOKEN? FILL_TOKEN? FILL_TOKEN? LETTER_B FILL_TOKEN? FILL_TOKEN? FILL_TOKEN? LETTER_C FILL_TOKEN? FILL_TOKEN? FILL_TOKEN? '1100' FILL_TOKEN? FILL_TOKEN? FILL_TOKEN? LETTER_D ;

SKIP_RULE : . -> skip ;
```

Right now, I am generating N lexer grammars and executing them one by one (in threads), but this takes several seconds to complete. I was hoping for a solution that can process the input text within milliseconds. Is there anything I am missing? Thank you very much in advance!

@parrt
Member

parrt commented Feb 14, 2020

unfortunately, Java can only do strings for static data; there's no way to make a static array of ints that just becomes available as a global or field. So we had to encode the data in strings, using 16-bit chars as ints :(
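
For illustration, a minimal sketch of that pattern (not ANTLR's actual generated code): the values are baked into the class file as a String constant and widened back to ints at load time:

```java
// Illustrative sketch, not ANTLR's generated code: Java class files can
// embed a String constant directly, but not an int[] literal, so each
// 16-bit char of the string carries one value in the range 0..0xFFFF.
public final class StringEncodedInts {
    private static final String SERIALIZED = "\u0003\u0010\u0042\uFFFE";

    public static int[] decode() {
        int[] data = new int[SERIALIZED.length()];
        for (int i = 0; i < data.length; i++) {
            data[i] = SERIALIZED.charAt(i); // char widens to int, 0..0xFFFF
        }
        return data;
    }
}
```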

@lidanxu

lidanxu commented Mar 4, 2021

> (quotes @mullekay's comment above in full)

@mullekay Excuse me, I need to build a large dictionary of about 200,000+ words and get this error: Exception in thread "main" java.lang.UnsupportedOperationException: Serialized ATN data element out of range. Did you find any solution? I'd appreciate it very much if you could lend me a hand.

@KvanTTT
Member

KvanTTT commented Jan 16, 2022

@lidanxu @mullekay out of curiosity, what do you need such a large dictionary for? I'm working on the problem (see the linked PR).

KvanTTT added a commit to KvanTTT/antlr4 that referenced this issue Feb 20, 2022
@parrt
Member

parrt commented Mar 26, 2022

Reopening, as we are now going to allow very large ATNs.

@parrt
Member

parrt commented Mar 26, 2022

Fixed by #3591

@parrt parrt closed this as completed Mar 26, 2022