Use signed ints for ATN serialization, not uint16, except for Java #3591

Merged 46 commits on Mar 26, 2022
Changes from 13 commits
Commits
b06fdc5
refactor serialize so we don't need comments
parrt Mar 18, 2022
68145b2
more cleanup during refactor
parrt Mar 18, 2022
7587346
store language in serializer obj
parrt Mar 18, 2022
a77e743
A lexer rule token type should never be -1 (EOF). 0 is fragment but t…
parrt Mar 18, 2022
9eb1f83
Go uses int not uint16 for ATN now. java/go/python3 pass
parrt Mar 18, 2022
b4148b8
remove checks for 0xFFFF in Go.
parrt Mar 18, 2022
4d2ebbf
C++ uint16_t to int for ATN.
parrt Mar 18, 2022
14ce0cd
add mac php dir; fix type on accept() for generated code to be mixed.
parrt Mar 18, 2022
9b6db61
Add test from @kvanTTT. This PR fixes https://github.com/antlr/antlr4…
parrt Mar 18, 2022
4bc9e38
cleanup and add big lexer from https://github.com/antlr/antlr4/pull/3546
parrt Mar 19, 2022
2cf2015
increase mvn mem size to 2G
parrt Mar 19, 2022
986f597
increase mvn mem size to 8G
parrt Mar 19, 2022
4087846
turn off the big ATN lexer test as we have memory issues during testing.
parrt Mar 19, 2022
82ef98b
Fixes #3592
parrt Mar 19, 2022
7b86e1f
Revert "C++ uint16_t to int for ATN."
parrt Mar 19, 2022
dd829bd
C++ uint16_t to int32_t for ATN.
parrt Mar 19, 2022
c82ace5
rm unnecessary include file, updating project file. get rid of the 0x…
parrt Mar 19, 2022
544baa5
rm refs to 0xFFFF in swift
parrt Mar 19, 2022
ac25b86
javascript tests were running as Node...added to ignore list.
parrt Mar 19, 2022
6643786
don't distinguish between 16 and 32 bit char sets in serialization; P…
parrt Mar 19, 2022
3f579ba
update C++ to deserialize only 32-bit sets
parrt Mar 19, 2022
d782c68
0xFFFF -> -1 for C++ target.
parrt Mar 19, 2022
3754457
Merge branch 'dev' into ATN-to-signed-ints
parrt Mar 19, 2022
a70605b
get other targets to use 32-bit sets in serialization. tests pass loc…
parrt Mar 19, 2022
f966345
refactor to reduce code size
parrt Mar 19, 2022
41896ef
add comment
parrt Mar 19, 2022
525198a
oops. comment out call to writeSerializedATNIntegerHistogram(). I won…
parrt Mar 19, 2022
e154984
all but Java, Node, PHP, Go work now for the huge lexer file; I have …
parrt Mar 19, 2022
46b5166
all but Java, Node, PHP, Go work now for the huge lexer file; I have …
parrt Mar 19, 2022
bef2f0a
Turn off this big lexer because we get memory errors during continuou…
parrt Mar 19, 2022
fcef8c9
Intermediate commit where I have shuffled around all of the -1 flippi…
parrt Mar 20, 2022
7f7cb03
convert decode to use int[]; remove dead code. don't use serializeAsC…
parrt Mar 20, 2022
fe06578
more tests passing. simplify. When copying atn, must run ATN through …
parrt Mar 20, 2022
985f5d9
0xFFFD+ are not valid char
parrt Mar 20, 2022
bb0d06a
clean up. tests passing now
parrt Mar 20, 2022
437f9b6
huge clean up. Got Java working with 32-bit ATNs! Still working on cle…
parrt Mar 21, 2022
6f30e75
Cleanup the hack I did earlier; everything still seems to work
parrt Mar 21, 2022
b0f8551
Use linux DCO not our old contributors certificate of origin
parrt Mar 22, 2022
a2cf73f
remove bump-by-2 code
parrt Mar 22, 2022
d7293dd
clean up per @kvanTTT. Can't test locally on this box. Will see what …
parrt Mar 25, 2022
40c035e
tweak comment
parrt Mar 25, 2022
1c66fae
Merge branch 'dev' of github.com:antlr/antlr4 into dev
parrt Mar 25, 2022
11ae041
Merge branch 'dev' into ATN-to-signed-ints
parrt Mar 25, 2022
db76a47
Revert "Use linux DCO not our old contributors certificate of origin"
parrt Mar 25, 2022
6b2b6a2
Merge branch 'dev' into ATN-to-signed-ints
parrt Mar 25, 2022
1aa5f18
see if C++ works in CI for huge ATN
parrt Mar 26, 2022
1 change: 1 addition & 0 deletions .circleci/scripts/run-tests-cpp.sh
@@ -3,5 +3,6 @@
set -euo pipefail

pushd runtime-testsuite
export MAVEN_OPTS="-Xmx8g"
mvn -Dparallel=classes -DthreadCount=4 -Dtest=cpp.** test
popd
1 change: 1 addition & 0 deletions .circleci/scripts/run-tests-dart.sh
@@ -6,5 +6,6 @@ dart --version

pushd runtime-testsuite
echo "running maven tests..."
export MAVEN_OPTS="-Xmx8g"
mvn -Dparallel=classes -DthreadCount=4 -Dtest=dart.** test
popd
1 change: 1 addition & 0 deletions .circleci/scripts/run-tests-dotnet.sh
@@ -3,5 +3,6 @@
set -euo pipefail

pushd runtime-testsuite
export MAVEN_OPTS="-Xmx8g"
mvn -Dparallel=classes -DthreadCount=4 -Dtest=csharp.** test
popd
1 change: 1 addition & 0 deletions .circleci/scripts/run-tests-go.sh
@@ -6,5 +6,6 @@ go version

pushd runtime-testsuite
echo "running maven tests..."
export MAVEN_OPTS="-Xmx8g"
mvn -Dparallel=classes -DthreadCount=4 -Dtest=go.** test
popd
1 change: 1 addition & 0 deletions .circleci/scripts/run-tests-javascript.sh
@@ -15,6 +15,7 @@ popd
pushd runtime-testsuite

echo "running maven tests..."
export MAVEN_OPTS="-Xmx8g"
mvn -Dtest=javascript.** test
RESULT+=$?

1 change: 1 addition & 0 deletions .circleci/scripts/run-tests-php.sh
@@ -7,5 +7,6 @@ php -v
php_path=$(which php)
pushd runtime-testsuite
echo "running maven tests..."
export MAVEN_OPTS="-Xmx8g"
mvn -DPHP_PATH="${php_path}" -Dparallel=classes -DthreadCount=4 -Dtest=php.** test
popd
1 change: 1 addition & 0 deletions .circleci/scripts/run-tests-python2.sh
@@ -17,5 +17,6 @@ python2 --version

pushd runtime-testsuite
echo "running maven tests..."
export MAVEN_OPTS="-Xmx8g"
mvn -Dparallel=classes -DthreadCount=4 -Dtest=python2.** test
popd
1 change: 1 addition & 0 deletions .circleci/scripts/run-tests-python3.sh
@@ -17,5 +17,6 @@ python3 --version

pushd runtime-testsuite
echo "running maven tests..."
export MAVEN_OPTS="-Xmx8g"
mvn -Dparallel=classes -DthreadCount=4 -Dtest=python3.** test
popd
1 change: 1 addition & 0 deletions .circleci/scripts/run-tests-swift.sh
@@ -17,5 +17,6 @@ set -euo pipefail

pushd runtime-testsuite
echo "running maven tests..."
export MAVEN_OPTS="-Xmx8g"
mvn -Dparallel=classes -DthreadCount=4 -Dtest=swift.** test
popd
1 change: 1 addition & 0 deletions .github/scripts-macosx/run-tests-cpp.sh
@@ -3,5 +3,6 @@
set -euo pipefail

pushd runtime-testsuite
export MAVEN_OPTS="-Xmx8g"
mvn -Dparallel=classes -DthreadCount=4 -Dtest=cpp.** test
popd
1 change: 1 addition & 0 deletions .github/scripts-macosx/run-tests-dotnet.sh
@@ -13,5 +13,6 @@ dotnet build -c Release -f netstandard2.0 runtime/CSharp/Antlr4.csproj

# run tests
pushd runtime-testsuite/
export MAVEN_OPTS="-Xmx8g"
mvn -Dparallel=classes -DthreadCount=4 -Dtest=csharp.** test
popd
1 change: 1 addition & 0 deletions .github/scripts-macosx/run-tests-swift.sh
@@ -38,6 +38,7 @@ swift build --version
cd runtime-testsuite/
# mvn -e -Dparallel=classes -DthreadCount=4 -Dtest=swift.** test
# I don't know swift enough to make it parallel. revert to single threaded
export MAVEN_OPTS="-Xmx8g"
mvn -e -Dtest=swift.** test
rc=$?
cat target/surefire-reports/*.dumpstream || true
1 change: 1 addition & 0 deletions .github/scripts-windows/run-tests-csharp.cmd
@@ -1,5 +1,6 @@
dotnet build runtime/CSharp/src/Antlr4.csproj -c Release
dotnet pack runtime/CSharp/src/Antlr4.csproj -c Release
cd runtime-testsuite
set MAVEN_OPTS=-Xmx8g
mvn -Dparallel=classes -DthreadCount=2 -Dtest=csharp.** test
cd ..
1 change: 1 addition & 0 deletions .github/scripts-windows/run-tests-dart.cmd
@@ -1,5 +1,6 @@
C:\ProgramData\chocolatey\bin\choco.exe -y install dart-sdk

cd runtime-testsuite
set MAVEN_OPTS=-Xmx8g
mvn -Dtest=dart.** test -Dantlr-dart-dart="C:\tools\dart-sdk\bin\dart.exe" -Dantlr-dart-pub="C:\tools\dart-sdk\bin\pub.bat" -Dantlr-dart-dart2native="C:\tools\dart-sdk\bin\dart2native.bat"
cd ..
1 change: 1 addition & 0 deletions .github/scripts-windows/run-tests-go.cmd
@@ -1,3 +1,4 @@
cd runtime-testsuite
set MAVEN_OPTS=-Xmx8g
mvn -Dparallel=classes -DthreadCount=2 -Dtest=go.** test
cd ..
1 change: 1 addition & 0 deletions .github/scripts-windows/run-tests-java.cmd
@@ -1,3 +1,4 @@
cd runtime-testsuite
set MAVEN_OPTS=-Xmx8g
mvn -Dparallel=classes -DthreadCount=2 -Dtest=java.** test
cd ..
1 change: 1 addition & 0 deletions .github/scripts-windows/run-tests-javascript.cmd
@@ -1,3 +1,4 @@
cd runtime-testsuite
set MAVEN_OPTS=-Xmx8g
mvn -Dparallel=classes -DthreadCount=2 -Dtest=javascript.** test
cd ..
1 change: 1 addition & 0 deletions .github/scripts-windows/run-tests-php.cmd
@@ -4,5 +4,6 @@ git clone https://github.com/antlr/antlr-php-runtime.git
move antlr-php-runtime runtime\PHP

cd runtime-testsuite
set MAVEN_OPTS=-Xmx8g
mvn -Dparallel=classes -DthreadCount=2 -Dtest=php.** test -Dantlr-php-php="C:\tools\php81\php.exe"
cd ..
1 change: 1 addition & 0 deletions .github/scripts-windows/run-tests-python2.cmd
@@ -1,3 +1,4 @@
cd runtime-testsuite
set MAVEN_OPTS=-Xmx8g
mvn -Dparallel=classes -DthreadCount=2 -Dantlr-python2-python="C:\Python27\python.exe" -Dtest=python2.** test
cd ..
1 change: 1 addition & 0 deletions .github/scripts-windows/run-tests-python3.cmd
@@ -1,3 +1,4 @@
cd runtime-testsuite
set MAVEN_OPTS=-Xmx8g
mvn -Dparallel=classes -DthreadCount=2 -Dantlr-python3-python="C:\Python310\python.exe" -Dtest=python3.** test
cd ..
2 changes: 2 additions & 0 deletions runtime-testsuite/pom.xml
@@ -132,6 +132,8 @@
<artifactId>maven-compiler-plugin</artifactId>
<configuration>
<release>8</release>
<source>9</source>
<target>9</target>
</configuration>
</plugin>
</plugins>
@@ -0,0 +1,16 @@
[type]
Lexer

[grammar]
lexer grammar L;
T_FFFF: 'FFFF' -> type(65535);

[input]
FFFF

[output]
[@0,0:3='FFFF',<65535>,1:0]
[@1,4:3='<EOF>',<-1>,1:4]

[skip]
Java
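
The descriptor above exercises a collision that a 16-bit encoding makes possible. What follows is a minimal Java sketch, assuming a hypothetical 16-bit reader that treats 0xFFFF as the stand-in for -1 (the same translation the `0xFFFF` checks in the C# deserializer perform). The class and method names are illustrative and not part of the ANTLR API.

```java
public class TokenTypeCollisionDemo {
    public static final int EOF = -1;

    // Hypothetical 16-bit encode step: only the low 16 bits survive.
    public static char encodeTo16Bit(int value) {
        return (char) value;
    }

    // Hypothetical 16-bit decode step: 0xFFFF is read back as -1.
    public static int decodeFrom16Bit(char stored) {
        return stored == 0xFFFF ? EOF : stored;
    }

    // With plain signed ints nothing needs translating.
    public static int roundTripAsInt(int value) {
        int[] serialized = { value };
        return serialized[0];
    }

    public static void main(String[] args) {
        int tokenType = 65535; // from `-> type(65535)` in the grammar above

        // 16-bit path: both values come back as -1.
        System.out.println(decodeFrom16Bit(encodeTo16Bit(tokenType))); // prints -1
        System.out.println(decodeFrom16Bit(encodeTo16Bit(EOF)));       // prints -1

        // int path: the two values stay distinct.
        System.out.println(roundTripAsInt(tokenType)); // prints 65535
        System.out.println(roundTripAsInt(EOF));       // prints -1
    }
}
```

On the 16-bit path a rule typed 65535 would be indistinguishable from EOF; with signed ints the expected `<65535>` in the output section can be checked directly.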
@@ -104,7 +104,7 @@ PositionAdjustingLexer() ::= <<
func (p *PositionAdjustingLexer) NextToken() antlr.Token {
if _, ok := p.Interpreter.(*PositionAdjustingLexerATNSimulator); !ok {
lexerDeserializer := antlr.NewATNDeserializer(nil)
-lexerAtn := lexerDeserializer.DeserializeFromUInt16(serializedLexerAtn)
+lexerAtn := lexerDeserializer.Deserialize(serializedLexerAtn)
p.Interpreter = NewPositionAdjustingLexerATNSimulator(p, lexerAtn, p.Interpreter.DecisionToDFA(), p.Interpreter.SharedContextCache())
p.Virt = p
}
@@ -366,9 +366,10 @@ public static RuntimeTestDescriptor[] getRuntimeTestDescriptors(String group, St
}

if (group.equals("LexerExec")) {
-descriptors.add(GeneratedLexerDescriptors.getLineSeparatorLfTest(targetName));
-descriptors.add(GeneratedLexerDescriptors.getLineSeparatorCrLfTest(targetName));
+descriptors.add(GeneratedLexerDescriptors.getLineSeparatorLfDescriptor(targetName));
+descriptors.add(GeneratedLexerDescriptors.getLineSeparatorCrLfDescriptor(targetName));
descriptors.add(GeneratedLexerDescriptors.getLargeLexerDescriptor(targetName));
+descriptors.add(GeneratedLexerDescriptors.getAtnStatesSizeMoreThan65535Descriptor(targetName));
}

return descriptors.toArray(new RuntimeTestDescriptor[0]);
@@ -1,7 +1,9 @@
package org.antlr.v4.test.runtime;

import java.util.*;

public class GeneratedLexerDescriptors {
-static RuntimeTestDescriptor getLineSeparatorLfTest(String targetName) {
+static RuntimeTestDescriptor getLineSeparatorLfDescriptor(String targetName) {
UniversalRuntimeTestDescriptor result = new UniversalRuntimeTestDescriptor();
result.name = "LineSeparatorLf";
result.targetName = targetName;
@@ -20,7 +22,7 @@ static RuntimeTestDescriptor getLineSeparatorLfTest(String targetName) {
return result;
}

-static RuntimeTestDescriptor getLineSeparatorCrLfTest(String targetName) {
+static RuntimeTestDescriptor getLineSeparatorCrLfDescriptor(String targetName) {
UniversalRuntimeTestDescriptor result = new UniversalRuntimeTestDescriptor();
result.name = "LineSeparatorCrLf";
result.targetName = targetName;
@@ -65,4 +67,57 @@ static RuntimeTestDescriptor getLargeLexerDescriptor(String targetName) {
"[@1,5:4='<EOF>',<-1>,1:5]\n";
return result;
}

static RuntimeTestDescriptor getAtnStatesSizeMoreThan65535Descriptor(String targetName) {
UniversalRuntimeTestDescriptor result = new UniversalRuntimeTestDescriptor();
result.name = "AtnStatesSizeMoreThan65535";
result.notes = "Regression for https://github.com/antlr/antlr4/issues/1863";
result.targetName = targetName;
result.testType = "Lexer";

final int tokensCount = 1024;
final String suffix = String.join("", Collections.nCopies(70, "_"));

String grammarName = "L";
StringBuilder grammar = new StringBuilder();
grammar.append("lexer grammar ").append(grammarName).append(";\n");
grammar.append('\n');
StringBuilder input = new StringBuilder();
StringBuilder output = new StringBuilder();
int startOffset;
int stopOffset = -2;
for (int i = 0; i < tokensCount; i++) {
String ruleName = String.format("T_%06d", i);
String value = ruleName+suffix;
grammar.append(ruleName).append(": '").append(value).append("';\n");
input.append(value).append('\n');

startOffset = stopOffset + 2;
stopOffset += value.length() + 1;

output.append("[@").append(i).append(',').append(startOffset).append(':').append(stopOffset)
.append("='").append(value).append("',<").append(i + 1).append(">,").append(i + 1)
.append(":0]\n");
}

grammar.append("\n");
grammar.append("WS: [ \\t\\r\\n]+ -> skip;\n");

startOffset = stopOffset + 2;
stopOffset = startOffset - 1;
output.append("[@").append(tokensCount).append(',').append(startOffset).append(':').append(stopOffset)
.append("='<EOF>',<-1>,").append(tokensCount + 1).append(":0]\n");

result.grammar = grammar.toString();
result.grammarName = grammarName;
result.input = input.toString();
result.output = output.toString();

List<String> all = Arrays.asList("CSharp", "Python2", "Python3", "Cpp", "Go", "PHP", "Swift", "Java", "JavaScript", "Dart");
result.skipTargets.addAll(all);
// result.skipTargets.add("Java"); // can't handle > 16bit states yet
// result.skipTargets.add("JavaScript"); // doesn't terminate
// result.skipTargets.add("Go"); // syntax error
return result;
}
}
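
The generator above builds 1024 rules whose literals are each `T_` plus six digits plus 70 underscores. The rough sizing sketch below, in Java, assumes (the diff does not state this) that each literal character costs on the order of one ATN state. Under that assumption it shows why the grammar lands past the 16-bit limit, and it reproduces the start/stop offsets the loop computes. `LargeLexerSizing` and its methods are hypothetical names.

```java
public class LargeLexerSizing {
    public static final int TOKENS_COUNT = 1024; // tokensCount in the descriptor
    public static final int SUFFIX_LENGTH = 70;  // 70 underscores

    // Length of one literal such as T_000000 followed by the suffix.
    public static int literalLength() {
        return String.format("T_%06d", 0).length() + SUFFIX_LENGTH;
    }

    // Total characters across all rule literals.
    public static int totalLiteralChars() {
        return TOKENS_COUNT * literalLength();
    }

    // Each literal in the input is followed by one '\n', so token i starts
    // after i earlier (literal + newline) runs.
    public static int startOffset(int i) {
        return i * (literalLength() + 1);
    }

    // The stop offset is inclusive, hence the -1.
    public static int stopOffset(int i) {
        return startOffset(i) + literalLength() - 1;
    }

    public static void main(String[] args) {
        System.out.println(literalLength());             // prints 78
        System.out.println(totalLiteralChars());         // prints 79872
        System.out.println(totalLiteralChars() > 65535); // prints true
        System.out.println(startOffset(0) + ":" + stopOffset(0)); // prints 0:77
        System.out.println(startOffset(1) + ":" + stopOffset(1)); // prints 79:156
    }
}
```

The closed-form offsets match the incremental `startOffset = stopOffset + 2` and `stopOffset += value.length() + 1` updates in the loop, starting from `stopOffset = -2`.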
@@ -55,7 +55,7 @@
*
* Sample output on OS X with 4 GHz Intel Core i7 (us == microseconds, 1/1000 of a millisecond):
*
-Java VM args: -Xms2G -Xmx2G
+Java VM args: -Xms2G -Xmx8g
Warming up Java compiler....
load_legacy_java_ascii_file average time 53us size 58384b over 3500 loads of 29038 symbols from Parser.java
load_legacy_java_ascii_file average time 27us size 15568b over 3500 loads of 7625 symbols from RuleContext.java
@@ -244,7 +244,7 @@ private String locateTool(String tool) {
return phpPath;
}

-String[] roots = {"/usr/local/bin/", "/opt/local/bin", "/usr/bin/"};
+String[] roots = {"/usr/local/bin/", "/opt/local/bin", "/opt/homebrew/bin/", "/usr/bin/"};

for (String root: roots) {
if (new File(root + tool).exists()) {
11 changes: 0 additions & 11 deletions runtime/CSharp/src/Atn/ATNDeserializer.cs
@@ -190,15 +190,7 @@ protected internal virtual void ReadLexerActions(ATN atn)
{
LexerActionType actionType = (LexerActionType)ReadInt();
int data1 = ReadInt();
-if (data1 == unchecked((int)(0xFFFF)))
-{
-data1 = -1;
-}
int data2 = ReadInt();
-if (data2 == unchecked((int)(0xFFFF)))
-{
-data2 = -1;
-}
ILexerAction lexerAction = LexerActionFactory(actionType, data1, data2);
atn.lexerActions[i_10] = lexerAction;
}
@@ -369,9 +361,6 @@ protected internal virtual void ReadRules(ATN atn)
atn.ruleToStartState[i_5] = startState;
if (atn.grammarType == ATNType.Lexer) {
int tokenType = ReadInt ();
-if (tokenType == unchecked((int)(0xFFFF))) {
-tokenType = TokenConstants.EOF;
-}
atn.ruleToTokenType [i_5] = tokenType;
}
}
4 changes: 2 additions & 2 deletions runtime/Cpp/runtime/src/Parser.cpp
@@ -38,7 +38,7 @@ struct BypassAltsAtnCache final {
/// bypass alternatives.
///
/// <seealso cref= ATNDeserializationOptions#isGenerateRuleBypassTransitions() </seealso>
-std::map<std::vector<uint16_t>, std::unique_ptr<const atn::ATN>> map;
+std::map<std::vector<int>, std::unique_ptr<const atn::ATN>> map;
};

BypassAltsAtnCache* getBypassAltsAtnCache() {
@@ -229,7 +229,7 @@ TokenFactory<CommonToken>* Parser::getTokenFactory() {


const atn::ATN& Parser::getATNWithBypassAlts() {
const std::vector<uint16_t> &serializedAtn = getSerializedATN();
const std::vector<int> &serializedAtn = getSerializedATN();
if (serializedAtn.empty()) {
throw UnsupportedOperationException("The current parser does not support an ATN with bypass alternatives.");
}
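
The bypass-alternatives cache in `Parser.cpp` is keyed by the full serialized ATN, so changing the element type from `uint16_t` to `int` changes the key type as well. Below is a rough Java analogue, assuming a hypothetical `FakeAtn` placeholder in place of a real deserialized ATN. Java arrays compare by identity, so the sketch copies the `int[]` into a `List<Integer>` to get value-based keys. None of these names come from the ANTLR runtime.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SerializedAtnCacheDemo {
    /** Hypothetical placeholder for a deserialized ATN. */
    public static final class FakeAtn {
        public final int serializedLength;

        FakeAtn(int serializedLength) {
            this.serializedLength = serializedLength;
        }
    }

    private final Map<List<Integer>, FakeAtn> cache = new HashMap<>();
    public int deserializations = 0; // counts cache misses

    public FakeAtn get(int[] serialized) {
        // Copy into a List so equal contents produce equal keys.
        List<Integer> key = new ArrayList<>(serialized.length);
        for (int value : serialized) {
            key.add(value);
        }
        FakeAtn cached = cache.get(key);
        if (cached == null) {
            deserializations++; // stand-in for the real deserialization work
            cached = new FakeAtn(key.size());
            cache.put(key, cached);
        }
        return cached;
    }

    public static void main(String[] args) {
        SerializedAtnCacheDemo demo = new SerializedAtnCacheDemo();
        FakeAtn first = demo.get(new int[] {4, 1, 65535});
        FakeAtn again = demo.get(new int[] {4, 1, 65535}); // equal contents: cache hit
        FakeAtn other = demo.get(new int[] {4, 1, -1});    // -1 and 65535 are distinct keys
        System.out.println(first == again);        // prints true
        System.out.println(first == other);        // prints false
        System.out.println(demo.deserializations); // prints 2
    }
}
```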
2 changes: 1 addition & 1 deletion runtime/Cpp/runtime/src/Recognizer.h
@@ -53,7 +53,7 @@ namespace antlr4 {
/// For interpreters, we don't know their serialized ATN despite having
/// created the interpreter from it.
/// </summary>
-virtual const std::vector<uint16_t>& getSerializedATN() const {
+virtual const std::vector<int>& getSerializedATN() const {
throw "there is no serialized ATN";
}
