compiler: fix lazy DFA false quits on ASCII text #768

Merged
merged 1 commit into from May 1, 2021
9 changes: 9 additions & 0 deletions src/compile.rs
@@ -318,6 +318,13 @@ impl Compiler {
}
self.compiled.has_unicode_word_boundary = true;
self.byte_classes.set_word_boundary();
// We also make sure that all ASCII bytes are in a different
// class from non-ASCII bytes. Otherwise, it's possible for
// ASCII bytes to get lumped into the same class as non-ASCII
// bytes. This in turn may cause the lazy DFA to falsely quit
// when it sees an ASCII byte that maps to a byte class with
// non-ASCII bytes. This ensures that never happens.
self.byte_classes.set_range(0, 0x7F);
self.c_empty_look(prog::EmptyLook::WordBoundary)
}
WordBoundary(hir::WordBoundary::UnicodeNegate) => {
@@ -330,6 +337,8 @@ impl Compiler {
}
self.compiled.has_unicode_word_boundary = true;
self.byte_classes.set_word_boundary();
// See comments above for why we set the ASCII range here.
self.byte_classes.set_range(0, 0x7F);
self.c_empty_look(prog::EmptyLook::NotWordBoundary)
}
WordBoundary(hir::WordBoundary::Ascii) => {
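The effect of the added `set_range(0, 0x7F)` call can be illustrated with a small standalone sketch. This is not the crate's actual code: `ByteClassSet`, `is_word_byte`, and `shares_class_with_non_ascii` below are hypothetical, simplified stand-ins, and the sketch assumes byte classes are formed by marking boundaries between adjacent bytes. Under that assumption, marking only word-boundary transitions leaves `{`, `|`, `}`, `~` and DEL in the same class as every byte from 0x80 to 0xFF, because all of them are non-word bytes that follow `z`.

```rust
/// Hypothetical, simplified stand-in for the compiler's byte class set.
/// `self.0[b]` is true when a class boundary lies between byte b and b + 1.
struct ByteClassSet([bool; 256]);

/// ASCII word bytes: [0-9A-Za-z_].
fn is_word_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

impl ByteClassSet {
    fn new() -> Self {
        ByteClassSet([false; 256])
    }

    /// Ensures bytes in start..=end never share a class with bytes outside it.
    fn set_range(&mut self, start: u8, end: u8) {
        assert!(start <= end);
        if start > 0 {
            self.0[start as usize - 1] = true;
        }
        self.0[end as usize] = true;
    }

    /// Marks a boundary wherever adjacent bytes differ in word-ness.
    fn set_word_boundary(&mut self) {
        let mut b1: u16 = 0;
        while b1 <= 255 {
            let mut b2 = b1 + 1;
            while b2 <= 255 && is_word_byte(b1 as u8) == is_word_byte(b2 as u8) {
                b2 += 1;
            }
            self.set_range(b1 as u8, (b2 - 1) as u8);
            b1 = b2;
        }
    }

    /// Maps every byte to its class number.
    fn byte_classes(&self) -> Vec<u8> {
        let mut classes = vec![0u8; 256];
        let mut class = 0u8;
        for i in 0..256 {
            classes[i] = class;
            if i < 255 && self.0[i] {
                class += 1;
            }
        }
        classes
    }
}

/// True if the given ASCII byte shares a class with any byte >= 0x80.
fn shares_class_with_non_ascii(classes: &[u8], ascii: u8) -> bool {
    (0x80..=0xFFusize).any(|b| classes[b] == classes[ascii as usize])
}

fn main() {
    let mut set = ByteClassSet::new();
    set.set_word_boundary();
    let before = set.byte_classes();
    println!("'{{' lumped with non-ASCII before: {}",
             shares_class_with_non_ascii(&before, b'{'));

    // The extra call from this change: split ASCII from non-ASCII.
    set.set_range(0, 0x7F);
    let after = set.byte_classes();
    println!("'{{' lumped with non-ASCII after: {}",
             shares_class_with_non_ascii(&after, b'{'));
}
```

In this sketch, a DFA that gives up whenever it sees a class containing non-ASCII bytes would give up on a plain `{` before the extra call, and would no longer do so afterwards.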