Parse exponent literal as number #768
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -541,6 +541,7 @@ impl<'a> Tokenizer<'a> { | |
chars.next(); // consume the first char | ||
let s = self.tokenize_word(ch, chars); | ||
|
||
// TODO: implement parsing of exponent here | ||
if s.chars().all(|x| ('0'..='9').contains(&x) || x == '.') { | ||
let mut inner_state = State { | ||
peekable: s.chars().peekable(), | ||
|
@@ -617,6 +618,36 @@ impl<'a> Tokenizer<'a> { | |
return Ok(Some(Token::Period)); | ||
} | ||
|
||
// Parse exponent as number | ||
if chars.peek() == Some(&'e') || chars.peek() == Some(&'E') { | ||
let mut char_clone = chars.peekable.clone(); | ||
Review comment: Why is this copy needed? Given […]

Reply: I needed a way to peek more than just the next char, since […] is only a valid exponent if […] |
||
let mut exponent_part = String::new(); | ||
exponent_part.push(char_clone.next().unwrap()); | ||
|
||
// Optional sign | ||
match char_clone.peek() { | ||
Some(&c) if matches!(c, '+' | '-') => { | ||
exponent_part.push(c); | ||
char_clone.next(); | ||
} | ||
_ => (), | ||
} | ||
|
||
match char_clone.peek() { | ||
// Definitely an exponent, get original iterator up to speed and use it | ||
Some(&c) if matches!(c, '0'..='9') => { | ||
for _ in 0..exponent_part.len() { | ||
chars.next(); | ||
} | ||
exponent_part += | ||
&peeking_take_while(chars, |ch| matches!(ch, '0'..='9')); | ||
s += exponent_part.as_str(); | ||
} | ||
// Not an exponent, discard the work done | ||
_ => (), | ||
} | ||
} | ||
|
||
let long = if chars.peek() == Some(&'L') { | ||
chars.next(); | ||
true | ||
|
@@ -1091,6 +1122,41 @@ mod tests { | |
compare(expected, tokens); | ||
} | ||
|
||
#[test] | ||
fn tokenize_select_exponent() { | ||
let sql = String::from("SELECT 1e10, 1e-10, 1e+10, 1ea, 1e-10a, 1e-10-10"); | ||
let dialect = GenericDialect {}; | ||
let mut tokenizer = Tokenizer::new(&dialect, &sql); | ||
let tokens = tokenizer.tokenize().unwrap(); | ||
|
||
let expected = vec![ | ||
Token::make_keyword("SELECT"), | ||
Token::Whitespace(Whitespace::Space), | ||
Token::Number(String::from("1e10"), false), | ||
Token::Comma, | ||
Token::Whitespace(Whitespace::Space), | ||
Token::Number(String::from("1e-10"), false), | ||
Token::Comma, | ||
Token::Whitespace(Whitespace::Space), | ||
Token::Number(String::from("1e+10"), false), | ||
Token::Comma, | ||
Token::Whitespace(Whitespace::Space), | ||
Token::Number(String::from("1"), false), | ||
Token::make_word("ea", None), | ||
Token::Comma, | ||
Token::Whitespace(Whitespace::Space), | ||
Token::Number(String::from("1e-10"), false), | ||
Token::make_word("a", None), | ||
Review comment: I found it very strange that a new token is formed without whitespace after a number. I expected this to be a tokenizer error, but this implementation agrees with postgres 🤯

postgres=# select 12e-10a;
a
--------------
0.0000000012
(1 row)
postgres=# select 12e-10 a;
a
--------------
0.0000000012
(1 row)

Review comment: Likewise 🤯

postgres=# select 1e-10-10;
?column?
---------------
-9.9999999999
(1 row)
postgres=# select 1e-10 -10;
?column?
---------------
-9.9999999999
(1 row)

Review comment: I believe this behaviour is part of what bit me when trying to implement this for the Hive dialect 😅 |
||
Token::Comma, | ||
Token::Whitespace(Whitespace::Space), | ||
Token::Number(String::from("1e-10"), false), | ||
Token::Minus, | ||
Token::Number(String::from("10"), false), | ||
]; | ||
|
||
compare(expected, tokens); | ||
} | ||
|
||
#[test] | ||
fn tokenize_scalar_function() { | ||
let sql = String::from("SELECT sqrt(1)"); | ||
|
👍
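For readers following the review thread about cloning the iterator, here is a minimal standalone sketch of the same lookahead idea. It is not the code from this PR: `take_exponent` is a hypothetical helper name, and it works on a plain `Peekable<Chars>` rather than the tokenizer's own `State` type. The point it illustrates is that a cheap clone lets you inspect the marker, the optional sign, and the first digit before committing, so inputs like `1ea` leave the original iterator untouched.

```rust
use std::iter::Peekable;
use std::str::Chars;

/// Hypothetical helper (not part of the PR): tries to read an exponent
/// suffix such as `e10`, `E-3`, or `e+7` from the front of `chars`.
/// The iterator is advanced only if a complete exponent is present.
fn take_exponent(chars: &mut Peekable<Chars<'_>>) -> Option<String> {
    // Look ahead on a clone so the original iterator stays untouched
    // unless the whole pattern (marker, optional sign, digit) matches.
    let mut lookahead = chars.clone();
    let mut exponent = String::new();

    // Exponent marker.
    match lookahead.next() {
        Some(c @ ('e' | 'E')) => exponent.push(c),
        _ => return None,
    }

    // Optional sign.
    if let Some(&c) = lookahead.peek() {
        if c == '+' || c == '-' {
            exponent.push(c);
            lookahead.next();
        }
    }

    // At least one digit is required, otherwise this is not an exponent.
    if !matches!(lookahead.peek(), Some(c) if c.is_ascii_digit()) {
        return None;
    }

    // Commit: advance the real iterator past the marker and sign.
    for _ in 0..exponent.chars().count() {
        chars.next();
    }
    // Then consume the run of digits.
    while let Some(&c) = chars.peek() {
        if !c.is_ascii_digit() {
            break;
        }
        exponent.push(c);
        chars.next();
    }
    Some(exponent)
}
```

Under these assumptions, calling it on `e-10a` yields `e-10` and leaves `a` for the next token, which matches the `1e-10a` case in the test above, while calling it on `ea` returns `None` and consumes nothing.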