Parse exponent literal as number #768

Merged
merged 1 commit into from Dec 28, 2022
1 change: 1 addition & 0 deletions src/test_utils.rs
@@ -144,6 +144,7 @@ pub fn all_dialects() -> TestedDialects {
Box::new(RedshiftSqlDialect {}),
Box::new(MySqlDialect {}),
Box::new(BigQueryDialect {}),
Box::new(SQLiteDialect {}),
Collaborator

👍

],
}
}
66 changes: 66 additions & 0 deletions src/tokenizer.rs
@@ -541,6 +541,7 @@ impl<'a> Tokenizer<'a> {
chars.next(); // consume the first char
let s = self.tokenize_word(ch, chars);

// TODO: implement parsing of exponent here
if s.chars().all(|x| ('0'..='9').contains(&x) || x == '.') {
let mut inner_state = State {
peekable: s.chars().peekable(),
@@ -617,6 +618,36 @@ impl<'a> Tokenizer<'a> {
return Ok(Some(Token::Period));
}

// Parse exponent as number
if chars.peek() == Some(&'e') || chars.peek() == Some(&'E') {
let mut char_clone = chars.peekable.clone();
Collaborator

Why is this copy needed?

Given that chars is already peekable, I don't see why it can't be used directly.

Contributor Author

I needed a way to peek further than the next char, since it is only a valid exponent if the e is followed by an optional sign and at least one digit. The easiest way I found was to clone the iterator and look ahead on the clone; if it turns out not to be an exponent, the clone is safely discarded and regular behaviour continues with the original iterator.
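
The clone-and-look-ahead approach described in this reply can be sketched in isolation. This is a minimal illustration using only the standard library, not the code from this PR: `take_exponent` is a hypothetical helper name, and it works on a plain `Peekable<Chars>` rather than the tokenizer's own `State` wrapper.

```rust
use std::iter::Peekable;
use std::str::Chars;

/// Hypothetical helper (not part of sqlparser): if `chars` is positioned at a
/// valid exponent (`e`/`E`, optional sign, at least one digit), consume it and
/// return its text. Otherwise leave `chars` untouched and return `None`.
fn take_exponent(chars: &mut Peekable<Chars<'_>>) -> Option<String> {
    // Look ahead on a clone so that a failed match leaves `chars` as it was.
    let mut lookahead = chars.clone();
    let mut exponent = String::new();

    // The exponent marker itself.
    match lookahead.next() {
        Some(c @ ('e' | 'E')) => exponent.push(c),
        _ => return None,
    }

    // Optional sign.
    if let Some(&c) = lookahead.peek() {
        if c == '+' || c == '-' {
            exponent.push(c);
            lookahead.next();
        }
    }

    // At least one digit must follow, otherwise this is not an exponent.
    if !matches!(lookahead.peek(), Some(c) if c.is_ascii_digit()) {
        return None;
    }

    // Confirmed: advance the real iterator past the marker and sign
    // (both are ASCII, so byte length equals character count).
    for _ in 0..exponent.len() {
        chars.next();
    }
    // Then take the digits directly from the real iterator.
    while let Some(&c) = chars.peek() {
        if !c.is_ascii_digit() {
            break;
        }
        exponent.push(c);
        chars.next();
    }
    Some(exponent)
}
```

Calling it on an iterator over `e-10a` should return `e-10` and leave the iterator at `a`, while `ea` should return `None` and leave the iterator at `e`. Cloning is cheap here because a `Chars` iterator is only a view into the underlying string.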

let mut exponent_part = String::new();
exponent_part.push(char_clone.next().unwrap());

// Optional sign
match char_clone.peek() {
Some(&c) if matches!(c, '+' | '-') => {
exponent_part.push(c);
char_clone.next();
}
_ => (),
}

match char_clone.peek() {
// Definitely an exponent, get original iterator up to speed and use it
Some(&c) if matches!(c, '0'..='9') => {
for _ in 0..exponent_part.len() {
chars.next();
}
exponent_part +=
&peeking_take_while(chars, |ch| matches!(ch, '0'..='9'));
s += exponent_part.as_str();
}
// Not an exponent, discard the work done
_ => (),
}
}

let long = if chars.peek() == Some(&'L') {
chars.next();
true
@@ -1091,6 +1122,41 @@ mod tests {
compare(expected, tokens);
}

#[test]
fn tokenize_select_exponent() {
let sql = String::from("SELECT 1e10, 1e-10, 1e+10, 1ea, 1e-10a, 1e-10-10");
let dialect = GenericDialect {};
let mut tokenizer = Tokenizer::new(&dialect, &sql);
let tokens = tokenizer.tokenize().unwrap();

let expected = vec![
Token::make_keyword("SELECT"),
Token::Whitespace(Whitespace::Space),
Token::Number(String::from("1e10"), false),
Token::Comma,
Token::Whitespace(Whitespace::Space),
Token::Number(String::from("1e-10"), false),
Token::Comma,
Token::Whitespace(Whitespace::Space),
Token::Number(String::from("1e+10"), false),
Token::Comma,
Token::Whitespace(Whitespace::Space),
Token::Number(String::from("1"), false),
Token::make_word("ea", None),
Token::Comma,
Token::Whitespace(Whitespace::Space),
Token::Number(String::from("1e-10"), false),
Token::make_word("a", None),
Collaborator

I found it very strange that a new token is formed without whitespace after a number. I expected this to be a tokenization error, but this implementation agrees with postgres 🤯

postgres=# select 12e-10a;
      a       
--------------
 0.0000000012
(1 row)

postgres=# select 12e-10 a;
      a       
--------------
 0.0000000012
(1 row)

Collaborator

Likewise

postgres=# select 1e-10-10;
   ?column?    
---------------
 -9.9999999999
(1 row)

postgres=# select 1e-10 -10;
   ?column?    
---------------
 -9.9999999999
(1 row)

🤯

Contributor Author

I believe this behaviour is part of what bit me when trying to implement this for the Hive dialect 😅
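
The `1e-10-10` case from this thread can be sanity-checked with plain arithmetic. If the input is split into a number `1e-10`, a minus, and a number `10`, as the test expects, the value should match the postgres output quoted above. A small sketch, where `subtract_tokens` is a hypothetical helper and Rust's `f64` parsing stands in for SQL numeric evaluation:

```rust
/// Hypothetical helper: evaluate `<lhs> - <rhs>` from the text of two number
/// tokens, using f64 parsing as a stand-in for SQL numeric evaluation.
fn subtract_tokens(lhs: &str, rhs: &str) -> f64 {
    let lhs: f64 = lhs.parse().expect("lhs should be a valid float literal");
    let rhs: f64 = rhs.parse().expect("rhs should be a valid float literal");
    lhs - rhs
}
```

`subtract_tokens("1e-10", "10")` should come out at approximately -9.9999999999, which is consistent with reading the second `-` as a binary minus rather than as part of the exponent.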

Token::Comma,
Token::Whitespace(Whitespace::Space),
Token::Number(String::from("1e-10"), false),
Token::Minus,
Token::Number(String::from("10"), false),
];

compare(expected, tokens);
}

#[test]
fn tokenize_scalar_function() {
let sql = String::from("SELECT sqrt(1)");
51 changes: 51 additions & 0 deletions tests/sqlparser_common.rs
@@ -775,6 +775,57 @@ fn parse_null_in_select() {
);
}

#[test]
fn parse_exponent_in_select() -> Result<(), ParserError> {
// all except Hive, as it allows numbers to start an identifier
let dialects = TestedDialects {
dialects: vec![
Box::new(AnsiDialect {}),
Box::new(BigQueryDialect {}),
Box::new(ClickHouseDialect {}),
Box::new(GenericDialect {}),
// Box::new(HiveDialect {}),
Box::new(MsSqlDialect {}),
Box::new(MySqlDialect {}),
Box::new(PostgreSqlDialect {}),
Box::new(RedshiftSqlDialect {}),
Box::new(SnowflakeDialect {}),
Box::new(SQLiteDialect {}),
],
};
let sql = "SELECT 10e-20, 1e3, 1e+3, 1e3a, 1e, 0.5e2";
let mut select = dialects.parse_sql_statements(sql)?;

let select = match select.pop().unwrap() {
Statement::Query(inner) => *inner,
_ => panic!("Expected Query"),
};
let select = match *select.body {
SetExpr::Select(inner) => *inner,
_ => panic!("Expected SetExpr::Select"),
};

assert_eq!(
&vec![
SelectItem::UnnamedExpr(Expr::Value(number("10e-20"))),
SelectItem::UnnamedExpr(Expr::Value(number("1e3"))),
SelectItem::UnnamedExpr(Expr::Value(number("1e+3"))),
SelectItem::ExprWithAlias {
expr: Expr::Value(number("1e3")),
alias: Ident::new("a")
},
SelectItem::ExprWithAlias {
expr: Expr::Value(number("1")),
alias: Ident::new("e")
},
SelectItem::UnnamedExpr(Expr::Value(number("0.5e2"))),
],
&select.projection
);

Ok(())
}

#[test]
fn parse_select_with_date_column_name() {
let sql = "SELECT date";