support SET variable #4069

Merged
merged 8 commits into from Nov 3, 2022

waitingkuo (Contributor)

Which issue does this PR close?

Closes #4067

Rationale for this change

To support the following syntax:

SET [VARIABLE] [TO | =] [VALUE | 'VALUE']
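To make the grammar concrete, here is a toy Rust parser for it. This is a hypothetical sketch, not how DataFusion parses SQL (the PR relies on sqlparser). The function name parse_set is made up, it assumes whitespace-separated tokens, and it does not handle the two-word TIME ZONE form.

```rust
/// Toy parser for `SET <variable> {TO | =} <value>`; the value may be wrapped
/// in single quotes. Hypothetical helper: it assumes whitespace-separated
/// tokens (so `set a=1` is rejected) and is not how DataFusion parses SQL.
fn parse_set(sql: &str) -> Result<(String, String), String> {
    // Drop surrounding whitespace and a trailing semicolon.
    let stmt = sql.trim().trim_end_matches(';').trim_end();
    let mut tokens = stmt.split_whitespace();

    // The statement must start with the SET keyword (case-insensitive).
    match tokens.next() {
        Some(t) if t.eq_ignore_ascii_case("set") => {}
        _ => return Err(format!("not a SET statement: {}", sql)),
    }

    // Variable names are compared in lowercase.
    let variable = tokens
        .next()
        .ok_or_else(|| "missing variable name".to_string())?
        .to_lowercase();

    // Either `TO` or `=` must separate the variable from the value.
    match tokens.next() {
        Some(t) if t == "=" || t.eq_ignore_ascii_case("to") => {}
        other => return Err(format!("expected TO or =, found {:?}", other)),
    }

    // Rejoin the remaining tokens and strip one pair of single quotes.
    let raw = tokens.collect::<Vec<_>>().join(" ");
    if raw.is_empty() {
        return Err("missing value".to_string());
    }
    let value = raw
        .strip_prefix('\'')
        .and_then(|s| s.strip_suffix('\''))
        .unwrap_or(raw.as_str())
        .to_string();
    Ok((variable, value))
}
```

With this sketch, both set datafusion.execution.batch_size = 1; and SET datafusion.execution.coalesce_batches TO 'false'; yield a (variable, value) pair, with the quotes already removed from the second value.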

What changes are included in this PR?

Support u64 values:

❯ set datafusion.execution.batch_size = 1;
0 rows in set. Query took 0.001 seconds.
❯ show datafusion.execution.batch_size;
+---------------------------------+---------+
| name                            | setting |
+---------------------------------+---------+
| datafusion.execution.batch_size | 1       |
+---------------------------------+---------+
1 row in set. Query took 0.013 seconds.

Support bool values:

❯ set datafusion.execution.coalesce_batches to false;
0 rows in set. Query took 0.000 seconds.
❯ show datafusion.execution.coalesce_batches;
+---------------------------------------+---------+
| name                                  | setting |
+---------------------------------------+---------+
| datafusion.execution.coalesce_batches | false   |
+---------------------------------------+---------+
1 row in set. Query took 0.007 seconds.

Support single-quoted strings:

❯ set datafusion.execution.coalesce_batches to 'false';
0 rows in set. Query took 0.000 seconds.
❯ show datafusion.execution.coalesce_batches;
+---------------------------------------+---------+
| name                                  | setting |
+---------------------------------------+---------+
| datafusion.execution.coalesce_batches | false   |
+---------------------------------------+---------+
1 row in set. Query took 0.007 seconds.

Support the aliases TIME ZONE and TIMEZONE.

As discussed in #3148 (comment), setting the time zone is disallowed for now, until time zone integration is fully completed:

❯ set time zone = 1;
Plan("Changing Time Zone isn't supported yet")

Return an error for an unknown variable:

❯ set abc = 1;
Execution("Unknown Variable abc")

Return an error for an incorrect data type:

❯ set datafusion.execution.batch_size = a;
Execution("Failed to parse a as u64")
❯ set datafusion.execution.batch_size = -1;
Execution("Failed to parse -1 as u64")
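Both data-type errors follow from Rust's standard str::parse. A minimal sketch, with hypothetical helper names, reproduces the messages shown above. It also shows that a leading + is accepted for u64, and that bool parsing accepts only the exact strings true and false.

```rust
/// Hypothetical helper mirroring the "Failed to parse ... as u64" message.
fn parse_u64_setting(value: &str) -> Result<u64, String> {
    // u64::from_str accepts an optional leading '+', but never a '-'.
    value
        .parse::<u64>()
        .map_err(|_| format!("Failed to parse {} as u64", value))
}

/// Hypothetical helper mirroring the "Failed to parse ... as bool" message.
fn parse_bool_setting(value: &str) -> Result<bool, String> {
    // bool::from_str accepts only the exact, lowercase "true" and "false".
    value
        .parse::<bool>()
        .map_err(|_| format!("Failed to parse {} as bool", value))
}
```

Under these assumptions, a quoted 'FALSE' would be rejected in this sketch, because the quotes are removed before the case-sensitive bool parse.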

Are there any user-facing changes?

github-actions bot added the core (Core datafusion crate), logical-expr (Logical plan and expressions), optimizer (Optimizer rules), and sql labels on Nov 1, 2022
Comment on lines +345 to +349
LogicalPlan::SetVariable(SetVariable {
    variable, value, ..
}) => {
    let config_options = &self.state.write().config.config_options;

Contributor Author:

It would be better to refactor this part in a follow-on PR once #3909 is merged.

Comment on lines 350 to 356
let old_value =
    config_options.read().get(&variable).ok_or_else(|| {
        DataFusionError::Execution(format!(
            "Unknown Variable {}",
            variable
        ))
    })?;

Contributor Author:

Return an error if the variable isn't in config_options.
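A minimal sketch of that lookup. It assumes a plain HashMap in place of ConfigOptions and a hypothetical ConfigValue enum in place of ScalarValue, and returns a String error instead of DataFusionError.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a config value; the PR stores ScalarValue.
#[derive(Debug, Clone, PartialEq)]
enum ConfigValue {
    Bool(bool),
    U64(u64),
    Utf8(String),
}

/// Look up an existing setting, failing with the PR's message if it is unknown.
fn get_existing<'a>(
    options: &'a HashMap<String, ConfigValue>,
    variable: &str,
) -> Result<&'a ConfigValue, String> {
    options
        .get(variable)
        .ok_or_else(|| format!("Unknown Variable {}", variable))
}
```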

Comment on lines +358 to +394
match old_value {
    ScalarValue::Boolean(_) => {
        let new_value = value.parse::<bool>().map_err(|_| {
            DataFusionError::Execution(format!(
                "Failed to parse {} as bool",
                value,
            ))
        })?;
        config_options.write().set_bool(&variable, new_value);
    }

    ScalarValue::UInt64(_) => {
        let new_value = value.parse::<u64>().map_err(|_| {
            DataFusionError::Execution(format!(
                "Failed to parse {} as u64",
                value,
            ))
        })?;
        config_options.write().set_u64(&variable, new_value);
    }

    ScalarValue::Utf8(_) => {
        let new_value = value.parse::<String>().map_err(|_| {
            DataFusionError::Execution(format!(
                "Failed to parse {} as String",
                value,
            ))
        })?;
        config_options.write().set_string(&variable, new_value);
    }

    _ => {
        return Err(DataFusionError::Execution(
            "Unsupported Scalar Value Type".to_string(),
        ))
    }
}

Contributor Author:

I'm not sure whether this is a good way to check the data type. Ideally we should use the DataType defined in ConfigDefinition: https://github.com/apache/arrow-datafusion/blob/97f2e4fd5517c762b0862d22b81f957db511e22e/datafusion/core/src/config.rs#L72-L80

However, this information is lost once it has been converted to ConfigOptions:
https://github.com/apache/arrow-datafusion/blob/97f2e4fd5517c762b0862d22b81f957db511e22e/datafusion/core/src/config.rs#L279-L281

Contributor:

I think checking the existing type is a reasonable approach.
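A self-contained sketch of the approach the reviewers settled on, which dispatches on the type of the value already stored. It uses a hypothetical ConfigValue enum and a HashMap rather than DataFusion's ScalarValue and ConfigOptions, and returns String errors instead of DataFusionError.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the ScalarValue variants matched in the PR.
#[derive(Debug, Clone, PartialEq)]
enum ConfigValue {
    Bool(bool),
    U64(u64),
    Utf8(String),
}

/// Parse `value` according to the type of the value already stored under
/// `variable`, then overwrite it. Mirrors the match on `old_value` above.
fn set_variable(
    options: &mut HashMap<String, ConfigValue>,
    variable: &str,
    value: &str,
) -> Result<(), String> {
    let old_value = options
        .get(variable)
        .ok_or_else(|| format!("Unknown Variable {}", variable))?;

    let new_value = match old_value {
        ConfigValue::Bool(_) => value
            .parse::<bool>()
            .map(ConfigValue::Bool)
            .map_err(|_| format!("Failed to parse {} as bool", value))?,
        ConfigValue::U64(_) => value
            .parse::<u64>()
            .map(ConfigValue::U64)
            .map_err(|_| format!("Failed to parse {} as u64", value))?,
        // Any input is already a valid string, so this branch cannot fail.
        ConfigValue::Utf8(_) => ConfigValue::Utf8(value.to_string()),
    };
    options.insert(variable.to_string(), new_value);
    Ok(())
}
```

One design note: String's FromStr implementation is infallible, so the sketch converts the Utf8 case directly instead of mapping a parse error that can never occur. Because the enum has only three variants, it also needs no catch-all arm.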

Comment on lines +258 to +272
#[tokio::test]
async fn set_time_zone() {
    // we don't support changing the time zone for now, until all time zone issues are fixed and the related functions are completed

    let ctx = SessionContext::new();

    // for full variable name
    let err = plan_and_collect(&ctx, "set datafusion.execution.time_zone = '8'")
        .await
        .unwrap_err();

    assert_eq!(
        err.to_string(),
        "Error during planning: Changing Time Zone isn't supported yet"
    );

Contributor Author:

We should get back to this once time zone support is fully integrated.

Contributor:

👍

Comment on lines +2468 to +2472
if local {
    return Err(DataFusionError::NotImplemented(
        "LOCAL is not supported".to_string(),
    ));
}

Contributor Author:

We don't support SET LOCAL VARIABLE TO VALUE.

Comment on lines +2474 to +2478
if hivevar {
    return Err(DataFusionError::NotImplemented(
        "HIVEVAR is not supported".to_string(),
    ));
}

Contributor Author:

We don't support HIVEVAR. I'm not sure what it is.

Comment on lines +2483 to +2486
if variable_lower == "timezone" || variable_lower == "time.zone" {
    // we could introduce alias in OptionDefinition if this string matching thing grows
    variable_lower = "datafusion.execution.time_zone".to_string();
}

Contributor Author:

Alias timezone and time zone. Note that time zone is converted to time.zone during SQL parsing.
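The checks in the last three hunks can be gathered into one function. This is a hypothetical sketch: the name normalize_set_target is made up, and it assumes variable_lower is simply the lowercased variable name. The PR then rejects any change to datafusion.execution.time_zone at planning time, which this sketch leaves out.

```rust
/// Hypothetical sketch of the statement-level checks shown above: reject the
/// LOCAL and HIVEVAR modifiers, lowercase the name, and map the time zone
/// aliases onto the full option name.
fn normalize_set_target(variable: &str, local: bool, hivevar: bool) -> Result<String, String> {
    if local {
        return Err("LOCAL is not supported".to_string());
    }
    if hivevar {
        return Err("HIVEVAR is not supported".to_string());
    }

    // Assumption: names are matched case-insensitively by lowercasing.
    let mut variable_lower = variable.to_lowercase();

    // `SET TIME ZONE` reaches this point as `time.zone` after SQL parsing.
    if variable_lower == "timezone" || variable_lower == "time.zone" {
        variable_lower = "datafusion.execution.time_zone".to_string();
    }
    Ok(variable_lower)
}
```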


// parse value string from Expr
let value_string = match &value[0] {
    SQLExpr::Identifier(i) => i.to_string(),

Contributor Author:

To support unquoted strings that contain no whitespace, which are parsed as identifiers.

Comment on lines +2499 to +2500
Value::SingleQuotedString(s) => s.to_string(),
Value::Number(_, _) | Value::Boolean(_) => v.to_string(),

Contributor Author:

To support SingleQuotedString, Number, and Boolean values.

Comment on lines +2515 to +2516
UnaryOperator::Plus => format!("+{}", expr),
UnaryOperator::Minus => format!("-{}", expr),

Contributor Author:

To support signed numbers, e.g. +8 and -8.
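A sketch of how the right-hand side of SET is flattened into the string that is later parsed by type. ValueExpr is a hypothetical, simplified stand-in for the sqlparser expression and value types matched above. Unlike the PR, which formats the inner expression directly, this sketch recurses into it.

```rust
/// Hypothetical, simplified stand-in for the sqlparser expression shapes
/// handled above; the PR matches on SQLExpr and Value instead.
enum ValueExpr {
    Identifier(String),
    SingleQuotedString(String),
    Number(String),
    Boolean(bool),
    Plus(Box<ValueExpr>),
    Minus(Box<ValueExpr>),
}

/// Flatten the value of a SET statement into a plain string.
fn value_to_string(expr: &ValueExpr) -> String {
    match expr {
        // Unquoted single-token values such as `false` or `abc`.
        ValueExpr::Identifier(i) => i.clone(),
        // Quotes are dropped, so 'false' and false end up identical.
        ValueExpr::SingleQuotedString(s) => s.clone(),
        ValueExpr::Number(n) => n.clone(),
        ValueExpr::Boolean(b) => b.to_string(),
        // Re-attach the sign, e.g. +8 or -8.
        ValueExpr::Plus(inner) => format!("+{}", value_to_string(inner)),
        ValueExpr::Minus(inner) => format!("-{}", value_to_string(inner)),
    }
}
```

The resulting string is what the type-directed parse receives, so -8 reaches a u64 setting as "-8" and is rejected there, not here.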

Contributor:

An alternative (for the future) could be to parse the value as an Expr and then call ExprSimplifier::simplify(expr) on it during evaluation in SessionContext.

That would then support SQL statements like

set batch_size = 8*1024

I think this way is fine, too.

Example of how to do it:
https://github.com/apache/arrow-datafusion/blob/10e64dc013ba210ab1f6c2a3c02c66aef4a0e802/datafusion-examples/examples/expr_api.rs#L77-L89
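To illustrate the idea behind this suggestion without DataFusion's API, here is a toy constant folder over a hypothetical Arith type. It is not ExprSimplifier. It only shows how 8*1024 could be reduced to a single u64 before the usual type-directed parsing, with overflow reported as an error.

```rust
/// Toy arithmetic expression, standing in for a DataFusion Expr. Hypothetical
/// sketch of the idea only; it does not use ExprSimplifier.
enum Arith {
    Lit(u64),
    Mul(Box<Arith>, Box<Arith>),
    Add(Box<Arith>, Box<Arith>),
}

/// Fold the expression to a single constant, reporting overflow as an error.
fn fold(expr: &Arith) -> Result<u64, String> {
    match expr {
        Arith::Lit(v) => Ok(*v),
        Arith::Mul(a, b) => fold(a)?
            .checked_mul(fold(b)?)
            .ok_or_else(|| "overflow".to_string()),
        Arith::Add(a, b) => fold(a)?
            .checked_add(fold(b)?)
            .ok_or_else(|| "overflow".to_string()),
    }
}
```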

Contributor Author:

I'm not familiar with this yet; I'll try it and open a ticket or PR.

alamb (Contributor) left a comment:

This is beautiful @waitingkuo -- thank you 🏆

datafusion/core/src/config.rs (outdated, resolved)
datafusion/core/src/execution/context.rs (outdated, resolved)
datafusion/expr/src/logical_plan/plan.rs (outdated, resolved)
waitingkuo and others added 5 commits November 3, 2022 02:56
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

waitingkuo (Contributor Author):

@alamb thank you for the review; I learned a lot from it.
I updated the test cases according to the suggested changes.

@alamb alamb merged commit c9442ce into apache:master Nov 3, 2022

ursabot commented Nov 3, 2022:

Benchmark runs are scheduled for baseline = 761e167 and contender = c9442ce. c9442ce is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Dandandan pushed a commit to yuuch/arrow-datafusion that referenced this pull request Nov 5, 2022
* support SET

* remove useless comment

* add test cases

* Update datafusion/core/src/config.rs

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* Update datafusion/core/src/execution/context.rs

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* fix test cases

* fmt

* Update datafusion/expr/src/logical_plan/plan.rs

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Successfully merging this pull request may close these issues.

Support SET command
3 participants