New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FlightSQL Client & integration test #3207
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor comments from me below.
IMO all .map_err(|_| ...)
should be improved to keep the original error, so that it is easier for the developer/user to debug issues.
Good work, @avantgardnerio !
|
||
impl FlightSqlServiceClient { | ||
pub async fn new_with_ep(host: &str, port: u16) -> Result<Self, ArrowError> { | ||
let addr = format!("http://{}:{}", host, port); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it work with HTTPS ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sadly, no.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
filed #3309 to track
let addr = format!("http://{}:{}", host, port); | ||
let endpoint = Endpoint::new(addr) | ||
.map_err(|_| ArrowError::IoError("Cannot create endpoint".to_string()))? | ||
.connect_timeout(Duration::from_secs(20)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All these settings should probably be passed as parameters, e.g. ClientOptions
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know why they are there, TBH. This code appears to have been copy & pasted from Ballista. I'd suggest we remove them and accept the defaults here and let users call new(channel)
if they need extra configuration customization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree it would be much nicer to avoid hard coded defaults.
One classic pattern would be a client builder
let client = FlightClient::builder()
.with_timeout(Duration....)
.build()
.await?
However, I think we could do this as a follow on PR as well
Thank you @avantgardnerio -- FYI I believe that @tustvold has this one on his list to review carefully in the upcoming days |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @avantgardnerio -- sorry for the delay in review.
I think the flightSQL parts of this PR could be merged with some small documentation touchups
I am worried about unintended fallout from the miscellaneous other changes in this PR.
Thus, I suggest:
- Mark the API as "experimental" (perhaps just put a note in the docstrings)
- Break this PR into the "flight" part and the "misc cleanup" part
- File a follow on ticket for client configuration API
I am happy to do any/all of the above to get this PR moving. Given our release schedule calls for an RC to be cut later this week, I hope to help get this PR in by then.
let uds = UnixListener::bind(path.clone()).unwrap(); | ||
let stream = UnixListenerStream::new(uds); | ||
|
||
// We would just listen on TCP, but it seems impossible to know when tonic is ready to serve |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We got around this limitation in IOx by polling (as in try a request in a loop, if it fails, sleep for a while and try again, with an eventual timeout)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wish they would add a method where you can await actual start, or a oneshot channel, or something, but I guess this is where we are.
I also like this approach because it allows the tests to run in parallel. I'm open to either way, I don't have a strong opinion here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
me neither -- this is fine with me
let addr = format!("http://{}:{}", host, port); | ||
let endpoint = Endpoint::new(addr) | ||
.map_err(|_| ArrowError::IoError("Cannot create endpoint".to_string()))? | ||
.connect_timeout(Duration::from_secs(20)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree it would be much nicer to avoid hard coded defaults.
One classic pattern would be a client builder
let client = FlightClient::builder()
.with_timeout(Duration....)
.build()
.await?
However, I think we could do this as a follow on PR as well
db9fc38
to
ece7553
Compare
@alamb I think I addressed the 3 main points above. Hopefully this makes it "merge ready" and the remaining issues can be handled in follow-on PRs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much @avantgardnerio
I think this PR represents a major step forward for FlightSQL and could me merged as is, even though there are several significant items left. The only thing I think we should remove prior to releasing the next arrow-rs is the new tokio-stream dependency (which I am happy to make a PR)
Since all this code is behind the --flight-sql-experimental
flag I don't think this will impact downstream consumers.
Thus, I would like to merge it in so we can work on the other items in parallel and will start an organziational ticket to track the work
} | ||
|
||
#[tokio::test] | ||
async fn test_select_1() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test does not appear to actually run as part of cargo test --all
The only way I could get it to run is like:
cargo test -p arrow-flight --features=flight-sql-experimental --examples
Which does not appear to run as part of CI
I plan to merge this PR tomorrow (prior to cutting the next arrow-rs release) unless I hear otherwise I am starting to gather / plan FlightSQL support in #3301 |
.timeout(Duration::from_secs(20)) | ||
.tcp_nodelay(true) // Disable Nagle's Algorithm since we don't want packets to wait | ||
.tcp_keepalive(Option::Some(Duration::from_secs(3600))) | ||
.http2_keep_alive_interval(Duration::from_secs(300)) | ||
.keep_alive_timeout(Duration::from_secs(20)) | ||
.keep_alive_while_idle(true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we may come with a option struct and have these as default values. And users can specify them if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally prefer that approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filed #3308 for this item
@@ -45,6 +45,10 @@ default = [] | |||
flight-sql-experimental = ["prost-types"] | |||
|
|||
[dev-dependencies] | |||
arrow = { version = "28.0.0", path = "../arrow", features = ["prettyprint"] } | |||
tempfile = "3.3" | |||
tokio-stream = { version = "0.1", features = ["net"] } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️
I believe I have filed tickets for all follow ons identified in this PR. They are collected under #3301 |
Thanks @avantgardnerio for the contribution and @martin-g and @viirya for the review! |
My pleasure... I'm glad we can finally start writing automated tests for FlightSQL in Ballista 👍 |
🎉 As a heads up @avantgardnerio -- what I plan to do is to hack a FlightSQL implementation into IOx, likely with a temporary fork of arrow-flight and then contribute anything needed back upstream (like client configuration). I'll keep you updated |
let schema = Arc::new(schema); | ||
|
||
let mut batches = vec![]; | ||
let dictionaries_by_id = HashMap::new(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't support dictionaries yet? Maybe we can turn this into an issue?
let mut flight_data = vec![]; | ||
for batch in batches.iter() { | ||
let (flight_dictionaries, flight_datum) = | ||
flight_data_from_arrow_batch(batch, &options); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #3312 - for dictionary data this is inefficient (as repeated use of the dictionary will have new FlightData
instances to send over the wire.
) | ||
})?; | ||
|
||
let dictionaries_by_field = HashMap::new(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here - it should support dictionaries on the wire?
@avantgardnerio really cool stuff! I added some notes on the dictionary support as this might be a big bottleneck for dictionary-encoded data. |
* squash * Undo nightly clippy advice * PR feedback * PR feedback * PR feedback * PR feedback * Formatting
Which issue does this PR close?
Closes #3206.
Closes #1413
Rationale for this change
Described in issue.
What changes are included in this PR?
A Flight SQL client & client server integration test.
Are there any user-facing changes?
The client & test.
Additional notes
This is largely based off of work done by @wangfenjin