Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FlightSQL Client & integration test #3207

Merged
merged 7 commits into from Dec 9, 2022

Conversation

avantgardnerio
Copy link
Contributor

@avantgardnerio avantgardnerio commented Nov 27, 2022

Which issue does this PR close?

Closes #3206.
Closes #1413

Rationale for this change

Described in issue.

What changes are included in this PR?

A Flight SQL client & client server integration test.

Are there any user-facing changes?

The client & test.

Additional notes

This is largely based off of work done by @wangfenjin

@avantgardnerio
Copy link
Contributor Author

CC @stuartcarnie @alamb

@github-actions github-actions bot added the arrow-flight Changes to the arrow-flight crate label Nov 27, 2022
@avantgardnerio avantgardnerio changed the title (Failing) client/server integration test FlightSQL Client & integration test Nov 27, 2022
@avantgardnerio avantgardnerio marked this pull request as ready for review November 27, 2022 18:21
@github-actions github-actions bot added arrow Changes to the arrow crate object-store Object Store Interface parquet Changes to the parquet crate labels Nov 27, 2022
Copy link
Member

@martin-g martin-g left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor comments from me below.
IMO all .map_err(|_| ...) should be improved to keep the original error, so that it is easier for the developer/user to debug issues.

Good work, @avantgardnerio !

arrow-flight/examples/flight_sql_server.rs Outdated Show resolved Hide resolved
arrow-flight/examples/flight_sql_server.rs Outdated Show resolved Hide resolved
arrow-flight/examples/flight_sql_server.rs Outdated Show resolved Hide resolved
arrow-flight/examples/flight_sql_server.rs Outdated Show resolved Hide resolved
arrow-flight/examples/flight_sql_server.rs Outdated Show resolved Hide resolved
arrow-flight/examples/flight_sql_server.rs Outdated Show resolved Hide resolved
arrow-flight/src/utils.rs Outdated Show resolved Hide resolved

impl FlightSqlServiceClient {
pub async fn new_with_ep(host: &str, port: u16) -> Result<Self, ArrowError> {
let addr = format!("http://{}:{}", host, port);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it work with HTTPS ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sadly, no.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

filed #3309 to track

let addr = format!("http://{}:{}", host, port);
let endpoint = Endpoint::new(addr)
.map_err(|_| ArrowError::IoError("Cannot create endpoint".to_string()))?
.connect_timeout(Duration::from_secs(20))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All these settings should probably be passed as parameters, e.g. ClientOptions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know why they are there, TBH. This code appears to have been copy & pasted from Ballista. I'd suggest we remove them and accept the defaults here and let users call new(channel) if they need extra configuration customization.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it would be much nicer to avoid hard coded defaults.

One classic pattern would be a client builder

let client = FlightClient::builder()
 .with_timeout(Duration....)
 .build()
 .await?

However, I think we could do this as a follow on PR as well

arrow-flight/src/sql/client.rs Outdated Show resolved Hide resolved
@alamb
Copy link
Contributor

alamb commented Nov 28, 2022

Thank you @avantgardnerio -- FYI I believe that @tustvold has this one on his list to review carefully in the upcoming days

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @avantgardnerio -- sorry for the delay in review.

I think the flightSQL parts of this PR could be merged with some small documentation touchups

I am worried about unintended fallout from the miscellaneous other changes in this PR.

Thus, I suggest:

  • Mark the API as "experimental" (perhaps just put a note in the docstrings)
  • Break this PR into the "flight" part and the "misc cleanup" part
  • File a follow on ticket for client configuration API

I am happy to do any/all of the above to get this PR moving. Given our release schedule calls for an RC to be cut later this week, I hope to help get this PR in by then.

arrow-csv/src/reader.rs Outdated Show resolved Hide resolved
let uds = UnixListener::bind(path.clone()).unwrap();
let stream = UnixListenerStream::new(uds);

// We would just listen on TCP, but it seems impossible to know when tonic is ready to serve
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We got around this limitation in IOx by polling (as in try a request in a loop, if it fails, sleep for a while and try again, with an eventual timeout)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wish they would add a method where you can await actual start, or a oneshot channel, or something, but I guess this is where we are.

I also like this approach because it allows the tests to run in parallel. I'm open to either way, I don't have a strong opinion here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

me neither -- this is fine with me

arrow-flight/examples/flight_sql_server.rs Show resolved Hide resolved
arrow-select/src/take.rs Outdated Show resolved Hide resolved
let addr = format!("http://{}:{}", host, port);
let endpoint = Endpoint::new(addr)
.map_err(|_| ArrowError::IoError("Cannot create endpoint".to_string()))?
.connect_timeout(Duration::from_secs(20))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it would be much nicer to avoid hard coded defaults.

One classic pattern would be a client builder

let client = FlightClient::builder()
 .with_timeout(Duration....)
 .build()
 .await?

However, I think we could do this as a follow on PR as well

arrow-flight/src/sql/client.rs Outdated Show resolved Hide resolved
arrow-flight/src/sql/client.rs Outdated Show resolved Hide resolved
arrow-flight/src/sql/client.rs Outdated Show resolved Hide resolved
arrow-flight/src/sql/client.rs Outdated Show resolved Hide resolved
arrow-flight/src/sql/client.rs Show resolved Hide resolved
@github-actions github-actions bot removed parquet Changes to the parquet crate arrow Changes to the arrow crate object-store Object Store Interface labels Dec 7, 2022
@avantgardnerio
Copy link
Contributor Author

@alamb I think I addressed the 3 main points above. Hopefully this makes it "merge ready" and the remaining issues can be handled in follow-on PRs.

@avantgardnerio avantgardnerio requested review from alamb and removed request for martin-g December 7, 2022 21:50
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much @avantgardnerio

I think this PR represents a major step forward for FlightSQL and could me merged as is, even though there are several significant items left. The only thing I think we should remove prior to releasing the next arrow-rs is the new tokio-stream dependency (which I am happy to make a PR)

Since all this code is behind the --flight-sql-experimental flag I don't think this will impact downstream consumers.

Thus, I would like to merge it in so we can work on the other items in parallel and will start an organziational ticket to track the work

@tustvold / @viirya do you object to merging this PR?

arrow-flight/Cargo.toml Outdated Show resolved Hide resolved
}

#[tokio::test]
async fn test_select_1() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test does not appear to actually run as part of cargo test --all

The only way I could get it to run is like:

cargo test -p arrow-flight  --features=flight-sql-experimental --examples

Which does not appear to run as part of CI

@alamb
Copy link
Contributor

alamb commented Dec 8, 2022

I plan to merge this PR tomorrow (prior to cutting the next arrow-rs release) unless I hear otherwise

I am starting to gather / plan FlightSQL support in #3301

arrow-flight/src/utils.rs Outdated Show resolved Hide resolved
Comment on lines +66 to +71
.timeout(Duration::from_secs(20))
.tcp_nodelay(true) // Disable Nagle's Algorithm since we don't want packets to wait
.tcp_keepalive(Option::Some(Duration::from_secs(3600)))
.http2_keep_alive_interval(Duration::from_secs(300))
.keep_alive_timeout(Duration::from_secs(20))
.keep_alive_while_idle(true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we may come with a option struct and have these as default values. And users can specify them if needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally prefer that approach.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed #3308 for this item

@@ -45,6 +45,10 @@ default = []
flight-sql-experimental = ["prost-types"]

[dev-dependencies]
arrow = { version = "28.0.0", path = "../arrow", features = ["prettyprint"] }
tempfile = "3.3"
tokio-stream = { version = "0.1", features = ["net"] }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

@alamb
Copy link
Contributor

alamb commented Dec 9, 2022

I believe I have filed tickets for all follow ons identified in this PR. They are collected under #3301

@alamb
Copy link
Contributor

alamb commented Dec 9, 2022

Thanks @avantgardnerio for the contribution and @martin-g and @viirya for the review!

@alamb alamb merged commit d18827d into apache:master Dec 9, 2022
@avantgardnerio avantgardnerio deleted the bg_flightsql_client branch December 9, 2022 15:22
@avantgardnerio
Copy link
Contributor Author

Thanks @avantgardnerio

My pleasure... I'm glad we can finally start writing automated tests for FlightSQL in Ballista 👍

@alamb
Copy link
Contributor

alamb commented Dec 9, 2022

My pleasure... I'm glad we can finally start writing automated tests for FlightSQL in Ballista 👍

🎉

As a heads up @avantgardnerio -- what I plan to do is to hack a FlightSQL implementation into IOx, likely with a temporary fork of arrow-flight and then contribute anything needed back upstream (like client configuration). I'll keep you updated

let schema = Arc::new(schema);

let mut batches = vec![];
let dictionaries_by_id = HashMap::new();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't support dictionaries yet? Maybe we can turn this into an issue?

let mut flight_data = vec![];
for batch in batches.iter() {
let (flight_dictionaries, flight_datum) =
flight_data_from_arrow_batch(batch, &options);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #3312 - for dictionary data this is inefficient (as repeated use of the dictionary will have new FlightData instances to send over the wire.

)
})?;

let dictionaries_by_field = HashMap::new();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here - it should support dictionaries on the wire?

@Dandandan
Copy link
Contributor

@avantgardnerio really cool stuff!

I added some notes on the dictionary support as this might be a big bottleneck for dictionary-encoded data.

alamb pushed a commit to alamb/arrow-rs that referenced this pull request Dec 9, 2022
* squash

* Undo nightly clippy advice

* PR feedback

* PR feedback

* PR feedback

* PR feedback

* Formatting
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow-flight Changes to the arrow-flight crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement FlightSQL Client Implement Basic FlightSQL Client
5 participants