[WEBSITE] Blog post about DataFusion 13.0.0 #254

Merged
merged 17 commits into from
Oct 25, 2022

@alamb alamb commented Oct 11, 2022

re apache/datafusion#3671

This blog describes what has been going on in DataFusion for the last 5 months

Edit: URL location https://arrow.apache.org/blog/2022/10/25/datafusion-13.0.0/

@github-actions

Thanks for opening a pull request!

Could you open an issue for this pull request on JIRA?
https://issues.apache.org/jira/browse/ARROW

Then could you also rename pull request title in the following format?

ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}

See also:

@alamb alamb marked this pull request as ready for review October 20, 2022 19:23

alamb commented Oct 20, 2022

@andygrove mentions there is a draft blog for datafusion 11 that was not published that we can use for additional content: https://docs.google.com/document/d/1tPCgeB6iQPVvbRyaXft7nKqorrhv-XDqVXuMp4pG9ns/edit?usp=sharing


While Velox and Acero focus on execution engines, DataFusion provides the entire suite of components needed to build most analytic systems, including a SQL frontend, a dataframe API, and extension points for just about everything. Some [DataFusion users](https://github.com/apache/arrow-datafusion#known-uses) use a subset of the features, such as the frontend (e.g. [dask-sql](https://dask-sql.readthedocs.io/en/latest/)) or the execution engine (e.g. [Blaze](https://github.com/blaze-init/blaze)), and some users use many different components to build both SQL-based and customized DSL-based systems, such as [InfluxDB IOx](https://github.com/influxdata/influxdb_iox/pulls) and [VegaFusion](https://github.com/vegafusion/vegafusion).

One of DataFusion’s advantages is its implementation in [Rust](https://www.rust-lang.org/) and thus its easy integration with the broader Rust ecosystem. Rust continues to be a major source of benefit, from the [ease of parallelization with the high-quality and standardized `async` ecosystem](https://www.influxdata.com/blog/using-rustlangs-async-tokio-runtime-for-cpu-bound-tasks/) to its modern dependency management system and wonderful performance. <!-- I wonder if we should link to clickbench?? -->

I wonder if we should link to clickbench??

Did the clickbench results get updated with 13.0? AFAIK we should be much faster than we were at the time of the initial integration (a lot of the slowness came from SelectK queries, plus a few other optimizations such as regex_replace, which we should handle much better now). CC: @waitingkuo

It was the result from 12.0. I'll update it soon with the latest version.

Co-authored-by: Andy Grove <andygrove73@gmail.com>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>

alamb commented Oct 24, 2022

I plan to update the dates on this PR and publish it tomorrow unless anyone needs more time to review. Please let me know if you do.

@andygrove andygrove left a comment

LGTM!

7 participants