add datafusion #14

Merged
merged 17 commits into from
Aug 17, 2022

Conversation

waitingkuo
Contributor

Add DataFusion.
I have not yet found a "c6a.4xlarge, 500gb gp2" VM to test this on.

@CLAassistant

CLAassistant commented Aug 2, 2022

CLA assistant check
All committers have signed the CLA.

@alexey-milovidov
Member

Thank you! I'm very interested in this.

@alexey-milovidov alexey-milovidov self-assigned this Aug 6, 2022
datafusion/benchmark.sh (outdated review comment, resolved)
@qoega
Member

qoega commented Aug 9, 2022

I tried to run it on c6a.4xlarge

[2.638, 0.331, 0.305],
[0.516, 0.362, 0.362],
[1.312, 0.938, 0.888],
[1.427, 0.483, 0.487],
[3.006, 2.958, 2.953],
[null, null, null],
[0.479, 0.358, 0.353],
[0.450, 0.362, 0.367],
[3.952, 3.568, 3.370],
[6.014, 5.381, 5.570],
[2.301, 1.797, 1.726],
[2.810, 2.234, 2.139],
[null, null, null],
[null, null, null],
[null, null, null],
[4.722, 4.707, 4.469],
[null, null, null],
[null, null, null],
[null, null, null],
[0.786, 0.510, 0.494],
[14.218, 8.460, 8.409],
[17.441, 11.445, 11.307],
[41.716, 28.732, 28.761],
[129.533, 95.898, 95.534],
[7.422, 5.185, 5.246],
[10.154, 9.899, 10.094],
[13.262, 11.534, 11.464],
[18.279, 13.104, 14.166],
[225.516, 232.958, 234.723],
[2.859, 3.152, 3.202],
[8.917, 6.679, 6.655],
[11.631, 7.802, 7.855],
[0.081, 0.054, 0.056],
[null, null, null],
[null, null, null],
[4.864, 4.564, 4.709],
[null, null, null],
[null, null, null],
[null, null, null],
[null, null, null],
[null, null, null],
[null, null, null],
[null, null, null],
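
A quick way to read a listing like the one above is to load it as JSON and pick out the rows that produced no timings. The following is only a minimal sketch: it assumes each row holds three timings for one query and that `null` marks a run that did not complete, and the helper names `parse_rows` and `summarize` are made up for illustration. Only the first seven rows are included.

```python
import json

# First seven rows of the run above, copied verbatim.
RAW = """
[2.638, 0.331, 0.305],
[0.516, 0.362, 0.362],
[1.312, 0.938, 0.888],
[1.427, 0.483, 0.487],
[3.006, 2.958, 2.953],
[null, null, null],
[0.479, 0.358, 0.353],
"""


def parse_rows(raw):
    """Turn the comma-terminated rows into a list of lists (null becomes None)."""
    body = raw.strip().rstrip(",")
    return json.loads("[" + body + "]")


def summarize(rows):
    """Return (1-based rows with a missing timing, best time per complete row)."""
    failed = [i + 1 for i, r in enumerate(rows) if any(t is None for t in r)]
    best = {
        i + 1: min(r)
        for i, r in enumerate(rows)
        if all(t is not None for t in r)
    }
    return failed, best


rows = parse_rows(RAW)
failed, best = summarize(rows)
print("rows without timings:", failed)
print("best time per completed row:", best)
```

Pasting all 43 rows into `RAW` gives the same summary for the whole run.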

@alexey-milovidov
Member

@waitingkuo Let's submit these results and then improve?

@waitingkuo
Contributor Author

@alexey-milovidov we haven't fixed some issues in our master branch, i'll modify the installation part to build the latest and then submit it

@waitingkuo
Contributor Author

@qoega thank you!

@waitingkuo waitingkuo marked this pull request as ready for review August 10, 2022 15:28
@waitingkuo
Contributor Author

@qoega i fixed some issues and all the test cases passed 😃
i've added the result (it was running in azure, i tried my best to find a similar VM there)

@alexey-milovidov it's ready to merge, thank you~

@waitingkuo
Contributor Author

@waitingkuo waitingkuo changed the title [WIP] add datafusion add datafusion Aug 10, 2022
@qoega
Member

qoega commented Aug 10, 2022

Current version

[2.629, 0.334, 0.308],
[0.501, 0.361, 0.369],
[1.268, 0.874, 0.862],
[1.380, 0.484, 0.479],
[2.993, 3.000, 3.005],
[5.118, 4.026, 4.026],
[0.502, 0.358, 0.347],
[0.444, 0.359, 0.367],
[4.121, 3.484, 3.534],
[5.929, 5.335, 5.379],
[1.842, 1.357, 1.336],
[2.241, 1.614, 1.722],
[5.574, 4.465, 4.589],
[7.704, 6.852, 6.775],
[5.938, 5.027, 4.977],
[5.022, 4.762, 4.682],
[10.981, 9.900, 9.915],
[8.992, 7.361, 7.306],
[21.124, 17.996, 17.917],
[1.180, 0.443, 0.416],
[13.902, 8.194, 7.936],
[16.771, 10.375, 10.443],
[40.808, 27.454, 27.461],
[115.434, 82.709, 82.571],
[7.043, 4.855, 4.761],
[10.034, 10.077, 9.909],
[13.047, 11.289, 11.196],
[13.947, 7.928, 7.885],
[223.222, 229.755, 234.819],
[1.829, 2.235, 2.208],
[8.378, 6.282, 6.490],
[10.424, 7.384, 7.216],
[0.055, 0.055, 0.056],
[26.220, 20.760, 20.288],
[27.056, 20.990, 20.964],
[4.797, 4.557, 4.788],
[0.414, 0.270, 0.274],
[0.297, 0.225, 0.220],
[0.267, 0.241, 0.219],
[0.698, 0.533, 0.518],
[0.147, 0.087, 0.089],
[0.127, 0.085, 0.091],
[0.112, 0.076, 0.075],

@waitingkuo waitingkuo marked this pull request as draft August 10, 2022 16:53
@waitingkuo
Contributor Author

@qoega thank you

unfortunately, i just found an issue here. i'll make it right again as soon as possible

@qoega
Member

qoega commented Aug 10, 2022

No problem. I just have that instance ready for benchmarks and can pull and run anytime.

@waitingkuo
Contributor Author

@qoega @alexey-milovidov i submitted an issue for the parquet data: #18

hits.parquet and hits_{n}.parquet are slightly different

@waitingkuo
Contributor Author

datafusion's parquet importer doesn't support specifying a schema for now; the schema is inferred directly from the parquet metadata.

@waitingkuo
Contributor Author

i did the benchmark again. it works now. i thought there were some issues, but it turned out that i had used hits_0.parquet for the quick test.
It's ready to be merged again :D

@waitingkuo waitingkuo marked this pull request as ready for review August 10, 2022 18:55
@waitingkuo
Contributor Author

@qoega @alexey-milovidov i've made the result up to date. please let me know if there's anything i need to improve. thanks~

@alexey-milovidov
Member

alexey-milovidov commented Aug 17, 2022

@waitingkuo Thank you!
I also wanted to re-run on AWS for better comparison and to check reproducibility, but let's first merge as is.

@alexey-milovidov alexey-milovidov merged commit 1697007 into ClickHouse:main Aug 17, 2022