feat(query): improve sort. #8452

Merged: 9 commits merged into datafuselabs:main from the improve_sort branch on Nov 4, 2022.

RinChanNOWWW (Member) commented Oct 25, 2022

I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/

Summary

  1. Make the sort pipeline more streaming-friendly.
  2. Introduce an in-memory row format.
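The two summary items can be illustrated with a small sketch. It is not the code in this PR, nor the arrow2 row format API: the key types (U64, I64, Str) are a simplified assumption, and encode_key, encode_row and merge_runs are hypothetical helpers. Each ORDER BY key is encoded into bytes so that comparing two rows becomes one byte-wise comparison; each block is sorted as soon as it arrives, and the sorted runs are then merged with a binary heap.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Sort direction of one ORDER BY column.
#[derive(Clone, Copy)]
enum Order {
    Asc,
    Desc,
}

/// Simplified sort-key values (hypothetical; real columns have many more types).
enum Key<'a> {
    U64(u64),
    I64(i64),
    Str(&'a str),
}

/// Append `key` to `out` so that byte-wise comparison follows the column order.
fn encode_key(out: &mut Vec<u8>, key: &Key<'_>, order: Order) {
    let start = out.len();
    match key {
        // Big-endian bytes of an unsigned integer compare like the integer.
        Key::U64(v) => out.extend_from_slice(&v.to_be_bytes()),
        // Flipping the sign bit puts negative values before positive ones.
        Key::I64(v) => out.extend_from_slice(&((*v as u64) ^ (1u64 << 63)).to_be_bytes()),
        Key::Str(s) => {
            // Escape 0x00 as [0x00, 0xFF] and terminate with [0x00, 0x00], so a
            // string sorts before any longer string that starts with it.
            for &b in s.as_bytes() {
                out.push(b);
                if b == 0 {
                    out.push(0xFF);
                }
            }
            out.extend_from_slice(&[0, 0]);
        }
    }
    if let Order::Desc = order {
        // Each encoding is prefix-free, so inverting its bytes reverses its order.
        for b in &mut out[start..] {
            *b = !*b;
        }
    }
}

/// Concatenate the encoded columns of one row into a single comparable key.
fn encode_row(keys: &[Key<'_>], orders: &[Order]) -> Vec<u8> {
    let mut out = Vec::new();
    for (key, order) in keys.iter().zip(orders) {
        encode_key(&mut out, key, *order);
    }
    out
}

/// K-way merge of already sorted runs using a min-heap (BinaryHeap + Reverse).
fn merge_runs<T: Ord>(runs: Vec<Vec<T>>) -> Vec<T> {
    let mut iters: Vec<_> = runs.into_iter().map(|run| run.into_iter()).collect();
    let mut heap = BinaryHeap::new();
    for (i, it) in iters.iter_mut().enumerate() {
        if let Some(item) = it.next() {
            heap.push(Reverse((item, i)));
        }
    }
    let mut merged = Vec::new();
    while let Some(Reverse((item, i))) = heap.pop() {
        merged.push(item);
        // Refill from the run that produced the smallest item.
        if let Some(next) = iters[i].next() {
            heap.push(Reverse((next, i)));
        }
    }
    merged
}

fn main() {
    // ORDER BY userid DESC, title ASC over two blocks arriving one after another.
    let orders = [Order::Desc, Order::Asc];
    let blocks = vec![vec![(1i64, "b"), (-5, "z")], vec![(2, "a"), (1, "a")]];
    // Sort each block as soon as it arrives, producing one sorted run per block.
    let runs: Vec<Vec<(Vec<u8>, (i64, &str))>> = blocks
        .iter()
        .map(|block| {
            let mut run: Vec<_> = block
                .iter()
                .map(|&(u, t)| (encode_row(&[Key::I64(u), Key::Str(t)], &orders), (u, t)))
                .collect();
            run.sort(); // plain byte-wise comparison, no per-column comparator
            run
        })
        .collect();
    // Merge the runs; each original row travels with its encoded key.
    let sorted: Vec<(i64, &str)> = merge_runs(runs).into_iter().map(|(_, row)| row).collect();
    println!("{:?}", sorted); // [(2, "a"), (1, "a"), (1, "b"), (-5, "z")]
}
```

Because every row becomes a single byte string, the per-block sort and the merge loop need no per-column comparator or direction checks; the trade-off is an extra encoding pass and the memory for the encoded keys.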

Performance test

Testing environment

query engine:

  • OS: Ubuntu 18.04 LTS (Bionic Beaver)
  • CPU: Intel Xeon Gold 5218R @ 80x 2.123GHz
  • Memory: 90G
  • Disk: 1.1T / 1.9T

MinIO storage:

  • OS: Ubuntu 22.04 jammy
  • CPU: Intel Xeon E-2124 @ 4x 4.3GHz
  • Memory: 8G
  • Disk: 147G / 1.9T

The query engine machine and the MinIO storage machine are connected via LAN.

Testing method:

Data is stored on the MinIO machine and fetched by the query engine over the LAN. Each SQL statement is run 100 times.

Conclusion

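For full sorts and aggregation after sort, most queries are 10%~55% faster with the new sort (old/new up to 1.47 and 1.55 respectively). Results with limit 100 are mixed, and a few queries are slightly slower: full sort ID 3, top 100 IDs 2, 3, 7, 8 and 10, and aggregation ID 8.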

Result

full sort (no limit)

| ID | new | old | old/new | SQL |
|---|---|---|---|---|
| 1 | 1.12060627 | 1.58862891 | 1.417651277 | select * from numbers(10000000) order by number; |
| 2 | 1.1151245 | 1.64183498 | 1.47233334 | select * from numbers(10000000) order by number desc; |
| 3 | 0.27395058 | 0.26789701 | 0.977902693 | select userid, flashmajor from hits order by flashmajor, userid desc; |
| 4 | 0.13259244 | 0.1700541 | 1.282532398 | select resolutiondepth from hits order by resolutiondepth; |
| 5 | 0.33951065 | 0.35350362 | 1.041215114 | select title from hits order by title; |
| 6 | 0.33765341 | 0.34813435 | 1.031040528 | select title from hits order by title desc; |
| 7 | 0.38987365 | 0.46025009 | 1.180510891 | select userid, title from hits order by userid, title; |
| 8 | 0.37326987 | 0.47894412 | 1.283104152 | select userid, title from hits order by userid desc, title; |
| 9 | 0.42465461 | 0.46893471 | 1.104273212 | select userid, title from hits order by userid, title desc; |
| 10 | 0.36837134 | 0.39771965 | 1.079670449 | select userid, title from hits order by userid desc, title desc; |

top 100

| ID | new | old | old/new | SQL |
|---|---|---|---|---|
| 1 | 0.02470555 | 0.02539468 | 1.027893732 | select * from numbers(10000000) order by number limit 100; |
| 2 | 0.02436718 | 0.02419826 | 0.993067725 | select * from numbers(10000000) order by number desc limit 100; |
| 3 | 0.3007082 | 0.27476401 | 0.913723038 | select userid, flashmajor from hits order by flashmajor, userid desc limit 100; |
| 4 | 0.16701002 | 0.17488738 | 1.04716699 | select resolutiondepth from hits order by resolutiondepth limit 100; |
| 5 | 0.14109541 | 0.1646738 | 1.167109547 | select title from hits order by title limit 100; |
| 6 | 0.13761878 | 0.17209053 | 1.250487252 | select title from hits order by title desc limit 100; |
| 7 | 0.24831972 | 0.24537162 | 0.988127806 | select userid, title from hits order by userid, title limit 100; |
| 8 | 0.31880831 | 0.28931403 | 0.907485849 | select userid, title from hits order by userid desc, title limit 100; |
| 9 | 0.20367163 | 0.2316427 | 1.137334149 | select userid, title from hits order by userid, title desc limit 100; |
| 10 | 0.26425413 | 0.25043391 | 0.947701026 | select userid, title from hits order by userid desc, title desc limit 100; |

aggregation after sort

| ID | new | old | old/new | SQL |
|---|---|---|---|---|
| 1 | 0.11795881 | 0.16286981 | 1.380734597 | select avg(fetchtiming) from (select * from hits order by fetchtiming desc limit 100); |
| 2 | 0.1157759 | 0.17959318 | 1.551213854 | select avg(fetchtiming) from (select * from hits order by fetchtiming desc limit 1000); |
| 3 | 0.11912486 | 0.17966659 | 1.508220786 | select avg(fetchtiming) from (select * from hits order by fetchtiming desc limit 10000); |
| 4 | 0.13869412 | 0.17961846 | 1.295069034 | select avg(fetchtiming) from (select * from hits order by fetchtiming desc limit 50000); |
| 5 | 0.12989429 | 0.18346159 | 1.412391492 | select avg(fetchtiming) from (select * from hits order by fetchtiming desc limit 90000); |
| 6 | 0.1238525 | 0.16922857 | 1.366371854 | select avg(sendtiming) from (select * from hits order by sendtiming desc limit 50000); |
| 7 | 0.12223493 | 0.18473938 | 1.511346879 | select avg(dnstiming) from (select * from hits order by dnstiming desc limit 50000); |
| 8 | 0.20195231 | 0.19324225 | 0.956870709 | select avg(connecttiming) from (select * from hits order by connecttiming desc limit 50000); |
| 9 | 0.15391106 | 0.17401594 | 1.13062661 | select avg(responsestarttiming) from (select * from hits order by responsestarttiming desc limit 50000); |
| 10 | 0.13038866 | 0.16554495 | 1.269626899 | select avg(responseendtiming) from (select * from hits order by responseendtiming desc limit 50000); |
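The old/new column is the old time divided by the new time, so values above 1 mean the new sort was faster (1.47 means the old sort took about 47% longer). A small sketch that recomputes the column for the first result table (the queries without limit); FULL_SORT and speedup are illustrative names, and the timings are copied from the table without units, as in the original.

```rust
/// (new, old) timings copied from the first result table (IDs 1-10, no limit).
const FULL_SORT: [(f64, f64); 10] = [
    (1.12060627, 1.58862891),
    (1.1151245, 1.64183498),
    (0.27395058, 0.26789701),
    (0.13259244, 0.1700541),
    (0.33951065, 0.35350362),
    (0.33765341, 0.34813435),
    (0.38987365, 0.46025009),
    (0.37326987, 0.47894412),
    (0.42465461, 0.46893471),
    (0.36837134, 0.39771965),
];

/// old / new: values above 1.0 mean the new sort took less time than the old one.
fn speedup(new: f64, old: f64) -> f64 {
    old / new
}

fn main() {
    for (i, &(new, old)) in FULL_SORT.iter().enumerate() {
        let ratio = speedup(new, old);
        // (ratio - 1) * 100 is how much longer the old sort took, in percent.
        println!("ID {:>2}: old/new = {:.3} ({:+.1}%)", i + 1, ratio, (ratio - 1.0) * 100.0);
    }
    let ratios = FULL_SORT.iter().map(|&(new, old)| speedup(new, old));
    let (min, max) = ratios.fold((f64::INFINITY, f64::NEG_INFINITY), |(lo, hi), r| {
        (lo.min(r), hi.max(r))
    });
    println!("old/new ranges from {:.3} to {:.3}", min, max);
}
```

Reading the ratio this way also shows why a single headline range is only approximate: in the same table, ID 3 has a ratio just below 1.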

RinChanNOWWW force-pushed the improve_sort branch 6 times, most recently from 75359d2 to 51c6221, on November 1, 2022 05:51
RinChanNOWWW changed the title from "WIP: improve sort." to "feat(query): improve sort." on Nov 1, 2022
mergify bot added the pr-feature label (this PR introduces a new feature to the codebase) on Nov 1, 2022
RinChanNOWWW marked this pull request as ready for review on November 2, 2022 13:04
sundy-li (Member) commented Nov 3, 2022

LGTM, but in the next round we need to improve it so that the queries that are currently slower are no longer slower than before.

BohuTANG (Member) commented Nov 4, 2022

Conflicting files :)

mergify bot merged commit 93bb4ba into datafuselabs:main on Nov 4, 2022
RinChanNOWWW deleted the improve_sort branch on November 4, 2022 06:00
RinChanNOWWW (Member, Author) commented:

The row format has been merged into the main branch of arrow2. We can update the rev in Cargo.toml in a later PR.
