Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include support to limit rows in a group by operation #3121

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

lppassos
Copy link

The limit is based on a system property "h2.maxGroupByEntries". The default value for the system property is -1, which keeps the current behaviour of the system.

If the value is changed to a positive number, it will limit the maximum number of different rows in the groupByData tree map in Grouped.

If someone issues a query that would create a table that is too big, only that query will fail. I've done this change to diminish the risk of OOM when running the embedded server. It doesn't solve it completely of course, because it doesn't actually consider the memory being used by the tree map, but given it's simplicity I think it could be a useful setting in some situations.

The limit is based on a system property.
@katzyn
Copy link
Contributor

katzyn commented May 18, 2021

Thank you for your contribution!

I'm not sure about this idea, however. The proper solution is to buffer large groups on the disk first, just like large regular results are buffered, but with groups situation is more complicated, they need a more complicated format.

I also think that number of groups is a very poor metric. In regular results limit of in-memory data before buffering is specified in rows and it is a subject to change, because it doesn't actually prevent OOME in various cases. With groups number of grouped rows has even smaller meaning, because some aggregates are very cheap and others are actually hold all values from the original rows and their count isn't limited anywhere in your PR.

I think we can accept a some new setting for the grouped and window results, but it would be nice to estimate their sizes better than there. The new setting should also be explicitly marked as experimental only in documentation, and I think the special case with group-sorted optimization should also be mentioned, I guess it isn't affected by your new setting. This setting should also be covered by a some test.

@lppassos
Copy link
Author

Thanks for your comments.

I completely agree with you on what you say. The better solution would be like you say to use disk for large result sets, and number of entries is a poor metric for memory consumption. The only saving grace of this approach is that because its simplistic, it was also quite simple to do. It being my first contribution, I didn't want to go for something too complicated.

If you feel there is merit in pursuing this further, I'll work on improving the metric to be closer to the actual memory being used, while keeping the calculation simple and fast. I'll also include some unit tests for it.

Regarding the setting being marked experimental, can you give me some pointer on where that would be done? Is it just a matter of saying so in SysProperties?

@lppassos lppassos marked this pull request as draft May 18, 2021 07:33
@manticore-projects
Copy link
Contributor

manticore-projects commented May 18, 2021

Apologies for asking a maybe dumb question, I am just a curious visitor: would it not be a better idea to define per Connection Setting, whether to use a Memory Based Map ("regular systems") or a Disk based HashMap ("memory constraint embedded systems") for the Grouped By Entries? Something like https://github.com/bnyeggen/lash or maybe https://github.com/OpenHFT/Chronicle-Map, if MVSore was not fast enough?

I found this one: https://github.com/lmdbjava/benchmarks/blob/master/results/20160630/README.md
It is quite dated, but at least in 2016 MVStore was not the fastest solution and caching/backing large map to the disk might need that fastest speed possible?

@katzyn
Copy link
Contributor

katzyn commented May 18, 2021

Is it just a matter of saying so in SysProperties?

Yes, it should be enough to mention the experimental status in the Javadoc. Most likely this setting will be removed or replaced with some other setting when better implementation of storage arrives.

@manticore-projects
Groups and window partitions aren't just values, their data structure is more complicated and cannot be persisted directly, because it can contain various objects (including third-party objects) and they aren't required to be serializable. The whole this class is a subject to change, it was written as a first draft implementation of storage for both groups and window partitions and it was based on archaic storage for groups that had the same problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants