New implementation of HeightTree #371

dgrunwald · 2022-09-04T17:19:34Z

The HeightTree is the data structure that maps between document line numbers and visual Y positions. It handles both the varying heights of lines (e.g. due to word-wrapping) and the collapsing of lines (e.g. due to folding).

Where the old implementation was based on a red-black tree, the new implementation is inspired by B+ trees.
We no longer need a separate memory allocation for every line in the document. This should dramatically reduce the memory consumption: a comment in the old code indicated that each line took 56 bytes of memory.
But that was in 32-bit days; in a 64-bit process the old implementation actually needed 88 bytes per line.

The new implementation should need 168 bytes per leaf node which holds 8-16 lines, i.e. we need 10.5-21 bytes per line for the leaf nodes.
It additionally needs 456 bytes per inner node (for the inner node and its children array). The level just above the leafs holds 64-256 lines per inner node, i.e. we need 1.8-7.2 bytes per line for the second level.
Thus the total memory consumption of the new tree should in the worst case still be below 29.2 bytes per line (168/8 + sum(456/(8**lvl) for lvl in range(2,100))).
However after freshly loading a large document from disk we should be close to the best case which in total is just 12.4 bytes per line (168/16 + sum(456/(16**lvl) for lvl in range(2,100))).

Apart from the reduced memory usage, the behavior of the new tree should be identical to the old behavior.
In particular, all operations (line insertion, line deletion, collapsing, uncollapsing, line<->Y queries) maintain their externally-visible behavior and their O(lg N) run-time.
set_DefaultLineHeight runs in O(N) now (previously could take O(N lg N)).

The HeightTree is the data structure that maps between document line numbers and visual Y positions. It handles both the varying heights of lines (e.g. due to word-wrapping) and the collapsing of lines (e.g. due to folding). Where the old implementation was based on a red-black tree, the new implementation is inspired by B+ trees. We no longer need a separate memory allocation for every line in the document. This should dramatically reduce the memory consumption: a comment in the old code indicated that each line took 56 bytes of memory. But that was in 32-bit days; in a 64-bit process the old implementation actually needed 88 bytes per line. The new implementation should need 168 bytes per leaf node which holds 8-16 lines, i.e. we need 10.5-21 bytes per line for the leaf nodes. It additionally needs 456 bytes per inner node (for the inner node and its children array). The level just above the leafs holds 64-256 lines per inner node, i.e. we need 1.8-7.2 bytes per line for the second level. Thus the total memory consumption of the new tree should in the worst case still be below 29.2 bytes per line (`168/8 + sum(456/(8**lvl) for lvl in range(2,100))`). However after freshly loading a large document from disk we should be close to the best case which in total is just 12.4 bytes per line (`168/16 + sum(456/(16**lvl) for lvl in range(2,100))`). Apart from the reduced memory usage, the behavior of the new tree should be identical to the old behavior. In particular, all operations (line insertion, line deletion, collapsing, uncollapsing, line<->Y queries) maintain their externally-visible behavior and their O(lg N) run-time. set_DefaultLineHeight runs in O(N) now (previously could take O(N lg N)).

sopgenorth · 2023-01-03T01:45:14Z

I saw the comment in mainline about changing the implementation, and was curious what an improvement might look like, particularly from my other interests in trying to add some sort of lazy file loading mechanism (just a dream right now, see: #382 )

On a 750MB test file this change reduces RAM gives a healthy reduction from ~7.5GB to ~5GB.

Nice work!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

New implementation of HeightTree #371

New implementation of HeightTree #371

dgrunwald commented Sep 4, 2022

sopgenorth commented Jan 3, 2023

New implementation of HeightTree #371

Are you sure you want to change the base?

New implementation of HeightTree #371

Conversation

dgrunwald commented Sep 4, 2022

sopgenorth commented Jan 3, 2023