
Cache repeated COLR subtables (2) #3221

Open · wants to merge 8 commits into main
Conversation

anthrotype (Member)

Some improvements to #3215: use pickle (faster than json), sha256 for the digest to mitigate collisions, and an LRU cache to avoid it growing unbounded.
I'm still unsure about the cache max size (I used 128 like lru_cache, but I don't know what a good number is in general for this task).
I'm also hesitant to enable this unconditionally and would rather make it opt-in, as it's not guaranteed to lead to faster builds given the overhead of pickling and hashing objects multiple times (TableBuilder.build is called recursively for each struct field in a table tree).
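
For readers following along, here is a minimal sketch of the scheme described above; the class and method names (SubtableCache, makeKey, get, put) are hypothetical and do not reflect the actual TableBuilder internals:

import pickle
import hashlib
from collections import OrderedDict

class SubtableCache:
    """Digest-keyed LRU cache (illustrative sketch, not the fontTools API)."""

    def __init__(self, maxsize=128):  # 128 mirrors functools.lru_cache's default
        self.cache = OrderedDict()
        self.maxsize = maxsize

    @staticmethod
    def makeKey(klass, source):
        # The key must include the class, not just the pickled data;
        # see the bug fix discussed below.
        return hashlib.sha256(pickle.dumps((klass.__name__, source))).digest()

    def get(self, cacheKey):
        if cacheKey not in self.cache:
            return None
        self.cache.move_to_end(cacheKey)  # mark as most recently used
        # Return a unique copy; a pickle round-trip is faster than copy.deepcopy.
        return pickle.loads(pickle.dumps(self.cache[cacheKey]))

    def put(self, cacheKey, value):
        self.cache[cacheKey] = value
        self.cache.move_to_end(cacheKey)
        if len(self.cache) > self.maxsize:
            self.cache.popitem(last=False)  # evict the least recently used entry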

anthrotype changed the base branch from cache-colr-builds to main (July 24, 2023, 08:31)
simoncozens (Collaborator)

Using paintcompiler on Bitcount:

FontTools main: 2:19.07 (peak memory usage 4.8 GB)
cache-colr-builds: 54.015 s (peak memory usage 236 MB)
cache-colr-builds-2: 2:41.66 (peak memory usage 5 GB)

Probably better to do nothing than to do this.

anthrotype (Member, Author)

Well, that's strange: how come your branch, where we don't manage the cache at all and let it grow unbounded, actually reports using less memory? How did you measure?

anthrotype (Member, Author)

Also note that #3215 has a bug which I fixed in 259e013: the cache key must also contain the class itself, otherwise two objects of distinct classes but with the same data can be confused.
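
A contrived illustration of the collision, assuming the buggy key was derived from the pickled attribute data alone (the class names here are made up):

import pickle
import hashlib

class PaintA:
    def __init__(self):
        self.value = 1

class PaintB:
    def __init__(self):
        self.value = 1

def buggyKey(obj):
    # hashes only the attribute data: PaintA and PaintB collide
    return hashlib.sha256(pickle.dumps(vars(obj))).digest()

def fixedKey(obj):
    # includes the class in the digest, analogous to the fix in 259e013
    return hashlib.sha256(pickle.dumps((type(obj).__name__, vars(obj)))).digest()

assert buggyKey(PaintA()) == buggyKey(PaintB())  # distinct classes confused
assert fixedKey(PaintA()) != fixedKey(PaintB())  # disambiguated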

simoncozens (Collaborator)

Don't know. The TTX is different between the master build and my build, but visually it's the same. I'm measuring the memory footprint with "top", which I know isn't completely accurate, but it's not going to be an order of magnitude wrong either.

anthrotype (Member, Author)

If the TTX is different, it must be a bug: this optimization is only meant to speed things up, and the end result ought to be identical.

# Move to the end to mark it as most recently used
self.cache.move_to_end(cacheKey)
# return unique copies; pickle/unpickle is faster than copy.deepcopy
return pickle.loads(pickle.dumps(self.cache[cacheKey]))
Member

This is ugly, even if faster than deepcopy. :)

Member

Also, you might get away without a copy whatsoever, no?

Member

Maybe as an opt-in.

anthrotype (Member, Author), Aug 9, 2023

> This is ugly, even if faster than deepcopy. :)

The whole point of this was to speed things up…

> you might get away without a copy whatsoever, no?

I don't think so. If you use this to build the per-master COLR tables that are then sent to varLib.merger to build a variable COLR, the objects are updated in place, so you don't want shared objects. Note that when decompiling otTables from binary, you also get distinct objects even when they were pointed to by the same offset.
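
A toy example of why the copy is needed (plain dicts stand in for otTables objects; the names are illustrative):

import pickle

cache = {"paint": {"Format": 2, "Alpha": 1.0}}

# Returning the cached object itself: an in-place update (as varLib.merger
# performs) silently corrupts the cache entry for every later user.
shared = cache["paint"]
shared["Alpha"] = 0.5
assert cache["paint"]["Alpha"] == 0.5  # cache entry mutated

# Returning a pickle round-trip copy keeps the cache entry intact.
cache["paint"] = {"Format": 2, "Alpha": 1.0}
copy = pickle.loads(pickle.dumps(cache["paint"]))
copy["Alpha"] = 0.5
assert cache["paint"]["Alpha"] == 1.0  # cache entry unchanged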
