Performance: decrease runtime overhead for constructing HasTraits (up to 20x faster) #777

maartenbreddels · 2022-09-14T13:11:47Z

This makes creating ipywidgets about 2x faster (ignoring comm construction).

Related to, but superseded #639

maartenbreddels · 2022-09-14T18:58:23Z

The ipywidget tests fail, since it adds _comm_default at runtime, jupyter-widgets/ipywidgets#3590 fixes that and is also a better method anyway. The question is, is this a breaking change? I don't think there is any mention of adding _trait_default at runtime, and it causes a significant runtime cost.

blink1073 · 2022-09-15T01:26:23Z

I think it is fair to make this change, assuming jupyter-widgets/ipywidgets#3590 gets merged and released and we go green.

maartenbreddels · 2022-09-26T12:37:14Z

I have another commit that gives a good performance win in a situation where a lot of default are 'static' (like a fixed default value for string and ints). This happens a lot for ipywidgets.

Should I push to this branch? or should I wait to get this in first (I could open a new PR and later rebase against main)

blink1073 · 2022-09-26T12:43:46Z

I'd say push here

maartenbreddels · 2022-09-26T12:52:06Z

I've been benchmarking this with pytest-benchmark using this script:

from traitlets import HasTraits, Int, Unicode

class SingleTrait(HasTraits):
    foo = Int()


class ManyTrait(HasTraits):
    foo0 = Unicode("foo0").tag(sync=True)
    foo1 = Unicode("foo1").tag(sync=True)
    foo2 = Unicode("foo2").tag(sync=True)
    foo3 = Unicode("foo3").tag(sync=True)
    foo4 = Unicode("foo4").tag(sync=True)
    foo5 = Unicode("foo5").tag(sync=True)
    foo6 = Unicode("foo6").tag(sync=True)
    foo7 = Unicode("foo7").tag(sync=True)
    foo8 = Unicode("foo8").tag(sync=True)
    foo9 = Unicode("foo9").tag(sync=True)



def create(N, cls):
    for i in range(N):
        cls()


def create_and_access(N, cls):
    keys = cls.class_trait_names(sync=True)
    for i in range(N):
        obj = cls()
        for key in keys:
            getattr(obj, key)


def test_single_trait(benchmark):
    benchmark(create, cls=SingleTrait, N=1_000)

def test_many_trait(benchmark):
    benchmark(create, cls=ManyTrait, N=1_000)

def test_single_trait_and_access(benchmark):
    benchmark(create_and_access, cls=SingleTrait, N=1_000)

def test_many_trait_and_access(benchmark):
    benchmark(create_and_access, cls=ManyTrait, N=1_000)

Results

Main `a35dd0b`

-------------------------------------------------------------------------------------- benchmark: 4 tests --------------------------------------------------------------------------------------
Name (time in ms)                    Min                Max               Mean            StdDev             Median               IQR            Outliers      OPS            Rounds  Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_single_trait                12.4555 (1.0)      12.9838 (1.0)      12.5927 (1.0)      0.1114 (1.62)     12.5714 (1.0)      0.1253 (2.96)         14;2  79.4108 (1.0)          71           1
test_single_trait_and_access     13.0052 (1.04)     13.5203 (1.04)     13.1818 (1.05)     0.1235 (1.80)     13.1553 (1.05)     0.1789 (4.23)         21;0  75.8624 (0.96)         75           1
test_many_trait                  14.0976 (1.13)     14.4672 (1.11)     14.3162 (1.14)     0.0686 (1.0)      14.3159 (1.14)     0.0423 (1.0)         19;16  69.8510 (0.88)         69           1
test_many_trait_and_access       57.5647 (4.62)     59.3905 (4.57)     58.3111 (4.63)     0.5214 (7.60)     58.1785 (4.63)     0.8110 (19.18)         5;0  17.1494 (0.22)         18           1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

before the last commit `166cef4`

--------------------------------------------------------------------------------------- benchmark: 4 tests --------------------------------------------------------------------------------------
Name (time in ms)                    Min                Max               Mean            StdDev             Median               IQR            Outliers       OPS            Rounds  Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_single_trait                 2.2666 (1.0)       2.5554 (1.0)       2.3552 (1.0)      0.0274 (1.63)      2.3493 (1.0)      0.0152 (1.06)        50;47  424.5879 (1.0)         400           1
test_single_trait_and_access      2.4376 (1.08)      2.6091 (1.02)      2.5242 (1.07)     0.0169 (1.0)       2.5228 (1.07)     0.0151 (1.06)        54;24  396.1647 (0.93)        394           1
test_many_trait                   2.6998 (1.19)      2.9632 (1.16)      2.8028 (1.19)     0.0299 (1.77)      2.7969 (1.19)     0.0143 (1.0)         47;50  356.7877 (0.84)        367           1
test_many_trait_and_access       27.8987 (12.31)    29.0280 (11.36)    28.0583 (11.91)    0.2163 (12.83)    27.9914 (11.91)    0.0954 (6.67)          4;4   35.6400 (0.08)         36           1
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

`ae8c6cd`

------------------------------------------------------------------------------------- benchmark: 4 tests ------------------------------------------------------------------------------------
Name (time in ms)                   Min               Max              Mean            StdDev            Median               IQR            Outliers       OPS            Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_single_trait                2.4483 (1.0)      2.7051 (1.0)      2.5550 (1.0)      0.0328 (1.13)     2.5482 (1.0)      0.0225 (1.44)        59;49  391.3851 (1.0)         370           1
test_single_trait_and_access     2.6786 (1.09)     2.9397 (1.09)     2.7569 (1.08)     0.0308 (1.06)     2.7498 (1.08)     0.0218 (1.39)        52;41  362.7242 (0.93)        362           1
test_many_trait                  3.3650 (1.37)     3.7456 (1.38)     3.4842 (1.36)     0.0291 (1.0)      3.4812 (1.37)     0.0156 (1.0)         22;22  287.0108 (0.73)        278           1
test_many_trait_and_access       5.3269 (2.18)     5.9227 (2.19)     5.5025 (2.15)     0.0671 (2.31)     5.4879 (2.15)     0.0230 (1.47)        13;19  181.7341 (0.46)        170           1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Looking at the median, we see a >10x improvement for HasTrait with many 'static'+immutable data, which is quite common in ipywidgets (see e.g. the Layout widget https://github.com/jupyter-widgets/ipywidgets/blob/master/python/ipywidgets/ipywidgets/widgets/widget_layout.py )

where the last commit really helps with a factor of ~5x

maartenbreddels · 2022-09-26T13:03:15Z

I noticed that ae8c6cd slowed down the other cases because we now did some unneeded checking of presence in a dict, so I fixed that.

Now ae8c6cd only gives a speedup compared to 166cef4:

------------------------------------------------------------------------------------- benchmark: 4 tests ------------------------------------------------------------------------------------
Name (time in ms)                   Min               Max              Mean            StdDev            Median               IQR            Outliers       OPS            Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_single_trait                2.2957 (1.0)      2.5338 (1.0)      2.3469 (1.0)      0.0437 (1.0)      2.3307 (1.0)      0.0493 (1.0)         83;20  426.0973 (1.0)         403           1
test_single_trait_and_access     2.3051 (1.00)     6.7410 (2.66)     2.4294 (1.04)     0.2315 (5.30)     2.3947 (1.03)     0.1165 (2.36)          7;7  411.6245 (0.97)        425           1
test_many_trait                  2.7801 (1.21)     3.3918 (1.34)     2.8540 (1.22)     0.0860 (1.97)     2.8196 (1.21)     0.0813 (1.65)        40;23  350.3821 (0.82)        348           1
test_many_trait_and_access       4.8385 (2.11)     5.3911 (2.13)     5.0037 (2.13)     0.1150 (2.63)     4.9796 (2.14)     0.1399 (2.84)         53;6  199.8502 (0.47)        187           1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

blink1073 · 2022-09-26T14:18:22Z

Nice work!

maartenbreddels · 2022-09-28T08:29:17Z

Found another performance issue: We are calling Descriptor.instance_init for each HasTraits we create, for each descriptor, while actually many of them don't do anything.
I've refactored it such that only at class construction we collect all instance_init methods that do something. We by default assume that we "do something", by adding instance_init to a list of functions to be called.
However, many of the TypeTraits opt out of this by overriding subclass_init to not add it's own instance_init to the list of functions.
This makes it almost perfectly backwards compatible, except...
When someone inherits from say Int, its instance_init will not be called. We could make it fully compatible by checking if type(self) == Int (and similar for the other traits). Let me know, I'm happy to add this in.

When creating a 1000 HasTraits with 30 traits that do not need an instance_init, we avoid 30000 unneeded function calls.

Performance benchmark:

Before

------------------------------------------------------------------------------------- benchmark: 4 tests ------------------------------------------------------------------------------------
Name (time in ms)                   Min               Max              Mean            StdDev            Median               IQR            Outliers       OPS            Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_single_trait                2.2703 (1.0)      2.5766 (1.0)      2.3246 (1.0)      0.0510 (1.50)     2.3051 (1.0)      0.0656 (2.79)        65;13  430.1731 (1.0)         407           1
test_single_trait_and_access     2.2841 (1.01)     2.7480 (1.07)     2.3585 (1.01)     0.0789 (2.31)     2.3257 (1.01)     0.0752 (3.20)        62;36  423.9969 (0.99)        430           1
test_many_trait                  2.8105 (1.24)     3.0434 (1.18)     2.8452 (1.22)     0.0341 (1.0)      2.8343 (1.23)     0.0235 (1.0)         36;30  351.4728 (0.82)        287           1
test_many_trait_and_access       4.6244 (2.04)     4.9061 (1.90)     4.6736 (2.01)     0.0457 (1.34)     4.6592 (2.02)     0.0415 (1.77)        24;15  213.9672 (0.50)        213           1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

After

------------------------------------------------------------------------------------- benchmark: 4 tests ------------------------------------------------------------------------------------
Name (time in ms)                   Min               Max              Mean            StdDev            Median               IQR            Outliers       OPS            Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_single_trait                2.2464 (1.0)      2.4565 (1.01)     2.2825 (1.00)     0.0252 (1.0)      2.2769 (1.00)     0.0230 (1.32)        58;22  438.1106 (1.00)        422           1
test_many_trait                  2.2525 (1.00)     2.4332 (1.00)     2.2818 (1.0)      0.0257 (1.02)     2.2746 (1.0)      0.0223 (1.28)        55;32  438.2443 (1.0)         436           1
test_single_trait_and_access     2.2594 (1.01)     2.4330 (1.0)      2.2858 (1.00)     0.0254 (1.01)     2.2781 (1.00)     0.0174 (1.0)         42;36  437.4761 (1.00)        427           1
test_many_trait_and_access       4.1723 (1.86)     4.4918 (1.85)     4.2298 (1.85)     0.0608 (2.42)     4.2072 (1.85)     0.0509 (2.92)        33;26  236.4174 (0.54)        230           1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

This saves another ~10% for HasTraits with 10 traits. I've also tested this in a more extreme case with 100 traits, which gives a 25% performance increase. This shows this approach scales much better.

Looking at creating a 100 trait object in viztracer, we see:

before

setup_instance takes most of the time.

after

So instead of setup_instance taking most of the time creating an object, we only spend time doing a dict copy with the trait values (the only overhead that we cannot avoid). Now the last bottle neck seems to be the HasTraits constructor (more specifically the context manager)

maartenbreddels · 2022-09-28T09:07:02Z

After also working on the HasTrait constructor, where we remove a lot of overhead from the hold_trait_notifications context manager.

------------------------------------------------------------------------------------- benchmark: 4 tests ------------------------------------------------------------------------------------
Name (time in ms)                   Min               Max              Mean            StdDev            Median               IQR            Outliers       OPS            Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_single_trait                1.3711 (1.0)      1.5112 (1.0)      1.3890 (1.0)      0.0176 (1.03)     1.3829 (1.0)      0.0121 (1.0)         77;69  719.9187 (1.0)         682           1
test_single_trait_and_access     1.3869 (1.01)     1.5423 (1.02)     1.4141 (1.02)     0.0171 (1.0)      1.4105 (1.02)     0.0171 (1.42)       144;30  707.1417 (0.98)        699           1
test_many_trait                  1.4594 (1.06)     1.6393 (1.08)     1.4794 (1.07)     0.0194 (1.13)     1.4736 (1.07)     0.0154 (1.27)        79;50  675.9557 (0.94)        662           1
test_many_trait_and_access       3.2931 (2.40)     3.5277 (2.33)     3.3287 (2.40)     0.0398 (2.32)     3.3156 (2.40)     0.0267 (2.21)        36;35  300.4134 (0.42)        293           1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

We see another ~25% performance increase.

maartenbreddels · 2022-09-28T09:49:49Z

I think I don't see any overhead at this moment anymore at constructing HasTraits that are obvious and don't break things, I think this is as fast as I can push it.

Possible future optimization might be in removing magic methods, because that would save us in checking with hasattr at runtime. Also, the Bunch construction gives some overhead, but I don't see how we can get rid of that.

blink1073 · 2022-09-28T12:52:15Z

Ha, I love seeing the PR title update. Thanks again for digging in to this!

naterush · 2022-10-20T06:33:20Z

Per Maarten's ideas, I further dug into performance here. Some quick observations:

Overhead from the Bunch class

First, it's worth noting that the Bunch class adds some serious overhead to the dict class, if you use the bunch.attribute syntax to access values. With this simple test case:

def test_bunch_named_attribute_access(benchmark):
    def do_named_access():
        b = Bunch(x=5, y=10)
        for _ in range(1_000_000):
            new = b.x

    benchmark(do_named_access)

def test_bunch_dict_access(benchmark):
    def do_key_access():
        b = Bunch(x=5, y=10)
        for _ in range(1_000_000):
            new = b['x']

    benchmark(do_key_access)

This has the following results:

------------------------------------------------------------------------------------------- benchmark: 2 tests ------------------------------------------------------------------------------------------
Name (time in ms)                          Min                 Max                Mean            StdDev              Median               IQR            Outliers      OPS            Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_bunch_dict_access                 31.1114 (1.0)       31.3130 (1.0)       31.2269 (1.0)      0.0496 (1.0)       31.2313 (1.0)      0.0681 (1.0)           9;0  32.0236 (1.0)          32           1
test_bunch_named_attribute_access     233.6903 (7.51)     234.9689 (7.50)     234.2604 (7.50)     0.5379 (10.84)    234.3409 (7.50)     0.9007 (13.23)         2;0   4.2688 (0.13)          5           1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

So, using the bunch.x is about 7.5x slower than using bunch['x'].

Overhead comes from the get_attr function in bunch

This overhead is clearly from the get_attr function, which is overwritten to call the .getitem function on the underlying dict.

You can get some minor performance improvements by setting __getitem__ = dict.get, (you go to 5x slower), but even then there's a lot of performance left on the table with bunch using the named attribute access.

The other problem here is that user code that has event observers likely expects the AttributeErrors to be turned to KeyErrors, as currently happens in the getitem function. This performance PR shouldn't change the external interface users mess about with -- but explored below are ways to help minimize the impact of bunch performance on tartlets internally.

Using the `bunch.attr` syntax adds significant overhead to event processing

The Bunch is primarily used internally within traitlets as the return value for the observe functions when watching change events.

Consider the following example where we're watching many traits:

from typing import Any, Dict
from traitlets import HasTraits, Integer, observe, Unicode, All

class ExampleHasTraits(HasTraits):
    foo: int = Integer(2, help="a number").tag(config=True) # type:ignore

    @observe('num')
    def _num_changed(self, change):
        print("{name} changed from {old} to {new}".format(**change))
    

class ManyTrait(HasTraits):
    foo0 = Unicode("foo0").tag(sync=True)
    foo1 = Unicode("foo1").tag(sync=True)
    foo2 = Unicode("foo2").tag(sync=True)
    foo3 = Unicode("foo3").tag(sync=True)
    foo4 = Unicode("foo4").tag(sync=True)
    foo5 = Unicode("foo5").tag(sync=True)
    foo6 = Unicode("foo6").tag(sync=True)
    foo7 = Unicode("foo7").tag(sync=True)
    foo8 = Unicode("foo8").tag(sync=True)
    foo9 = Unicode("foo9").tag(sync=True)
    foo10 = Unicode("foo10").tag(sync=True)
    foo11 = Unicode("foo11").tag(sync=True)
    foo12 = Unicode("foo12").tag(sync=True)
    foo13 = Unicode("foo13").tag(sync=True)
    foo14 = Unicode("foo14").tag(sync=True)
    foo15 = Unicode("foo15").tag(sync=True)
    foo16 = Unicode("foo16").tag(sync=True)
    foo17 = Unicode("foo17").tag(sync=True)
    foo18 = Unicode("foo18").tag(sync=True)
    foo19 = Unicode("foo19").tag(sync=True)
    foo20 = Unicode("foo20").tag(sync=True)

    @observe(All)
    def _num_changed(self, change):
        var = change['old']


def create_and_set(N, cls):
    keys = cls.class_trait_names(sync=True)
    for i in range(N):
        obj = cls()
        for key in keys:
            setattr(obj, key, "123")

import cProfile

# First, we run with 1k
with cProfile.Profile() as pr:
    create_and_set(N=10_000, cls=ManyTrait)

import cProfile, pstats
from pstats import SortKey
sortby = SortKey.CUMULATIVE
ps = pstats.Stats(pr).sort_stats(sortby)
ps.print_stats()

If we simply remove any usage of named access on the traitlets within the library, we get the following performance change:

With named access:

          6630113 function calls in 1.272 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   210000    0.452    0.000    0.750    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1509(_notify_observers)
   210000    0.098    0.000    1.157    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:702(set)
   630000    0.083    0.000    0.109    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/utils/bunch.py:13(__getattr__)
   210000    0.079    0.000    0.199    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:731(_validate)
   210000    0.075    0.000    0.859    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1494(_notify_trait)
   630000    0.070    0.000    0.070    0.000 {built-in method builtins.hasattr}
   210000    0.049    0.000    0.076    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:740(_cross_validate)
   850000    0.037    0.000    0.037    0.000 {built-in method builtins.isinstance}
   210000    0.037    0.000    1.193    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:720(__set__)
   840021    0.036    0.000    0.036    0.000 {method 'get' of 'dict' objects}
   210000    0.034    0.000    0.784    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1505(notify_change)
   210000    0.032    0.000    0.032    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1227(__get__)
   210000    0.029    0.000    0.060    0.000 {built-in method builtins.getattr}
   210000    0.027    0.000    1.221    0.000 {built-in method builtins.setattr}
   630000    0.026    0.000    0.026    0.000 {method '__getitem__' of 'dict' objects}
   210000    0.021    0.000    0.029    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:2517(validate)
   420000    0.020    0.000    0.020    0.000 {method 'extend' of 'list' objects}
        1    0.020    0.020    1.272    1.272 /Users/nathanrush/jupyter-widgets-conference/traitlets/test.py:40(create_and_set)
   210000    0.017    0.000    0.017    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/test.py:35(_num_changed)
    10000    0.006    0.000    0.023    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1305(setup_instance)
    10000    0.005    0.000    0.029    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1262(__new__)
    10000    0.003    0.000    0.009    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1618(observe)
    10000    0.003    0.000    0.015    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1277(setup_instance)
    10000    0.003    0.000    0.004    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1556(_add_notifiers)
    10000    0.003    0.000    0.012    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1238(instance_init)
    10000    0.003    0.000    0.003    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1319(__init__)
    10000    0.002    0.000    0.003    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:237(parse_notifier_name)
    10001    0.001    0.000    0.001    0.000 {method 'copy' of 'dict' objects}
    10000    0.001    0.000    0.001    0.000 {built-in method __new__ of type object at 0x103b37288}
    10000    0.001    0.000    0.001    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1743(class_traits)
        1    0.000    0.000    0.000    0.000 /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/cProfile.py:117(__exit__)
       21    0.000    0.000    0.000    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:263(__init__)
        1    0.000    0.000    0.000    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1734(class_trait_names)
       21    0.000    0.000    0.000    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:266(__call__)
       22    0.000    0.000    0.000    0.000 {method 'items' of 'dict' objects}
       21    0.000    0.000    0.000    0.000 {built-in method builtins.callable}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.000    0.000    0.000    0.000 {built-in method builtins.len}

With no named access:

         5370113 function calls in 0.976 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   210000    0.294    0.000    0.475    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1509(_notify_observers)
   210000    0.094    0.000    0.866    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:702(set)
   210000    0.076    0.000    0.194    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:731(_validate)
   210000    0.071    0.000    0.578    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1494(_notify_trait)
   630000    0.068    0.000    0.068    0.000 {built-in method builtins.hasattr}
   210000    0.049    0.000    0.076    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:740(_cross_validate)
   850000    0.035    0.000    0.035    0.000 {built-in method builtins.isinstance}
   210000    0.034    0.000    0.900    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:720(__set__)
   840021    0.034    0.000    0.034    0.000 {method 'get' of 'dict' objects}
   210000    0.031    0.000    0.507    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1505(notify_change)
   210000    0.030    0.000    0.030    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1227(__get__)
   210000    0.029    0.000    0.059    0.000 {built-in method builtins.getattr}
   210000    0.026    0.000    0.926    0.000 {built-in method builtins.setattr}
   210000    0.021    0.000    0.027    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:2517(validate)
   420000    0.019    0.000    0.019    0.000 {method 'extend' of 'list' objects}
        1    0.019    0.019    0.976    0.976 /Users/nathanrush/jupyter-widgets-conference/traitlets/test.py:40(create_and_set)
   210000    0.016    0.000    0.016    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/test.py:35(_num_changed)
    10000    0.006    0.000    0.022    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1305(setup_instance)
    10000    0.005    0.000    0.028    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1262(__new__)
    10000    0.003    0.000    0.009    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1618(observe)
    10000    0.003    0.000    0.003    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1556(_add_notifiers)
    10000    0.003    0.000    0.015    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1277(setup_instance)
    10000    0.003    0.000    0.012    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1238(instance_init)
    10000    0.002    0.000    0.002    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1319(__init__)
    10000    0.002    0.000    0.003    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:237(parse_notifier_name)
    10001    0.001    0.000    0.001    0.000 {method 'copy' of 'dict' objects}
    10000    0.001    0.000    0.001    0.000 {built-in method __new__ of type object at 0x105a1f288}
    10000    0.001    0.000    0.001    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1743(class_traits)
        1    0.000    0.000    0.000    0.000 /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/cProfile.py:117(__exit__)
       21    0.000    0.000    0.000    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:263(__init__)
        1    0.000    0.000    0.000    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1734(class_trait_names)
       21    0.000    0.000    0.000    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:266(__call__)
       21    0.000    0.000    0.000    0.000 {built-in method builtins.callable}
       22    0.000    0.000    0.000    0.000 {method 'items' of 'dict' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.000    0.000    0.000    0.000 {built-in method builtins.len}

If you actually line profile the benchmark observers function, you see that about 25% of the time in this function is just doing bunch accesses:

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  1509                                               @profile
  1510                                               def _notify_observers(self, event):
  1511                                                   """Notify observers of any event"""
  1512    210000      67280.0      0.3      3.8          if not isinstance(event, Bunch):
  1513                                                       # cast to bunch if given a dict
  1514                                                       event = Bunch(event)
  1515    210000     256428.0      1.2     14.3          name, type = event.name, event.type
  1516                                           
  1517    210000      62783.0      0.3      3.5          callables = []
  1518    210000      72407.0      0.3      4.0          if name in self._trait_notifiers:
  1519                                                       callables.extend(self._trait_notifiers.get(name, {}).get(type, []))
  1520                                                       callables.extend(self._trait_notifiers.get(name, {}).get(All, []))
  1521    210000      71647.0      0.3      4.0          if All in self._trait_notifiers:  # type:ignore[comparison-overlap]
  1522    420000     133094.0      0.3      7.4              callables.extend(
  1523    210000      92931.0      0.4      5.2                  self._trait_notifiers.get(All, {}).get(type, [])  # type:ignore[call-overload]
  1524                                                       )
  1525    420000     130307.0      0.3      7.3              callables.extend(
  1526    210000      91059.0      0.4      5.1                  self._trait_notifiers.get(All, {}).get(All, [])  # type:ignore[call-overload]
  1527                                                       )
  1528                                           
  1529                                                   # Now static ones
  1530    210000      80596.0      0.4      4.5          magic_name = "_%s_changed" % name
  1531    210000     185720.0      0.9     10.4          if event.type == "change" and hasattr(self, magic_name):
  1532                                                       class_value = getattr(self.__class__, magic_name)
  1533                                                       if not isinstance(class_value, ObserveHandler):
  1534                                                           _deprecated_method(
  1535                                                               class_value,
  1536                                                               self.__class__,
  1537                                                               magic_name,
  1538                                                               "use @observe and @unobserve instead.",
  1539                                                           )
  1540                                                           cb = getattr(self, magic_name)
  1541                                                           # Only append the magic method if it was not manually registered
  1542                                                           if cb not in callables:
  1543                                                               callables.append(_callback_wrapper(cb))
  1544                                           
  1545                                                   # Call them all now
  1546                                                   # Traits catches and logs errors here.  I allow them to raise
  1547    420000     131027.0      0.3      7.3          for c in callables:
  1548                                                       # Bound methods have an additional 'self' argument.
  1549                                           
  1550    210000      76219.0      0.4      4.3              if isinstance(c, _CallbackWrapper):
  1551                                                           c = c.__call__
  1552    210000      78564.0      0.4      4.4              elif isinstance(c, EventHandler) and c.name is not None:
  1553    210000     152623.0      0.7      8.5                  c = getattr(self, c.name)
  1554                                           
  1555    210000     107464.0      0.5      6.0              c(event)

If we update to not used named access, line 1515 goes from 14% of runtime to 5% of runtime, and 1531 goes from 10% of runtime to 6.6% of runtime, dropping the runtime of this function by 15% with no functionality changes anywhere to the lib.

Also note that per line profiling also tells us that bunch creation is a pretty minimal process overall, with <3% of runtime being spent here. Optimizing these bunch creations doesn't help anywhere near as much as just using them better.

naterush · 2022-10-20T06:36:16Z

I'll open a PR today against this PR that makes these changes (about 6 characters being changed total), as for traitlets that have their values set frequently, this gives a 20% performance benefit for effectively no change.

Based on the above logs, there aren't any more super obvious optimization targets when it comes to set events + change observers. Are there any other common workflows with traitlets that would be worth profiling to see if there are easy wins?

rmorshea · 2022-10-20T07:09:52Z

Not sure if this was already used, but it might be worth running perflint to see if there are any other micro-improvements that can be made (balancing for readability of course).

maartenbreddels · 2022-12-07T21:20:31Z

Good news: after the ipywidgets 8.0.3 release, those tests pass with this branch.
Bad news: apparent CI rot, I think we're picking up the latest mypy which is more strict.
I think that the last thing needs to be fixed in main.

maartenbreddels · 2022-12-08T15:22:08Z

@blink1073 do you understand the jupyter_server issue?

blink1073 · 2022-12-08T15:37:55Z

Server 2.0.1 has a fix for that, I kicked the build to see if we pick it up.

blink1073 · 2022-12-08T15:39:18Z

Nope, it must be taking a while to propagate.

blink1073

Nice work, thank you!

maartenbreddels · 2022-12-08T21:09:07Z

🥳
@naterush do you want to open your PR? To follow up

jeffyjefflabs · 2022-12-09T19:27:03Z

We are using sphinx 4.5.0 with myst-nb 0.17.1 and are now getting AttributeError: __provides__ at traitlets.py:1009 after upgrading from traitlets 5.6 to 5.7. It looks like it may be related to this PR based on this comment?

maartenbreddels · 2022-12-09T20:20:06Z

Thanks for letting us know, how can I trigger this error? So I can test it locally

maartenbreddels · 2022-12-09T20:21:10Z

Thanks for letting us know, how can I trigger this error? So I can test it locally

Regression, noticed in ipython#777 (comment)

Regression, noticed in #777 (comment)

maartenbreddels mentioned this pull request Sep 14, 2022

perf: fast path for simple default construction #639

Closed

blink1073 added the enhancement label Sep 14, 2022

maartenbreddels force-pushed the perf_construction branch from 63f30eb to ef4d638 Compare September 14, 2022 18:08

maartenbreddels mentioned this pull request Sep 14, 2022

fix: using _comm_default shadows possible bugs jupyter-widgets/ipywidgets#3590

Merged

maartenbreddels force-pushed the perf_construction branch from 7262a70 to a3862e0 Compare September 14, 2022 19:17

minrk approved these changes Sep 26, 2022

View reviewed changes

maartenbreddels changed the title ~~Performance: decrease runtime overhead for constructing HasTraits (almost 2x faster)~~ Performance: decrease runtime overhead for constructing HasTraits (up to 10x faster) Sep 27, 2022

maartenbreddels changed the title ~~Performance: decrease runtime overhead for constructing HasTraits (up to 10x faster)~~ Performance: decrease runtime overhead for constructing HasTraits (up to 14x faster) Sep 28, 2022

maartenbreddels changed the title ~~Performance: decrease runtime overhead for constructing HasTraits (up to 14x faster)~~ Performance: decrease runtime overhead for constructing HasTraits (up to 17x faster) Sep 28, 2022

maartenbreddels changed the title ~~Performance: decrease runtime overhead for constructing HasTraits (up to 17x faster)~~ Performance: decrease runtime overhead for constructing HasTraits (up to 20x faster) Sep 28, 2022

maartenbreddels force-pushed the perf_construction branch from f491db4 to 521876d Compare September 28, 2022 09:46

maartenbreddels force-pushed the perf_construction branch from 2b87079 to 589c1c1 Compare September 29, 2022 09:50

maartenbreddels mentioned this pull request Sep 29, 2022

Perf: various optimizations jupyter-widgets/ipywidgets#3602

Open

naterush mentioned this pull request Oct 21, 2022

perf: remove usage of named attribute accesses on bunches maartenbreddels/traitlets#1

Open

maartenbreddels force-pushed the perf_construction branch from 589c1c1 to 4324bbb Compare December 7, 2022 16:19

maartenbreddels added 12 commits December 8, 2022 16:18

typing

304e888

perf: avoid dynamic default code path for static immutable defaults

5dd9d70

typing

a72261d

perf: avoid unnecessary dict presence checks

a7c22ca

more trivial immutable static initial values

05d3023

perf: better scaling due to skipping instance_init when possible.

aff9d0c

perf: specialized version of hold_trait_notifications in ctor

d0ec5ff

perf: skip setup if no kwargs are given

bf33e88

fix: validate return value should be used

8e04ed9

flake8: unused variable

5bb5924

perf: opt out instance_init for bool and enum

09ca954

perf: all instances that are None are immutable and static

1c85d41

maartenbreddels force-pushed the perf_construction branch from 4324bbb to 1c85d41 Compare December 8, 2022 15:18

perf: remove usage of named attribute accesses on bunches

dae860f

maartenbreddels added 2 commits December 8, 2022 18:06

trigger ci

1cc97d4

trigger ci

36fdaf5

blink1073 approved these changes Dec 8, 2022

View reviewed changes

blink1073 merged commit f07afea into ipython:main Dec 8, 2022

maartenbreddels added a commit to maartenbreddels/traitlets that referenced this pull request Dec 9, 2022

fix: some descriptors raise AttributeError

13c26b7

Regression, noticed in ipython#777 (comment)

maartenbreddels mentioned this pull request Dec 9, 2022

fix: some descriptors raise AttributeError #812

Merged

blink1073 pushed a commit that referenced this pull request Dec 12, 2022

fix: some descriptors raise AttributeError (#812)

aa0d38b

Regression, noticed in #777 (comment)

vidartf mentioned this pull request Feb 17, 2023

instance_init of Union subclass not being called #833

Closed

danielhollas mentioned this pull request Jul 13, 2023

Bump traitlets for better performance aiidalab/aiidalab-docker-stack#401

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Performance: decrease runtime overhead for constructing HasTraits (up to 20x faster) #777

Performance: decrease runtime overhead for constructing HasTraits (up to 20x faster) #777

maartenbreddels commented Sep 14, 2022

maartenbreddels commented Sep 14, 2022

blink1073 commented Sep 15, 2022

maartenbreddels commented Sep 26, 2022

blink1073 commented Sep 26, 2022

maartenbreddels commented Sep 26, 2022 •

edited

maartenbreddels commented Sep 26, 2022

blink1073 commented Sep 26, 2022

maartenbreddels commented Sep 28, 2022

maartenbreddels commented Sep 28, 2022

maartenbreddels commented Sep 28, 2022

blink1073 commented Sep 28, 2022

naterush commented Oct 20, 2022

naterush commented Oct 20, 2022

rmorshea commented Oct 20, 2022

maartenbreddels commented Dec 7, 2022

maartenbreddels commented Dec 8, 2022

blink1073 commented Dec 8, 2022

blink1073 commented Dec 8, 2022

blink1073 left a comment

maartenbreddels commented Dec 8, 2022

jeffyjefflabs commented Dec 9, 2022

maartenbreddels commented Dec 9, 2022

maartenbreddels commented Dec 9, 2022

Performance: decrease runtime overhead for constructing HasTraits (up to 20x faster) #777

Performance: decrease runtime overhead for constructing HasTraits (up to 20x faster) #777

Conversation

maartenbreddels commented Sep 14, 2022

maartenbreddels commented Sep 14, 2022

blink1073 commented Sep 15, 2022

maartenbreddels commented Sep 26, 2022

blink1073 commented Sep 26, 2022

maartenbreddels commented Sep 26, 2022 • edited

Results

Main a35dd0b

before the last commit 166cef4

ae8c6cd

maartenbreddels commented Sep 26, 2022

blink1073 commented Sep 26, 2022

maartenbreddels commented Sep 28, 2022

Performance benchmark:

Before

After

before

after

maartenbreddels commented Sep 28, 2022

maartenbreddels commented Sep 28, 2022

blink1073 commented Sep 28, 2022

naterush commented Oct 20, 2022

Overhead from the Bunch class

Overhead comes from the get_attr function in bunch

Using the bunch.attr syntax adds significant overhead to event processing

naterush commented Oct 20, 2022

rmorshea commented Oct 20, 2022

maartenbreddels commented Dec 7, 2022

maartenbreddels commented Dec 8, 2022

blink1073 commented Dec 8, 2022

blink1073 commented Dec 8, 2022

blink1073 left a comment

Choose a reason for hiding this comment

maartenbreddels commented Dec 8, 2022

jeffyjefflabs commented Dec 9, 2022

maartenbreddels commented Dec 9, 2022

maartenbreddels commented Dec 9, 2022

maartenbreddels commented Sep 26, 2022 •

edited

Main `a35dd0b`

before the last commit `166cef4`

`ae8c6cd`

Using the `bunch.attr` syntax adds significant overhead to event processing