Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance: decrease runtime overhead for constructing HasTraits (up to 20x faster) #777

Merged
merged 21 commits into from Dec 8, 2022

Conversation

maartenbreddels
Copy link
Contributor

This makes creating ipywidgets about 2x faster (ignoring comm construction).

Related to, but superseded #639

@maartenbreddels
Copy link
Contributor Author

The ipywidget tests fail, since it adds _comm_default at runtime, jupyter-widgets/ipywidgets#3590 fixes that and is also a better method anyway. The question is, is this a breaking change? I don't think there is any mention of adding _trait_default at runtime, and it causes a significant runtime cost.

@blink1073
Copy link
Member

I think it is fair to make this change, assuming jupyter-widgets/ipywidgets#3590 gets merged and released and we go green.

@maartenbreddels
Copy link
Contributor Author

I have another commit that gives a good performance win in a situation where a lot of default are 'static' (like a fixed default value for string and ints). This happens a lot for ipywidgets.

Should I push to this branch? or should I wait to get this in first (I could open a new PR and later rebase against main)

@blink1073
Copy link
Member

I'd say push here

@maartenbreddels
Copy link
Contributor Author

maartenbreddels commented Sep 26, 2022

I've been benchmarking this with pytest-benchmark using this script:

from traitlets import HasTraits, Int, Unicode

class SingleTrait(HasTraits):
    foo = Int()


class ManyTrait(HasTraits):
    foo0 = Unicode("foo0").tag(sync=True)
    foo1 = Unicode("foo1").tag(sync=True)
    foo2 = Unicode("foo2").tag(sync=True)
    foo3 = Unicode("foo3").tag(sync=True)
    foo4 = Unicode("foo4").tag(sync=True)
    foo5 = Unicode("foo5").tag(sync=True)
    foo6 = Unicode("foo6").tag(sync=True)
    foo7 = Unicode("foo7").tag(sync=True)
    foo8 = Unicode("foo8").tag(sync=True)
    foo9 = Unicode("foo9").tag(sync=True)



def create(N, cls):
    for i in range(N):
        cls()


def create_and_access(N, cls):
    keys = cls.class_trait_names(sync=True)
    for i in range(N):
        obj = cls()
        for key in keys:
            getattr(obj, key)


def test_single_trait(benchmark):
    benchmark(create, cls=SingleTrait, N=1_000)

def test_many_trait(benchmark):
    benchmark(create, cls=ManyTrait, N=1_000)

def test_single_trait_and_access(benchmark):
    benchmark(create_and_access, cls=SingleTrait, N=1_000)

def test_many_trait_and_access(benchmark):
    benchmark(create_and_access, cls=ManyTrait, N=1_000)

Results

Main a35dd0b

-------------------------------------------------------------------------------------- benchmark: 4 tests --------------------------------------------------------------------------------------
Name (time in ms)                    Min                Max               Mean            StdDev             Median               IQR            Outliers      OPS            Rounds  Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_single_trait                12.4555 (1.0)      12.9838 (1.0)      12.5927 (1.0)      0.1114 (1.62)     12.5714 (1.0)      0.1253 (2.96)         14;2  79.4108 (1.0)          71           1
test_single_trait_and_access     13.0052 (1.04)     13.5203 (1.04)     13.1818 (1.05)     0.1235 (1.80)     13.1553 (1.05)     0.1789 (4.23)         21;0  75.8624 (0.96)         75           1
test_many_trait                  14.0976 (1.13)     14.4672 (1.11)     14.3162 (1.14)     0.0686 (1.0)      14.3159 (1.14)     0.0423 (1.0)         19;16  69.8510 (0.88)         69           1
test_many_trait_and_access       57.5647 (4.62)     59.3905 (4.57)     58.3111 (4.63)     0.5214 (7.60)     58.1785 (4.63)     0.8110 (19.18)         5;0  17.1494 (0.22)         18           1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

before the last commit 166cef4

--------------------------------------------------------------------------------------- benchmark: 4 tests --------------------------------------------------------------------------------------
Name (time in ms)                    Min                Max               Mean            StdDev             Median               IQR            Outliers       OPS            Rounds  Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_single_trait                 2.2666 (1.0)       2.5554 (1.0)       2.3552 (1.0)      0.0274 (1.63)      2.3493 (1.0)      0.0152 (1.06)        50;47  424.5879 (1.0)         400           1
test_single_trait_and_access      2.4376 (1.08)      2.6091 (1.02)      2.5242 (1.07)     0.0169 (1.0)       2.5228 (1.07)     0.0151 (1.06)        54;24  396.1647 (0.93)        394           1
test_many_trait                   2.6998 (1.19)      2.9632 (1.16)      2.8028 (1.19)     0.0299 (1.77)      2.7969 (1.19)     0.0143 (1.0)         47;50  356.7877 (0.84)        367           1
test_many_trait_and_access       27.8987 (12.31)    29.0280 (11.36)    28.0583 (11.91)    0.2163 (12.83)    27.9914 (11.91)    0.0954 (6.67)          4;4   35.6400 (0.08)         36           1
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

ae8c6cd

------------------------------------------------------------------------------------- benchmark: 4 tests ------------------------------------------------------------------------------------
Name (time in ms)                   Min               Max              Mean            StdDev            Median               IQR            Outliers       OPS            Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_single_trait                2.4483 (1.0)      2.7051 (1.0)      2.5550 (1.0)      0.0328 (1.13)     2.5482 (1.0)      0.0225 (1.44)        59;49  391.3851 (1.0)         370           1
test_single_trait_and_access     2.6786 (1.09)     2.9397 (1.09)     2.7569 (1.08)     0.0308 (1.06)     2.7498 (1.08)     0.0218 (1.39)        52;41  362.7242 (0.93)        362           1
test_many_trait                  3.3650 (1.37)     3.7456 (1.38)     3.4842 (1.36)     0.0291 (1.0)      3.4812 (1.37)     0.0156 (1.0)         22;22  287.0108 (0.73)        278           1
test_many_trait_and_access       5.3269 (2.18)     5.9227 (2.19)     5.5025 (2.15)     0.0671 (2.31)     5.4879 (2.15)     0.0230 (1.47)        13;19  181.7341 (0.46)        170           1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Looking at the median, we see a >10x improvement for HasTrait with many 'static'+immutable data, which is quite common in ipywidgets (see e.g. the Layout widget https://github.com/jupyter-widgets/ipywidgets/blob/master/python/ipywidgets/ipywidgets/widgets/widget_layout.py )

where the last commit really helps with a factor of ~5x

@maartenbreddels
Copy link
Contributor Author

I noticed that ae8c6cd slowed down the other cases because we now did some unneeded checking of presence in a dict, so I fixed that.

Now ae8c6cd only gives a speedup compared to 166cef4:

------------------------------------------------------------------------------------- benchmark: 4 tests ------------------------------------------------------------------------------------
Name (time in ms)                   Min               Max              Mean            StdDev            Median               IQR            Outliers       OPS            Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_single_trait                2.2957 (1.0)      2.5338 (1.0)      2.3469 (1.0)      0.0437 (1.0)      2.3307 (1.0)      0.0493 (1.0)         83;20  426.0973 (1.0)         403           1
test_single_trait_and_access     2.3051 (1.00)     6.7410 (2.66)     2.4294 (1.04)     0.2315 (5.30)     2.3947 (1.03)     0.1165 (2.36)          7;7  411.6245 (0.97)        425           1
test_many_trait                  2.7801 (1.21)     3.3918 (1.34)     2.8540 (1.22)     0.0860 (1.97)     2.8196 (1.21)     0.0813 (1.65)        40;23  350.3821 (0.82)        348           1
test_many_trait_and_access       4.8385 (2.11)     5.3911 (2.13)     5.0037 (2.13)     0.1150 (2.63)     4.9796 (2.14)     0.1399 (2.84)         53;6  199.8502 (0.47)        187           1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

@blink1073
Copy link
Member

Nice work!

@maartenbreddels maartenbreddels changed the title Performance: decrease runtime overhead for constructing HasTraits (almost 2x faster) Performance: decrease runtime overhead for constructing HasTraits (up to 10x faster) Sep 27, 2022
@maartenbreddels
Copy link
Contributor Author

Found another performance issue: We are calling Descriptor.instance_init for each HasTraits we create, for each descriptor, while actually many of them don't do anything.
I've refactored it such that only at class construction we collect all instance_init methods that do something. We by default assume that we "do something", by adding instance_init to a list of functions to be called.
However, many of the TypeTraits opt out of this by overriding subclass_init to not add it's own instance_init to the list of functions.
This makes it almost perfectly backwards compatible, except...
When someone inherits from say Int, its instance_init will not be called. We could make it fully compatible by checking if type(self) == Int (and similar for the other traits). Let me know, I'm happy to add this in.

When creating a 1000 HasTraits with 30 traits that do not need an instance_init, we avoid 30000 unneeded function calls.

Performance benchmark:

Before

------------------------------------------------------------------------------------- benchmark: 4 tests ------------------------------------------------------------------------------------
Name (time in ms)                   Min               Max              Mean            StdDev            Median               IQR            Outliers       OPS            Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_single_trait                2.2703 (1.0)      2.5766 (1.0)      2.3246 (1.0)      0.0510 (1.50)     2.3051 (1.0)      0.0656 (2.79)        65;13  430.1731 (1.0)         407           1
test_single_trait_and_access     2.2841 (1.01)     2.7480 (1.07)     2.3585 (1.01)     0.0789 (2.31)     2.3257 (1.01)     0.0752 (3.20)        62;36  423.9969 (0.99)        430           1
test_many_trait                  2.8105 (1.24)     3.0434 (1.18)     2.8452 (1.22)     0.0341 (1.0)      2.8343 (1.23)     0.0235 (1.0)         36;30  351.4728 (0.82)        287           1
test_many_trait_and_access       4.6244 (2.04)     4.9061 (1.90)     4.6736 (2.01)     0.0457 (1.34)     4.6592 (2.02)     0.0415 (1.77)        24;15  213.9672 (0.50)        213           1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

After

------------------------------------------------------------------------------------- benchmark: 4 tests ------------------------------------------------------------------------------------
Name (time in ms)                   Min               Max              Mean            StdDev            Median               IQR            Outliers       OPS            Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_single_trait                2.2464 (1.0)      2.4565 (1.01)     2.2825 (1.00)     0.0252 (1.0)      2.2769 (1.00)     0.0230 (1.32)        58;22  438.1106 (1.00)        422           1
test_many_trait                  2.2525 (1.00)     2.4332 (1.00)     2.2818 (1.0)      0.0257 (1.02)     2.2746 (1.0)      0.0223 (1.28)        55;32  438.2443 (1.0)         436           1
test_single_trait_and_access     2.2594 (1.01)     2.4330 (1.0)      2.2858 (1.00)     0.0254 (1.01)     2.2781 (1.00)     0.0174 (1.0)         42;36  437.4761 (1.00)        427           1
test_many_trait_and_access       4.1723 (1.86)     4.4918 (1.85)     4.2298 (1.85)     0.0608 (2.42)     4.2072 (1.85)     0.0509 (2.92)        33;26  236.4174 (0.54)        230           1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

This saves another ~10% for HasTraits with 10 traits. I've also tested this in a more extreme case with 100 traits, which gives a 25% performance increase. This shows this approach scales much better.

Looking at creating a 100 trait object in viztracer, we see:

before

image

setup_instance takes most of the time.

after

image

So instead of setup_instance taking most of the time creating an object, we only spend time doing a dict copy with the trait values (the only overhead that we cannot avoid). Now the last bottle neck seems to be the HasTraits constructor (more specifically the context manager)

@maartenbreddels maartenbreddels changed the title Performance: decrease runtime overhead for constructing HasTraits (up to 10x faster) Performance: decrease runtime overhead for constructing HasTraits (up to 14x faster) Sep 28, 2022
@maartenbreddels
Copy link
Contributor Author

After also working on the HasTrait constructor, where we remove a lot of overhead from the hold_trait_notifications context manager.

------------------------------------------------------------------------------------- benchmark: 4 tests ------------------------------------------------------------------------------------
Name (time in ms)                   Min               Max              Mean            StdDev            Median               IQR            Outliers       OPS            Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_single_trait                1.3711 (1.0)      1.5112 (1.0)      1.3890 (1.0)      0.0176 (1.03)     1.3829 (1.0)      0.0121 (1.0)         77;69  719.9187 (1.0)         682           1
test_single_trait_and_access     1.3869 (1.01)     1.5423 (1.02)     1.4141 (1.02)     0.0171 (1.0)      1.4105 (1.02)     0.0171 (1.42)       144;30  707.1417 (0.98)        699           1
test_many_trait                  1.4594 (1.06)     1.6393 (1.08)     1.4794 (1.07)     0.0194 (1.13)     1.4736 (1.07)     0.0154 (1.27)        79;50  675.9557 (0.94)        662           1
test_many_trait_and_access       3.2931 (2.40)     3.5277 (2.33)     3.3287 (2.40)     0.0398 (2.32)     3.3156 (2.40)     0.0267 (2.21)        36;35  300.4134 (0.42)        293           1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

We see another ~25% performance increase.

@maartenbreddels maartenbreddels changed the title Performance: decrease runtime overhead for constructing HasTraits (up to 14x faster) Performance: decrease runtime overhead for constructing HasTraits (up to 17x faster) Sep 28, 2022
@maartenbreddels maartenbreddels changed the title Performance: decrease runtime overhead for constructing HasTraits (up to 17x faster) Performance: decrease runtime overhead for constructing HasTraits (up to 20x faster) Sep 28, 2022
@maartenbreddels
Copy link
Contributor Author

I think I don't see any overhead at this moment anymore at constructing HasTraits that are obvious and don't break things, I think this is as fast as I can push it.

Possible future optimization might be in removing magic methods, because that would save us in checking with hasattr at runtime. Also, the Bunch construction gives some overhead, but I don't see how we can get rid of that.

@blink1073
Copy link
Member

Ha, I love seeing the PR title update. Thanks again for digging in to this!

@naterush
Copy link
Contributor

Per Maarten's ideas, I further dug into performance here. Some quick observations:

Overhead from the Bunch class

First, it's worth noting that the Bunch class adds some serious overhead to the dict class, if you use the bunch.attribute syntax to access values. With this simple test case:

def test_bunch_named_attribute_access(benchmark):
    def do_named_access():
        b = Bunch(x=5, y=10)
        for _ in range(1_000_000):
            new = b.x

    benchmark(do_named_access)

def test_bunch_dict_access(benchmark):
    def do_key_access():
        b = Bunch(x=5, y=10)
        for _ in range(1_000_000):
            new = b['x']

    benchmark(do_key_access)

This has the following results:

------------------------------------------------------------------------------------------- benchmark: 2 tests ------------------------------------------------------------------------------------------
Name (time in ms)                          Min                 Max                Mean            StdDev              Median               IQR            Outliers      OPS            Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_bunch_dict_access                 31.1114 (1.0)       31.3130 (1.0)       31.2269 (1.0)      0.0496 (1.0)       31.2313 (1.0)      0.0681 (1.0)           9;0  32.0236 (1.0)          32           1
test_bunch_named_attribute_access     233.6903 (7.51)     234.9689 (7.50)     234.2604 (7.50)     0.5379 (10.84)    234.3409 (7.50)     0.9007 (13.23)         2;0   4.2688 (0.13)          5           1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

So, using the bunch.x is about 7.5x slower than using bunch['x'].

Overhead comes from the get_attr function in bunch

This overhead is clearly from the get_attr function, which is overwritten to call the .getitem function on the underlying dict.

You can get some minor performance improvements by setting __getitem__ = dict.get, (you go to 5x slower), but even then there's a lot of performance left on the table with bunch using the named attribute access.

The other problem here is that user code that has event observers likely expects the AttributeErrors to be turned to KeyErrors, as currently happens in the getitem function. This performance PR shouldn't change the external interface users mess about with -- but explored below are ways to help minimize the impact of bunch performance on tartlets internally.

Using the bunch.attr syntax adds significant overhead to event processing

The Bunch is primarily used internally within traitlets as the return value for the observe functions when watching change events.

Consider the following example where we're watching many traits:

from typing import Any, Dict
from traitlets import HasTraits, Integer, observe, Unicode, All

class ExampleHasTraits(HasTraits):
    foo: int = Integer(2, help="a number").tag(config=True) # type:ignore

    @observe('num')
    def _num_changed(self, change):
        print("{name} changed from {old} to {new}".format(**change))
    

class ManyTrait(HasTraits):
    foo0 = Unicode("foo0").tag(sync=True)
    foo1 = Unicode("foo1").tag(sync=True)
    foo2 = Unicode("foo2").tag(sync=True)
    foo3 = Unicode("foo3").tag(sync=True)
    foo4 = Unicode("foo4").tag(sync=True)
    foo5 = Unicode("foo5").tag(sync=True)
    foo6 = Unicode("foo6").tag(sync=True)
    foo7 = Unicode("foo7").tag(sync=True)
    foo8 = Unicode("foo8").tag(sync=True)
    foo9 = Unicode("foo9").tag(sync=True)
    foo10 = Unicode("foo10").tag(sync=True)
    foo11 = Unicode("foo11").tag(sync=True)
    foo12 = Unicode("foo12").tag(sync=True)
    foo13 = Unicode("foo13").tag(sync=True)
    foo14 = Unicode("foo14").tag(sync=True)
    foo15 = Unicode("foo15").tag(sync=True)
    foo16 = Unicode("foo16").tag(sync=True)
    foo17 = Unicode("foo17").tag(sync=True)
    foo18 = Unicode("foo18").tag(sync=True)
    foo19 = Unicode("foo19").tag(sync=True)
    foo20 = Unicode("foo20").tag(sync=True)

    @observe(All)
    def _num_changed(self, change):
        var = change['old']


def create_and_set(N, cls):
    keys = cls.class_trait_names(sync=True)
    for i in range(N):
        obj = cls()
        for key in keys:
            setattr(obj, key, "123")

import cProfile

# First, we run with 1k
with cProfile.Profile() as pr:
    create_and_set(N=10_000, cls=ManyTrait)

import cProfile, pstats
from pstats import SortKey
sortby = SortKey.CUMULATIVE
ps = pstats.Stats(pr).sort_stats(sortby)
ps.print_stats()

If we simply remove any usage of named access on the traitlets within the library, we get the following performance change:

With named access:

          6630113 function calls in 1.272 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   210000    0.452    0.000    0.750    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1509(_notify_observers)
   210000    0.098    0.000    1.157    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:702(set)
   630000    0.083    0.000    0.109    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/utils/bunch.py:13(__getattr__)
   210000    0.079    0.000    0.199    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:731(_validate)
   210000    0.075    0.000    0.859    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1494(_notify_trait)
   630000    0.070    0.000    0.070    0.000 {built-in method builtins.hasattr}
   210000    0.049    0.000    0.076    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:740(_cross_validate)
   850000    0.037    0.000    0.037    0.000 {built-in method builtins.isinstance}
   210000    0.037    0.000    1.193    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:720(__set__)
   840021    0.036    0.000    0.036    0.000 {method 'get' of 'dict' objects}
   210000    0.034    0.000    0.784    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1505(notify_change)
   210000    0.032    0.000    0.032    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1227(__get__)
   210000    0.029    0.000    0.060    0.000 {built-in method builtins.getattr}
   210000    0.027    0.000    1.221    0.000 {built-in method builtins.setattr}
   630000    0.026    0.000    0.026    0.000 {method '__getitem__' of 'dict' objects}
   210000    0.021    0.000    0.029    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:2517(validate)
   420000    0.020    0.000    0.020    0.000 {method 'extend' of 'list' objects}
        1    0.020    0.020    1.272    1.272 /Users/nathanrush/jupyter-widgets-conference/traitlets/test.py:40(create_and_set)
   210000    0.017    0.000    0.017    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/test.py:35(_num_changed)
    10000    0.006    0.000    0.023    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1305(setup_instance)
    10000    0.005    0.000    0.029    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1262(__new__)
    10000    0.003    0.000    0.009    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1618(observe)
    10000    0.003    0.000    0.015    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1277(setup_instance)
    10000    0.003    0.000    0.004    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1556(_add_notifiers)
    10000    0.003    0.000    0.012    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1238(instance_init)
    10000    0.003    0.000    0.003    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1319(__init__)
    10000    0.002    0.000    0.003    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:237(parse_notifier_name)
    10001    0.001    0.000    0.001    0.000 {method 'copy' of 'dict' objects}
    10000    0.001    0.000    0.001    0.000 {built-in method __new__ of type object at 0x103b37288}
    10000    0.001    0.000    0.001    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1743(class_traits)
        1    0.000    0.000    0.000    0.000 /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/cProfile.py:117(__exit__)
       21    0.000    0.000    0.000    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:263(__init__)
        1    0.000    0.000    0.000    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1734(class_trait_names)
       21    0.000    0.000    0.000    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:266(__call__)
       22    0.000    0.000    0.000    0.000 {method 'items' of 'dict' objects}
       21    0.000    0.000    0.000    0.000 {built-in method builtins.callable}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.000    0.000    0.000    0.000 {built-in method builtins.len}

With no named access:

         5370113 function calls in 0.976 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   210000    0.294    0.000    0.475    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1509(_notify_observers)
   210000    0.094    0.000    0.866    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:702(set)
   210000    0.076    0.000    0.194    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:731(_validate)
   210000    0.071    0.000    0.578    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1494(_notify_trait)
   630000    0.068    0.000    0.068    0.000 {built-in method builtins.hasattr}
   210000    0.049    0.000    0.076    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:740(_cross_validate)
   850000    0.035    0.000    0.035    0.000 {built-in method builtins.isinstance}
   210000    0.034    0.000    0.900    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:720(__set__)
   840021    0.034    0.000    0.034    0.000 {method 'get' of 'dict' objects}
   210000    0.031    0.000    0.507    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1505(notify_change)
   210000    0.030    0.000    0.030    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1227(__get__)
   210000    0.029    0.000    0.059    0.000 {built-in method builtins.getattr}
   210000    0.026    0.000    0.926    0.000 {built-in method builtins.setattr}
   210000    0.021    0.000    0.027    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:2517(validate)
   420000    0.019    0.000    0.019    0.000 {method 'extend' of 'list' objects}
        1    0.019    0.019    0.976    0.976 /Users/nathanrush/jupyter-widgets-conference/traitlets/test.py:40(create_and_set)
   210000    0.016    0.000    0.016    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/test.py:35(_num_changed)
    10000    0.006    0.000    0.022    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1305(setup_instance)
    10000    0.005    0.000    0.028    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1262(__new__)
    10000    0.003    0.000    0.009    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1618(observe)
    10000    0.003    0.000    0.003    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1556(_add_notifiers)
    10000    0.003    0.000    0.015    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1277(setup_instance)
    10000    0.003    0.000    0.012    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1238(instance_init)
    10000    0.002    0.000    0.002    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1319(__init__)
    10000    0.002    0.000    0.003    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:237(parse_notifier_name)
    10001    0.001    0.000    0.001    0.000 {method 'copy' of 'dict' objects}
    10000    0.001    0.000    0.001    0.000 {built-in method __new__ of type object at 0x105a1f288}
    10000    0.001    0.000    0.001    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1743(class_traits)
        1    0.000    0.000    0.000    0.000 /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/cProfile.py:117(__exit__)
       21    0.000    0.000    0.000    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:263(__init__)
        1    0.000    0.000    0.000    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:1734(class_trait_names)
       21    0.000    0.000    0.000    0.000 /Users/nathanrush/jupyter-widgets-conference/traitlets/traitlets/traitlets.py:266(__call__)
       21    0.000    0.000    0.000    0.000 {built-in method builtins.callable}
       22    0.000    0.000    0.000    0.000 {method 'items' of 'dict' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.000    0.000    0.000    0.000 {built-in method builtins.len}

If you actually line profile the benchmark observers function, you see that about 25% of the time in this function is just doing bunch accesses:

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  1509                                               @profile
  1510                                               def _notify_observers(self, event):
  1511                                                   """Notify observers of any event"""
  1512    210000      67280.0      0.3      3.8          if not isinstance(event, Bunch):
  1513                                                       # cast to bunch if given a dict
  1514                                                       event = Bunch(event)
  1515    210000     256428.0      1.2     14.3          name, type = event.name, event.type
  1516                                           
  1517    210000      62783.0      0.3      3.5          callables = []
  1518    210000      72407.0      0.3      4.0          if name in self._trait_notifiers:
  1519                                                       callables.extend(self._trait_notifiers.get(name, {}).get(type, []))
  1520                                                       callables.extend(self._trait_notifiers.get(name, {}).get(All, []))
  1521    210000      71647.0      0.3      4.0          if All in self._trait_notifiers:  # type:ignore[comparison-overlap]
  1522    420000     133094.0      0.3      7.4              callables.extend(
  1523    210000      92931.0      0.4      5.2                  self._trait_notifiers.get(All, {}).get(type, [])  # type:ignore[call-overload]
  1524                                                       )
  1525    420000     130307.0      0.3      7.3              callables.extend(
  1526    210000      91059.0      0.4      5.1                  self._trait_notifiers.get(All, {}).get(All, [])  # type:ignore[call-overload]
  1527                                                       )
  1528                                           
  1529                                                   # Now static ones
  1530    210000      80596.0      0.4      4.5          magic_name = "_%s_changed" % name
  1531    210000     185720.0      0.9     10.4          if event.type == "change" and hasattr(self, magic_name):
  1532                                                       class_value = getattr(self.__class__, magic_name)
  1533                                                       if not isinstance(class_value, ObserveHandler):
  1534                                                           _deprecated_method(
  1535                                                               class_value,
  1536                                                               self.__class__,
  1537                                                               magic_name,
  1538                                                               "use @observe and @unobserve instead.",
  1539                                                           )
  1540                                                           cb = getattr(self, magic_name)
  1541                                                           # Only append the magic method if it was not manually registered
  1542                                                           if cb not in callables:
  1543                                                               callables.append(_callback_wrapper(cb))
  1544                                           
  1545                                                   # Call them all now
  1546                                                   # Traits catches and logs errors here.  I allow them to raise
  1547    420000     131027.0      0.3      7.3          for c in callables:
  1548                                                       # Bound methods have an additional 'self' argument.
  1549                                           
  1550    210000      76219.0      0.4      4.3              if isinstance(c, _CallbackWrapper):
  1551                                                           c = c.__call__
  1552    210000      78564.0      0.4      4.4              elif isinstance(c, EventHandler) and c.name is not None:
  1553    210000     152623.0      0.7      8.5                  c = getattr(self, c.name)
  1554                                           
  1555    210000     107464.0      0.5      6.0              c(event)

If we update to not used named access, line 1515 goes from 14% of runtime to 5% of runtime, and 1531 goes from 10% of runtime to 6.6% of runtime, dropping the runtime of this function by 15% with no functionality changes anywhere to the lib.

Also note that per line profiling also tells us that bunch creation is a pretty minimal process overall, with <3% of runtime being spent here. Optimizing these bunch creations doesn't help anywhere near as much as just using them better.

@naterush
Copy link
Contributor

I'll open a PR today against this PR that makes these changes (about 6 characters being changed total), as for traitlets that have their values set frequently, this gives a 20% performance benefit for effectively no change.

Based on the above logs, there aren't any more super obvious optimization targets when it comes to set events + change observers. Are there any other common workflows with traitlets that would be worth profiling to see if there are easy wins?

@rmorshea
Copy link
Contributor

Not sure if this was already used, but it might be worth running perflint to see if there are any other micro-improvements that can be made (balancing for readability of course).

@maartenbreddels
Copy link
Contributor Author

Good news: after the ipywidgets 8.0.3 release, those tests pass with this branch.
Bad news: apparent CI rot, I think we're picking up the latest mypy which is more strict.
I think that the last thing needs to be fixed in main.

@maartenbreddels
Copy link
Contributor Author

@blink1073 do you understand the jupyter_server issue?

@blink1073
Copy link
Member

Server 2.0.1 has a fix for that, I kicked the build to see if we pick it up.

@blink1073
Copy link
Member

Nope, it must be taking a while to propagate.

Copy link
Member

@blink1073 blink1073 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work, thank you!

@blink1073 blink1073 merged commit f07afea into ipython:main Dec 8, 2022
@maartenbreddels
Copy link
Contributor Author

🥳
@naterush do you want to open your PR? To follow up

@jeffyjefflabs
Copy link

We are using sphinx 4.5.0 with myst-nb 0.17.1 and are now getting AttributeError: __provides__ at traitlets.py:1009 after upgrading from traitlets 5.6 to 5.7. It looks like it may be related to this PR based on this comment?

@maartenbreddels
Copy link
Contributor Author

Thanks for letting us know, how can I trigger this error? So I can test it locally

1 similar comment
@maartenbreddels
Copy link
Contributor Author

Thanks for letting us know, how can I trigger this error? So I can test it locally

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants