Fixes issue #159 and greatly speeds up iteration of PMap in Python3 #243

noahbenson · 2022-02-10T04:56:24Z

This pull request fixes issue #159 and greatly speeds up iteration of PMap in Python3—in fact it greatly speeds up creation of the iterator object alone when calling PMap.keys, PMap.values, and PMap.items.

Most of the info is in the commit messages, but I've repeated the relevant bits below (with some minor edits) for convenience!

Happy to iterate on this if there are issues I've missed.

Adds a new class PMapView to the pyrsistent._pmap namespace. This class is intended to be private and essentially mimics the built-in dict_values and dict_items pseudo-types. Under the hood they just keep a reference to the PMap object and use the iteritems and itervalues methods; this means that creating a view can be run in constant time instead of the current O(n) time.

The keys method of PMap does not use the PMapView class because PSet is aleady a PMapView-type class for PMap keys. I couldn't think of a reason not to just use PSet: it's the logical type for a set of keys of a persistent map,
and it can just be passed a PMap to create such a set in constant time.

The tests were also updated to reflect the reality that keys now returns a PSet instead of a PVector and that values and items return PMapViews.

As a benchmark, the following code:

from timeit import timeit

def make_setup(mapsize):
    return ('from pyrsistent import pmap, pvector; '
            'm = pmap({k:k for k in range(%d)})' % (mapsize,))

tests = [('len(m.keys())',   'len(pvector(m.iterkeys()))'),
         ('len(m.values())', 'len(pvector(m.itervalues()))'),
         ('len(m.items())',  'len(pvector(m.iteritems()))')]

for mapsize in [10, 100, 1000]:
    print("Map size:", str(mapsize))
    setupstr = make_setup(mapsize)
    for test_pair in tests:
        print('=' * 60)
        print('%-40s' % test_pair[0], '%6f us' % timeit(test_pair[0], setup=setupstr))
        print('%-40s' % test_pair[1], '%6f us' % timeit(test_pair[1], setup=setupstr))
        print('-' * 60)
    print('')

produces the following output on my desktop:

Map size: 10
============================================================
len(m.keys())                            1.330579 us
len(pvector(m.iterkeys()))               1.562125 us
------------------------------------------------------------
============================================================
len(m.values())                          0.653008 us
len(pvector(m.itervalues()))             1.584405 us
------------------------------------------------------------
============================================================
len(m.items())                           0.653144 us
len(pvector(m.iteritems()))              1.159099 us
------------------------------------------------------------

Map size: 100
============================================================
len(m.keys())                            1.350190 us
len(pvector(m.iterkeys()))               11.702080 us
------------------------------------------------------------
============================================================
len(m.values())                          0.650723 us
len(pvector(m.itervalues()))             11.739465 us
------------------------------------------------------------
============================================================
len(m.items())                           0.653189 us
len(pvector(m.iteritems()))              8.951215 us
------------------------------------------------------------

Map size: 1000
============================================================
len(m.keys())                            1.379429 us
len(pvector(m.iterkeys()))               114.202673 us
------------------------------------------------------------
============================================================
len(m.values())                          0.679722 us
len(pvector(m.itervalues()))             113.874931 us
------------------------------------------------------------
============================================================
len(m.items())                           0.680877 us
len(pvector(m.iteritems()))              87.645472 us
------------------------------------------------------------

These tests were run with the version in this pull request, so the first test is always the new approach and the second test is always the previous approach.

Adds a new class `PMapView` to the `pyrsistent._pmap` namespace. This class is intended to be private and essentially mimics the built-in `dict_values` and `dict_items` pseudo-types. Under the hood they just keep a reference to the `PMap` object and use the `iteritems` and `itervalues` methods; this means that creating a view can be run in constant time insteead of the current `O(n)` time. The `keys` method of `PMap` does not use the `PMapView` class because `PSet` is aleady a `PMapView`-type class for `PMap` keys. I couldn't think of a reason not to just use `PSet`: it's the logical type for a set of keys of a persistent map, and it can just be passed a `PMap` to create such a set in constant time. The tests were also updated to reflect the reality that `keys` now returns a `PSet` instead of a `PVector` and that `values` and `items` return `PMapView`s. As a benchmark, the following code: ```python from timeit import timeit def make_setup(mapsize): return ('from pyrsistent import pmap, pvector; ' 'm = pmap({k:k for k in range(%d)})' % (mapsize,)) tests = [('len(m.keys())', 'len(pvector(m.iterkeys()))'), ('len(m.values())', 'len(pvector(m.itervalues()))'), ('len(m.items())', 'len(pvector(m.iteritems()))')] for mapsize in [10, 100, 1000]: print("Map size:", str(mapsize)) setupstr = make_setup(mapsize) for test_pair in tests: print('=' * 60) print('%-40s' % test_pair[0], '%6f us' % timeit(test_pair[0], setup=setupstr)) print('%-40s' % test_pair[1], '%6f us' % timeit(test_pair[1], setup=setupstr)) print('-' * 60) print('') ``` produces the following output on my desktop: ``` Map size: 10 ============================================================ len(m.keys()) 1.330579 us len(pvector(m.iterkeys())) 1.562125 us ------------------------------------------------------------ ============================================================ len(m.values()) 0.653008 us len(pvector(m.itervalues())) 1.584405 us ------------------------------------------------------------ ============================================================ len(m.items()) 0.653144 us len(pvector(m.iteritems())) 1.159099 us ------------------------------------------------------------ Map size: 100 ============================================================ len(m.keys()) 1.350190 us len(pvector(m.iterkeys())) 11.702080 us ------------------------------------------------------------ ============================================================ len(m.values()) 0.650723 us len(pvector(m.itervalues())) 11.739465 us ------------------------------------------------------------ ============================================================ len(m.items()) 0.653189 us len(pvector(m.iteritems())) 8.951215 us ------------------------------------------------------------ Map size: 1000 ============================================================ len(m.keys()) 1.379429 us len(pvector(m.iterkeys())) 114.202673 us ------------------------------------------------------------ ============================================================ len(m.values()) 0.679722 us len(pvector(m.itervalues())) 113.874931 us ------------------------------------------------------------ ============================================================ len(m.items()) 0.680877 us len(pvector(m.iteritems())) 87.645472 us ------------------------------------------------------------ ```

noahbenson · 2022-02-10T05:02:58Z

I should have mentioned that the commit also includes a definition for the __reversed__ method of PMap. The method just throws an error saying that pmaps are not reversible. Currently if you try to reverse one, the error is somewhat arcane. (I realize that reversing a mapping like this is unusual, but you can reverse dict objects in Python, so making a clear error message seemed wise.)

noahbenson · 2022-02-10T05:08:03Z

Big fan of pyrsistent by the way—thanks for maintaining it! I use it all over my libraries pimms and neuropythy, and I maintain a similar library in Julia!

tobgu

Thanks!

As a general comment and background why it looks the way it currently does.
keys/iterkeys, values/itervalues and items/iteritems mimic how the corresponding methods behave in Python 2 (which was the dominating Python version when pyrsistent was first written). I have not changed this since, partly due to backwards compatibility concerns.

The code in this PR brings keys/values/items closer to the semantics of the same methods in the corresponding types in the stdlib of Python 3 which is nice. Most notably it adds len and equality comparison which was not previously possible with the generators returned by the corresponding iter... methods.

If we decide to go this way (which I think is nice!) I also think we can ditch the iter... methods from the public interface of the PMap to fully align the interface with how things look and behave in Python 3.

That would have to be combined with bumping the version number accordingly and highlight the backwards incompability in the Changelog (I'll take care of that once I release a new version though).

What are your thoughts on this?

tobgu · 2022-02-10T21:16:50Z

pyrsistent/_pmap.py

@@ -3,6 +3,68 @@
 from pyrsistent._pvector import pvector
 from pyrsistent._transformations import transform

+class PMapView(object):


No need to inherit from object in Python 3.

I think I'd prefer to split this class in two, one for the values and one for the items. A couple of methods would have to be duplicated in both classes but you'd get rid of a lot of testing on the boolean values in a bunch of methods.

I did this by having the PMapValues and PMapItems types inherit from PMapView—less code duplication this way.

tobgu · 2022-02-10T21:17:47Z

pyrsistent/_pmap.py

+                raise TypeError("PViewMap requires a Mapping object")
+        object.__setattr__(self, '_map', m)
+        object.__setattr__(self, '_values', values)
+    def __len__(self):


As a general nitpick I'd prefer one empty line between methods in classes to align the style with other classes in the code base.

tobgu · 2022-02-10T21:22:13Z

pyrsistent/_pmap.py

+    def __len__(self):
+        return len(self._map)
+    def __setattr__(self, k, v):
+        raise TypeError("%s is immutable" % (type(self),))


As a general nitpick I'd like to replace all uses of Python 2 like format strings (eg. "%s ..." % ...) with f-strings or possibly "{foo}".format(foo="bar") unless there is a special reason to use the Python 2 convention (pyrsistent claims compatibility with 3.7+).

tobgu · 2022-02-10T21:24:47Z

pyrsistent/_pmap.py

+            return self._map.iteritems()
+    def __contains__(self, arg):
+        if self._values:
+            for v in self._map.itervalues():


Should be possible to replace this loop with:
return v in self._map.itervalues()
I think?

tobgu · 2022-02-10T21:25:50Z

pyrsistent/_pmap.py

+                if v == arg: return True
+            return False
+        else:
+            try: (k,v) = arg


Should be possible to replace this with:
return v in self._map.iteritems()
I think?

tobgu · 2022-02-10T21:33:03Z

pyrsistent/_pmap.py

+    def __eq__(self, x):
+        if x is self: return True
+        if not isinstance(x, PMapView): return False
+        # For whatever reason, dict_values always seem to return False for ==


Weird, never though of that...

tobgu · 2022-02-10T21:33:39Z

pyrsistent/_pmap.py

+    # If this method is not defined, then reversed(pmap) will attempt to reverse
+    # the map using len() and getitem, usually resulting in a mysterious
+    # KeyError.
+    def __reversed__(self):


tobgu · 2022-02-10T21:36:10Z

tests/map_test.py

    assert set(['a', 'b']) == set(m(a=1, b=2))
    assert ['a', 'b'] == sorted(m(a=1, b=2).keys())
-    assert isinstance(m().keys(), PVector)
+    assert isinstance(m().keys(), pyr.PSet)


General comment on the tests:
I don't think we need to test against the type of the returned value anymore but just the properties of it. The reason it was done before was just to highlight that you got a proper PVector back (much like you'd get a true, materialized, list in Python 2 for the corresponding methods).

tobgu · 2022-02-10T21:42:35Z

tests/map_test.py

+    assert len(pm) == len(pm.keys())
+    assert len(pm) == len(pm.values())
+    assert len(pm) == len(pm.items())
+    ks = pm.keys()


Nitpick: Perhaps move the declaration of these variables to right berfore the assertion where they are used. It would make it a bit easier to follow without having to jump back and forth between the declaration and the assert.

Eg.

vs = pm.values() assert all(v in vs for v in vs)

noahbenson · 2022-02-11T16:39:29Z

Yeah, I've been using pyrsistent since maybe 2014? Can't quite remember but I'm still used to thinking about the data structures in a fundamentally Python2 way.

I understand removing the itervalues()-like methods in order to align with the direction of Python 3, but I could also see the logic in keeping them around for compatibility. (Could always stick a deprecation warning on them for awhile.) IMHO this is a somewhat aesthetic choice—I would probably eventually do this if it were my library, but I don't have a really good reason why 🤷. My main concern about these methods arose from the fact that I often pass pyrsistent maps into functions that expect dict arguments (and thus use keys() or values() and expect certain speed/behavior). I think it's fine/expected for a dict variant type to have extra methods/features as long as the basic dict features work as expected, though.

tobgu

LGTM, thanks!

tobgu · 2022-02-13T21:53:29Z

Alright, lets keep the iter*-methods around for now. :-)

noahbenson added 2 commits February 9, 2022 19:52

Added additional tests of the PMapView class

4ec6e5a

noahbenson mentioned this pull request Feb 10, 2022

Python 3 dict behaviors for PMap #178

Open

tobgu reviewed Feb 10, 2022

View reviewed changes

responses to code review for pull request tobgu#243

5cebec7

tobgu approved these changes Feb 13, 2022

View reviewed changes

tobgu merged commit dbb518b into tobgu:master Feb 13, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fixes issue #159 and greatly speeds up iteration of PMap in Python3 #243

Fixes issue #159 and greatly speeds up iteration of PMap in Python3 #243

noahbenson commented Feb 10, 2022

noahbenson commented Feb 10, 2022

noahbenson commented Feb 10, 2022

tobgu left a comment

tobgu Feb 10, 2022

noahbenson Feb 11, 2022

tobgu Feb 10, 2022

tobgu Feb 10, 2022

tobgu Feb 10, 2022

tobgu Feb 10, 2022

tobgu Feb 10, 2022

tobgu Feb 10, 2022

tobgu Feb 10, 2022

tobgu Feb 10, 2022

noahbenson commented Feb 11, 2022

tobgu left a comment

tobgu commented Feb 13, 2022

Fixes issue #159 and greatly speeds up iteration of PMap in Python3 #243

Fixes issue #159 and greatly speeds up iteration of PMap in Python3 #243

Conversation

noahbenson commented Feb 10, 2022

noahbenson commented Feb 10, 2022

noahbenson commented Feb 10, 2022

tobgu left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

noahbenson commented Feb 11, 2022

tobgu left a comment

Choose a reason for hiding this comment

tobgu commented Feb 13, 2022