-
Notifications
You must be signed in to change notification settings - Fork 149
Fixes issue #159 and greatly speeds up iteration of PMap in Python3 #243
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Adds a new class `PMapView` to the `pyrsistent._pmap` namespace. This class is intended to be private and essentially mimics the built-in `dict_values` and `dict_items` pseudo-types. Under the hood they just keep a reference to the `PMap` object and use the `iteritems` and `itervalues` methods; this means that creating a view can be run in constant time insteead of the current `O(n)` time. The `keys` method of `PMap` does not use the `PMapView` class because `PSet` is aleady a `PMapView`-type class for `PMap` keys. I couldn't think of a reason not to just use `PSet`: it's the logical type for a set of keys of a persistent map, and it can just be passed a `PMap` to create such a set in constant time. The tests were also updated to reflect the reality that `keys` now returns a `PSet` instead of a `PVector` and that `values` and `items` return `PMapView`s. As a benchmark, the following code: ```python from timeit import timeit def make_setup(mapsize): return ('from pyrsistent import pmap, pvector; ' 'm = pmap({k:k for k in range(%d)})' % (mapsize,)) tests = [('len(m.keys())', 'len(pvector(m.iterkeys()))'), ('len(m.values())', 'len(pvector(m.itervalues()))'), ('len(m.items())', 'len(pvector(m.iteritems()))')] for mapsize in [10, 100, 1000]: print("Map size:", str(mapsize)) setupstr = make_setup(mapsize) for test_pair in tests: print('=' * 60) print('%-40s' % test_pair[0], '%6f us' % timeit(test_pair[0], setup=setupstr)) print('%-40s' % test_pair[1], '%6f us' % timeit(test_pair[1], setup=setupstr)) print('-' * 60) print('') ``` produces the following output on my desktop: ``` Map size: 10 ============================================================ len(m.keys()) 1.330579 us len(pvector(m.iterkeys())) 1.562125 us ------------------------------------------------------------ ============================================================ len(m.values()) 0.653008 us len(pvector(m.itervalues())) 1.584405 us ------------------------------------------------------------ ============================================================ len(m.items()) 0.653144 us len(pvector(m.iteritems())) 1.159099 us ------------------------------------------------------------ Map size: 100 ============================================================ len(m.keys()) 1.350190 us len(pvector(m.iterkeys())) 11.702080 us ------------------------------------------------------------ ============================================================ len(m.values()) 0.650723 us len(pvector(m.itervalues())) 11.739465 us ------------------------------------------------------------ ============================================================ len(m.items()) 0.653189 us len(pvector(m.iteritems())) 8.951215 us ------------------------------------------------------------ Map size: 1000 ============================================================ len(m.keys()) 1.379429 us len(pvector(m.iterkeys())) 114.202673 us ------------------------------------------------------------ ============================================================ len(m.values()) 0.679722 us len(pvector(m.itervalues())) 113.874931 us ------------------------------------------------------------ ============================================================ len(m.items()) 0.680877 us len(pvector(m.iteritems())) 87.645472 us ------------------------------------------------------------ ```
I should have mentioned that the commit also includes a definition for the |
Big fan of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
As a general comment and background why it looks the way it currently does.
keys/iterkeys
, values/itervalues
and items/iteritems
mimic how the corresponding methods behave in Python 2 (which was the dominating Python version when pyrsistent was first written). I have not changed this since, partly due to backwards compatibility concerns.
The code in this PR brings keys/values/items
closer to the semantics of the same methods in the corresponding types in the stdlib of Python 3 which is nice. Most notably it adds len
and equality comparison which was not previously possible with the generators returned by the corresponding iter...
methods.
If we decide to go this way (which I think is nice!) I also think we can ditch the iter...
methods from the public interface of the PMap to fully align the interface with how things look and behave in Python 3.
That would have to be combined with bumping the version number accordingly and highlight the backwards incompability in the Changelog (I'll take care of that once I release a new version though).
What are your thoughts on this?
pyrsistent/_pmap.py
Outdated
@@ -3,6 +3,68 @@ | |||
from pyrsistent._pvector import pvector | |||
from pyrsistent._transformations import transform | |||
|
|||
class PMapView(object): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- No need to inherit from
object
in Python 3. - I think I'd prefer to split this class in two, one for the values and one for the items. A couple of methods would have to be duplicated in both classes but you'd get rid of a lot of testing on the boolean
values
in a bunch of methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did this by having the PMapValues
and PMapItems
types inherit from PMapView
—less code duplication this way.
raise TypeError("PViewMap requires a Mapping object") | ||
object.__setattr__(self, '_map', m) | ||
object.__setattr__(self, '_values', values) | ||
def __len__(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a general nitpick I'd prefer one empty line between methods in classes to align the style with other classes in the code base.
def __len__(self): | ||
return len(self._map) | ||
def __setattr__(self, k, v): | ||
raise TypeError("%s is immutable" % (type(self),)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a general nitpick I'd like to replace all uses of Python 2 like format strings (eg. "%s ..." % ...
) with f-strings or possibly "{foo}".format(foo="bar")
unless there is a special reason to use the Python 2 convention (pyrsistent claims compatibility with 3.7+).
pyrsistent/_pmap.py
Outdated
return self._map.iteritems() | ||
def __contains__(self, arg): | ||
if self._values: | ||
for v in self._map.itervalues(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be possible to replace this loop with:
return v in self._map.itervalues()
I think?
pyrsistent/_pmap.py
Outdated
if v == arg: return True | ||
return False | ||
else: | ||
try: (k,v) = arg |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be possible to replace this with:
return v in self._map.iteritems()
I think?
def __eq__(self, x): | ||
if x is self: return True | ||
if not isinstance(x, PMapView): return False | ||
# For whatever reason, dict_values always seem to return False for == |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Weird, never though of that...
# If this method is not defined, then reversed(pmap) will attempt to reverse | ||
# the map using len() and getitem, usually resulting in a mysterious | ||
# KeyError. | ||
def __reversed__(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great!
tests/map_test.py
Outdated
assert set(['a', 'b']) == set(m(a=1, b=2)) | ||
assert ['a', 'b'] == sorted(m(a=1, b=2).keys()) | ||
assert isinstance(m().keys(), PVector) | ||
assert isinstance(m().keys(), pyr.PSet) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General comment on the tests:
I don't think we need to test against the type of the returned value anymore but just the properties of it. The reason it was done before was just to highlight that you got a proper PVector
back (much like you'd get a true, materialized, list in Python 2 for the corresponding methods).
assert len(pm) == len(pm.keys()) | ||
assert len(pm) == len(pm.values()) | ||
assert len(pm) == len(pm.items()) | ||
ks = pm.keys() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: Perhaps move the declaration of these variables to right berfore the assertion where they are used. It would make it a bit easier to follow without having to jump back and forth between the declaration and the assert.
Eg.
vs = pm.values()
assert all(v in vs for v in vs)
Yeah, I've been using I understand removing the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
Alright, lets keep the iter*-methods around for now. :-) |
This pull request fixes issue #159 and greatly speeds up iteration of
PMap
in Python3—in fact it greatly speeds up creation of the iterator object alone when callingPMap.keys
,PMap.values
, andPMap.items
.Most of the info is in the commit messages, but I've repeated the relevant bits below (with some minor edits) for convenience!
Happy to iterate on this if there are issues I've missed.
Adds a new class
PMapView
to thepyrsistent._pmap
namespace. This class is intended to be private and essentially mimics the built-indict_values
anddict_items
pseudo-types. Under the hood they just keep a reference to thePMap
object and use theiteritems
anditervalues
methods; this means that creating a view can be run in constant time instead of the currentO(n)
time.The
keys
method ofPMap
does not use thePMapView
class becausePSet
is aleady aPMapView
-type class forPMap
keys. I couldn't think of a reason not to just usePSet
: it's the logical type for a set of keys of a persistent map,and it can just be passed a
PMap
to create such a set in constant time.The tests were also updated to reflect the reality that
keys
now returns aPSet
instead of aPVector
and thatvalues
anditems
returnPMapView
s.As a benchmark, the following code:
produces the following output on my desktop:
These tests were run with the version in this pull request, so the first test is always the new approach and the second test is always the previous approach.