Tail and some others can work better if they'll use a length hints (pep 424) #510

code-of-kpp · 2021-03-10T20:06:11Z

I understand, that recipes were ported directly from the docs, but things like tail do not perform great in some situations.

Note, that when deque is constructed it consumes almost everything from the iterator, so creation of tail iterator is not lazy.

A better possible implementation is:

def tail(n, iterable):
    if isinstance(iterable, collections.abc.Sequence):
        return iter(iterable[-n:])
    hint: int = operator.length_hint(iterable)
    if hint != 0 and n >= hint:
        return itertools.islice(iterable, hint - n, hint)
    return iter(collections.deque(iterable, maxlen=n))

This will make things like

>>> tail(100, range(1 << 30))
<range_iterator at 0x...>
>>> next(tail(100, range(1 << 30)))
1073741724

really fast, things like

>>> tail(100, reversed(range(1 << 30))) # lazy
<itertools.islice at 0x...>
>>> next(tail(100, reversed(range(1 << 30))))
99

somewhat faster (but at least lazy).

And hopefully will "automagically" improve in the future for some other cases like enumerate and map.

The text was updated successfully, but these errors were encountered:

code-of-kpp · 2021-03-10T20:27:47Z

Please, note, that this is not a PR, as length_hint is not guaranteed to be accurate (but it looks lite it is in practice), so actual implementation should consider other options of the length estimation, like isinstance(obj, Sized) and/or trusting length hints for types, that are known to have accurate hint.

bbayles · 2021-03-12T01:54:45Z

Thanks for the suggestion. I'm open to the "if it's a Sequence, slice it" optimization - you can make a PR for that? But because the "length hint" isn't necessarily true, I'd rather not rely on it.

code-of-kpp · 2021-03-12T09:53:41Z

How about using abc.Sized to cover sets and dicts, for example and wrapping the call to deque in a yeild to force laziness?

code-of-kpp · 2021-03-12T10:07:01Z

Another approach is to wrap deque call in map and chain or something like that. This way the return type of the function can be a little more transparent, e.g. range_iterator or islice in some cases.

Anyway this can be a breaking change as previously data were effectively copied and original iterator was exhausted before the next item picked from the tail(), forcing StopIteration and exiting loops with original iterator-parameter.

bbayles · 2021-10-18T23:11:27Z

To be clear, the pr-welcome is for the tail function.

I'm open to the "if it's a Sequence, slice it" optimization

nanouasyn · 2022-08-06T14:00:33Z

if isinstance(iterable, collections.abc.Sequence):
    return iter(iterable[-n:])

In this implementation, memory will be immediately allocated for all elements of the tail if we work with lists, strings or tuples. But maybe we're not going to use all the tail elements. For example, if then we plan to use takewhile or something like that. Maybe we should use islice for all Sized iterables?

I'm inclined to do this:

def tail(n, iterable):
    if isinstance(iterable, collections.abc.Sized):
        length = len(iterable)
        yield from islice(iterable, max(0, length - n), length)
    else:
        yield from deque(iterable, maxlen=n)

In this case, we will get the laziest version of tail. What do you think about it?

rhettinger · 2023-02-18T19:11:15Z

because the "length hint" isn't necessarily true, I'd rather not rely on it.

This is a good decision. I see people misimplement iterators with a __length_hint__ that doesn't change as the iterator gets consumed or that fails to update when an underlying iterable is mutated.

I'm open to the "if it's a Sequence, slice it" optimization

This is a great idea and will greatly improve performance for short tails of long sequences (which is a typical use case).

Note, that when deque is constructed it consumes almost everything from the iterator, so creation of tail iterator is not lazy.

It is not clear to me that this is always (or even sometimes) better. The iterator protocol is split into two phases, iterator creation and iterator consumption, a setup phase and a take care of business phase.

If the two steps are done together, creating the iterator and consuming it immediately, then lazy evaluation has no effect on the user experience. Code in the following form will see zero benefit:

for line in tail(file):
      ...

If the two steps are not done together, a setup phase and later execution phase, it is reasonable to want to shift as much work as possible to the setup phase so that the execution phase is maximally responsive for the user. This is the same reason that we precompile regexes in templating engines — we want the pattern search to run as quickly as possible when requested.

Likewise, lazy evaluation precludes timely closing of the underlying resource:

with open(somefile) as source_file:
    it = tail(10, source_source)

...

for line in tail(it):   # This would fail since the file is already closed
      ...

So, this would be a breaking change. The mitigation would involve with_iter and leaving the file open during the interval between the setup and execution phase. Alternatively, a user would have to wrap the tail call with list to force eager evaluation so that they can close the underlying resource.

rhettinger · 2023-02-18T22:03:41Z

One other thought: The Sized ABC only checks for the presence of __len__, so the usual (and faster) way is a try/except with len():

try:
    n = len(iterable)
except TypeError:
    deque version
else:
    islice version

bbayles added the pr-welcome We are open to PRs that fix this issue - leave a note if you're working on it! label Mar 14, 2021

NotWearingPants mentioned this issue Oct 15, 2021

strictly_n should be lazy - return an iterator with a length hint #571

Closed

This was referenced Aug 7, 2022

Optimize memory allocation for tail function #630

Merged

Optimization of islice_extended for iterables with fixed length #635

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Tail and some others can work better if they'll use a length hints (pep 424) #510

Tail and some others can work better if they'll use a length hints (pep 424) #510

code-of-kpp commented Mar 10, 2021 •

edited

code-of-kpp commented Mar 10, 2021

bbayles commented Mar 12, 2021

code-of-kpp commented Mar 12, 2021

code-of-kpp commented Mar 12, 2021

bbayles commented Oct 18, 2021

nanouasyn commented Aug 6, 2022

rhettinger commented Feb 18, 2023

rhettinger commented Feb 18, 2023

Tail and some others can work better if they'll use a length hints (pep 424) #510

Tail and some others can work better if they'll use a length hints (pep 424) #510

Comments

code-of-kpp commented Mar 10, 2021 • edited

code-of-kpp commented Mar 10, 2021

bbayles commented Mar 12, 2021

code-of-kpp commented Mar 12, 2021

code-of-kpp commented Mar 12, 2021

bbayles commented Oct 18, 2021

nanouasyn commented Aug 6, 2022

rhettinger commented Feb 18, 2023

rhettinger commented Feb 18, 2023

code-of-kpp commented Mar 10, 2021 •

edited