Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add rule to recommend str.join takes a list comp over generator #156

Open
hmvp opened this issue Jun 6, 2019 · 15 comments
Open

Add rule to recommend str.join takes a list comp over generator #156

hmvp opened this issue Jun 6, 2019 · 15 comments

Comments

@hmvp
Copy link

hmvp commented Jun 6, 2019

"".join() accepts generators

>>> ",".join(str(i) for i in range(3))
'0,1,2'
@adamchainz
Copy link
Owner

Indeed it can. I think this will require a bit more code to detect than for the existing rules - we'd have to search for a string object, attribute access, and then function call, rather than just a function name. If you want to look into a PR please feel welcome :)

@hmvp
Copy link
Author

hmvp commented Jun 13, 2019

It would be nice if this checker could detect all functions that accept typing.Iterable

@adamchainz
Copy link
Owner

Since flake8 doesn't do any typing, we're restricted to builtins here.

@graingert
Copy link

Actually the linter should enforce ",".join([str(i) for i in range(3)]) this is faster because join internally converts all generators to lists

@graingert
Copy link

@adamchainz
Copy link
Owner

Yes Tom, thanks!

@dvorapa
Copy link

dvorapa commented Apr 5, 2020

Actually the linter should enforce ",".join([str(i) for i in range(3)]) this is faster because join internally converts all generators to lists

It seems it is not much faster these days (the original measurement in SO is 8 years old)

@adamchainz
Copy link
Owner

Still a tiny bit faster - Python 3.8.2:

In [1]: %timeit ",".join([str(i) for i in range(3)])
1.01 µs ± 37.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [2]: %timeit ",".join([str(i) for i in range(300)])
55.3 µs ± 782 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [3]: %timeit ",".join((str(i) for i in range(3)))
1.15 µs ± 27.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [4]: %timeit ",".join((str(i) for i in range(300)))
61 µs ± 623 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

@dvorapa
Copy link

dvorapa commented Apr 5, 2020

You compared list to tuple, but the issue is about generator (see the description)

Try %timeit ",".join(str(i) for i in range(*))

@adamchainz
Copy link
Owner

(str(i) for i in range(3)) is a genertaor. It just has extra brackets.

@adamchainz
Copy link
Owner

Python 3.11.2 still shows this perf difference:

In [1]: %timeit ",".join([str(i) for i in range(3)])
322 ns ± 1.13 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [2]:  %timeit ",".join([str(i) for i in range(300)])
19.2 µs ± 35.2 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [3]: %timeit ",".join((str(i) for i in range(3)))
407 ns ± 1.9 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [4]: %timeit ",".join((str(i) for i in range(300)))
22.6 µs ± 60.9 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

I think we could add another rule.

@adamchainz adamchainz changed the title Add "".join() to checker C407 Add rule to recommend str.join takes a list Mar 19, 2023
@adamchainz adamchainz changed the title Add rule to recommend str.join takes a list Add rule to recommend str.join takes a list comp over generator Mar 19, 2023
@Skylion007
Copy link
Contributor

Since flake8 doesn't do any typing, we're restricted to builtins here.

@hmvp @adamchainz Python >=3.8's stdlib ast module actually supports getting the type of functions (although it's not that well documented): https://stackoverflow.com/a/70143560/2444240 Since I assume 3.7 support will be dropped soon, we can look into adding this, (although I would be fine with restricting it to methods from stdlib).

@adamchainz
Copy link
Owner

Yaeh that doesn't help that much - for example, you can't get the type of imported symbols, or the implied types of variables.

@joaoe
Copy link

joaoe commented Jun 30, 2023

In [2]: %timeit ",".join([str(i) for i in range(300)])
19.2 µs ± 35.2 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
In [4]: %timeit ",".join((str(i) for i in range(300)))
22.6 µs ± 60.9 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

A 10% difference with such tiny numbers is a statistical insignificance. Hardly grounds to make a new rule and nag developers about it. If both approaches take somewhat the same time and memory, then flake8-comprehensions shouldn't be nagging people to prefer one syntax.

Or at least make it into an opinionated warning, in the C9xx range.

@adamchainz
Copy link
Owner

The gap has grown significantly in Python 3.12 thanks to Comprehension inlining:

In [1]: %timeit ",".join([str(i) for i in range(3)])
235 ns ± 2.12 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [2]: %timeit ",".join([str(i) for i in range(300)])
16.8 µs ± 101 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [3]: %timeit ",".join((str(i) for i in range(3)))
394 ns ± 5.44 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [4]: %timeit ",".join((str(i) for i in range(300)))
20.7 µs ± 174 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

Now about a 40% difference on the small case and 20% on the larger one.

If both approaches take somewhat the same time and memory, then flake8-comprehensions shouldn't be nagging people to prefer one syntax.

They don't take the same time and memory, that is known. The implementation of join first converts the generator into a list.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants