numba's njit 350x *slower* than regular Python for random list access #9535
Thanks for the report. I think there may be some issues with what is being measured, and that is what is leading to the timings reported. For instance, `numba_bar(natural_numba_foo(n), indices)` is doing something like:

```python
tmp_lists = natural_numba_foo(n)
numba_bar(tmp_lists, indices)
```

and this matters to Numba because Numba stores lists in a different format to CPython. The "natural" list (as described) is what Numba stores internally as a "reflected list". Going from reflected lists back to Python lists requires a "boxing" step, in which the data in the list is translated from Numba's format to CPython's format. This process is quite costly, and in creating that temporary variable the lists have to be translated from Numba format into Python format, and then from Python format back into Numba format again so they can be used in the compiled call.

It may be a more representative test if the lists were allocated outside of the timed region, OR built inside the compiled code. Typed lists don't suffer from this problem so much, as their Numba representation is carried into their CPython representation; however, this means access from the interpreter pays the performance price for doing the data format translation.

If the above were fixed up, I would anticipate that the list access speed would be much closer, and potentially even faster than what can be obtained through the Python interpreter.

Also, as a general word of warning: benchmarking functions that do very little work and then return something that has no dependency on the "work" being benchmarked can result in surprising performance results when compiled... the compiler might elect to optimise the work out, as it has no impact on the returned value!

Hope this helps!
Thanks for your answer. Unfortunately I still don't know how to solve my problem.
Here are more benchmarks proving that
Results for
Results for
```python
import numpy as np
from numba import njit
from numba.typed import List
import timeit

def foo(n):
    return [(np.zeros(0), np.zeros(0))] * n

@njit
def typed_numba_foo(n):
    return List([(np.zeros(0), np.zeros(0))] * n)

def bar(a, b):
    for idx in b:
        _ = a[idx]  # the slow bit
    return None

@njit
def natural_foo_bar(n, b):
    a = [(np.zeros(0), np.zeros(0))] * n
    for idx in b:
        _ = a[idx]  # the slow bit
    return None

@njit
def typed_foo_bar(n, b):
    a = List([(np.zeros(0), np.zeros(0))] * n)
    for idx in b:
        _ = a[idx]  # the slow bit
    return None

natural_numba_foo = njit(foo)
numba_bar = njit(bar)

def call_foo(method, n):
    if method == 0:
        return natural_numba_foo(n)
    elif method == 1:
        return typed_numba_foo(n)
    elif method == 2:
        return foo(n)
    elif method > 2:
        return n

def call_bar(method, foo_res_or_n, indices):
    if method == 0:
        numba_bar(foo_res_or_n, indices)
    elif method == 1:
        numba_bar(foo_res_or_n, indices)
    elif method == 2:
        bar(foo_res_or_n, indices)
    elif method == 3:
        natural_foo_bar(foo_res_or_n, indices)
    elif method == 4:
        typed_foo_bar(foo_res_or_n, indices)

n = 10_000_000
indices = np.random.randint(0, n, n)
repeats = 1
for method, arg in [
        # ('Natural njit', 0),
        ('Typed njit', 1), ('Regular Python', 2),
        ('Natural combined', 3), ('Typed combined', 4)]:
    foo_res = call_foo(arg, n)  # compile
    call_bar(arg, foo_res, indices)  # compile
    start = timeit.default_timer()
    for i in range(repeats):
        foo_res = call_foo(arg, n)
    print(method, "-foo", "took", timeit.default_timer() - start, "s")
    start = timeit.default_timer()
    for i in range(repeats):
        call_bar(arg, foo_res, indices)
    print(method, "-bar", "took", timeit.default_timer() - start, "s")
```
To summarize:
(Replacing the trivial body of
@soerenwolfers thank you for asking about this. If you take a look at the issue tracker, you'll probably find issues reporting that both of Numba's list implementations (reflected list and typed list) may suffer performance inefficiencies depending on how they are used. I'm still in the process of working out what is going on in your examples, but wanted to share a quick example to show a simpler use-case; perhaps there is something to take away from this. Numba's typed list is useful and can be very quick, but your mileage may vary depending on use-case. I tried the following benchmark using a simple accumulator function over integers and got some very promising results:
@soerenwolfers I have now updated the example to use random access of the list, like:
In this case the benchmarks are:
@soerenwolfers I re-did my benchmark again, such that the access is truly randomized (by handing in the indices) and not sequential, and I get:
@esc I suspect your example might be different from mine because the list contains "value types" instead of "reference types", but that depends on implementation details of numba that I'm not familiar with. In any case, I agree that once the list access becomes trivial compared to the work being done, numba will look good again, but I intentionally highlighted something that does cause real problems for me: in the opposite case, where the work is trivial and it's all about accessing non-trivial data, numba is slower than pure Python. (By a lot in some cases and by a little in others, depending on how you use numba and what you use it on, but still.)
Indeed. Numba isn't a silver bullet, and the lists are known to have certain performance deficiencies; it seems like you hit one of those. 😅 I doubt there will be a "workaround" for your use-case, unfortunately.
@soerenwolfers I thought about this some more, and the issue you are encountering is probably related to so-called "reference counted" types (e.g. a tuple of NumPy arrays), i.e. anything that isn't a primitive like an int or a float. I went and dug out some pointers for you, in case you want to have a crack at improving the typed.List for your use-case:

High-level interface: https://github.com/numba/numba/blob/main/numba/typed/typedlist.py#L361-L369

Note that the reflected list is scheduled for deprecation, so it's probably not worth attempting to improve that. I think there are quite a few people who would be happy if the typed list performance for more complex use-cases could be improved! 🙌 Hope this helps!
This issue is marked as stale as it has had no activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with any updates and confirm that this issue still needs to be addressed.
Closing because I stopped using numba and it doesn't seem this is prioritized by the dev team. |
@soerenwolfers What are you using instead? |
@soerenwolfers it seems like Numba isn't a good fit for your use-case at present. FWIW: I will continue to debug
I came to the same conclusion. I'm not using anything else either; I'm just losing a bit of time running regular Python instead of losing it on bending numba to do what it's apparently not targeted to do.
ref: #9374
Reporting a bug

- visible in the release notes (https://numba.readthedocs.io/en/stable/release-notes-overview.html) --- I am using 0.59.1
- i.e. it's possible to run as 'python bug.py'.

Random list access is 350x faster with regular Python than with njit. This can be somewhat alleviated by using numba's `List` type, but first, this probably means usage of Python's `list` should not be allowed in `no_python` mode if it makes numba so slow, and second, even with `List`, numba is 5x slower than regular Python. The MWE below reproduces the numbers claimed above. Note that in my real problem the list entries are not as trivial, so I can't just use arrays without lists.