
BLAS desiderata

Matti Picus edited this page Aug 23, 2018 · 6 revisions

The numerical ecosystem could really use a modern, optionally multithreaded BLAS under a BSD-like license that prioritizes:

  • Correctness
  • Out-of-the-box single-binary functionality (e.g., runtime kernel selection, runtime thread control)
  • Speed
  • Portability

...in roughly that order.
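As a rough illustration of what "runtime thread control" looks like from the caller's side, here is a minimal Python sketch. It assumes OpenBLAS specifically: the `OPENBLAS_NUM_THREADS` environment variable and the `openblas_set_num_threads` C function are OpenBLAS's own, while the helper names `blas_env` and `set_openblas_threads` are hypothetical and exist only for this example. Note that a library found through `ctypes` may not be the same copy another package (NumPy, for instance) has bundled, so the second helper is best-effort.

```python
import ctypes
import ctypes.util
import os


def blas_env(num_threads, base=None):
    """Return a copy of an environment with the OpenBLAS thread cap set.

    Hypothetical helper: intended for launching a child process, since
    OpenBLAS reads OPENBLAS_NUM_THREADS when the library is initialized.
    """
    if num_threads < 1:
        raise ValueError("num_threads must be >= 1")
    env = dict(os.environ if base is None else base)
    env["OPENBLAS_NUM_THREADS"] = str(num_threads)
    return env


def set_openblas_threads(num_threads, libname="openblas"):
    """Best-effort thread control inside the current process.

    Returns True if a shared library named `libname` was found and exposes
    openblas_set_num_threads, False otherwise. Assumption: the library that
    ctypes finds is the one the caller actually wants to control.
    """
    path = ctypes.util.find_library(libname)
    if path is None:
        return False
    try:
        lib = ctypes.CDLL(path)
        lib.openblas_set_num_threads(ctypes.c_int(num_threads))
    except (OSError, AttributeError):
        return False
    return True
```

A caller might pass `blas_env(4)` as the `env` argument of `subprocess.run`, or call `set_openblas_threads(1)` before a section that does its own parallelism. Having two mechanisms with different scopes is part of why out-of-the-box behaviour sits so high on the list.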

OpenBLAS is currently the library that comes closest to providing these things, but it still has a number of shortcomings. Fixing them could make good concrete targets for people to go after:

  • The path leading to getting a generally-useful build is lined with tricky booby-traps (e.g., automagic capping of the maximum number of threads and the famous NO_AFFINITY).
  • There are concerns about lack of tests. That link lists a number of specific bugs that made it past the existing test suite and are still not tested for; in general it would be very useful to build up a comprehensive set of BLAS/LAPACK tests that includes tests for realistic problem sizes.
  • It's not possible (?) to override CPU detection at runtime, which makes it hard to run comprehensive tests.
  • The use of AT&T-syntax inline asm (?) prevents the use of MSVC; using intrinsics instead might be more maintainable and certainly more portable. (Update: MSVC is now supported.)
  • ...any more?
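To make the testing item a little more concrete, here is a minimal sketch of a shape-sweeping correctness check for a matrix-multiply routine. Everything in it is an assumption for illustration: the names `reference_gemm` and `check_gemm`, the tolerance, and the use of plain Python lists. A real suite would call the BLAS under test through its bindings and use much larger shapes than pure Python can comfortably handle.

```python
import math
import random


def reference_gemm(a, b):
    """Reference product of two list-of-lists matrices.

    Each dot product is accumulated with math.fsum, so the summation
    itself adds no rounding error beyond the individual products.
    """
    k, m = len(b), len(b[0])
    return [[math.fsum(row[p] * b[p][j] for p in range(k)) for j in range(m)]
            for row in a]


def check_gemm(gemm, shapes, seed=0, tol=1e-12):
    """Compare gemm(a, b) with the reference on random inputs.

    `shapes` is an iterable of (n, k, m) with all dimensions positive.
    Returns a list of ((n, k, m), max_abs_error) for every failing shape.
    """
    rng = random.Random(seed)
    failures = []
    for (n, k, m) in shapes:
        a = [[rng.uniform(-1, 1) for _ in range(k)] for _ in range(n)]
        b = [[rng.uniform(-1, 1) for _ in range(m)] for _ in range(k)]
        got = gemm(a, b)
        want = reference_gemm(a, b)
        err = max((abs(got[i][j] - want[i][j])
                   for i in range(n) for j in range(m)), default=0.0)
        # Rounding error grows with the inner dimension, so scale by k.
        if err > tol * k:
            failures.append(((n, k, m), err))
    return failures
```

Sweeping odd, non-square, and non-power-of-two shapes is deliberate: kernel edge handling is the kind of code that small fixed-size tests may never reach.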