
SciPy 2015 developer meeting

Mike Toews edited this page Jun 16, 2018 · 25 revisions

Date: July 7, 2015 -- the second day of the tutorials, the day before the conference proper starts

Time: We'll get started at 9:30 am and go from there

Place: Room PDR 1 ("Private Dining Room 1"), which is across the courtyard on the hotel side of the AT&T Center. "You will enter through the Carillon Restaurant and head up the stairs. You are welcome to check with me at the registration desk or with any of the AT&T Center staff for more complete directions. You have the room reserved from 8:00 am to 5:00 pm. (The AT&T Center does not serve snacks in that room, but you are welcome to break and pop over to the Tejas room or the first floor of the Conference Center and join the tutorial attendees for a break. Breakfast served until 9:00; morning snacks until 10:30 and afternoon snacks from 2:15-3:45.)" Many thanks to Jill Cowan and the SciPy organizers for arranging this!

Remote access: https://ucberkeley.bluejeans.com/3308071044/

Shared notes: https://docs.google.com/document/d/1IJcYdsHtk8MVAM4AZqFDBSf_nVG-mrB4Tv2bh9u1g4Y/edit?usp=sharing

Who: If you are reading this and interested in any of the topics below, then you're invited!

We also did a preparatory call on July 2 at 20:00 UTC. Here's the preparatory call agenda/notes

There will also be two followup events for those at SciPy:

  • NumPy BoF (Thursday, 5:30pm, Room 203): thinking we'll use this to summarize the outcomes from the meeting for the broader community, and to get their feedback?
  • NumPy sprint

Notes towards agenda (please add to this!):

  • Primary goals:

    • The intangible value of meeting people in person.
    • Writing down a high-level roadmap, with sketches of some of the bigger topics (approachable dtype, ufunc changes, Cython in numpy core, etc.).
    • Moving forward on formalizing and documenting our governance model
  • Possible main deliverable (?): a TODO list of documents to write (roadmap, change policy, governance policy, ...) together with agreement on their general contents

  • Big picture / organization / project-level goals stuff:

    • How do we make decisions? (about APIs, commit bits, resources, ...) Are we happy with our strategies for managing disagreements? ("governance")

    • How can we attract more resources (volunteers, $$$, ...), and should we try? If we had money, what would we spend it on?

      • Jaime: suggests taking more advantage of GSoC as a source of interns / money
      • Nathaniel: has a call scheduled July 6 with Josh Greenberg to discuss funding possibilities; will report back.
    • Where are we going? Possible 2-5 year roadmaps:

      • Continue on as we have been?
      • Focus on putting down the shades and locking up to make way for another project?
      • numpy 2.0?
      • Alternatives?
        • Nathaniel will sketch out a possible roadmap about decoupling arrays (data container) / dtypes (data type) / ufuncs (data operations), with the general goal of making it possible to swap in new container types, write new dtypes, and have ufuncs work more flexibly with both, so that import numpy can remain the main API entry point even as new types of array libraries come into prominence.
        • One option: have a few people present a few possible visions
    • Clarifying our policy on compatibility -- how do we balance the trade-off between progress and compatibility, for Python APIs versus C APIs, etc.

      • Again, maybe useful to have a few views presented? Do we even disagree enough for this to be useful? :-)
      • ABI compatibility plans (cf. #5888)
        • Is there any way to get out of the 100%-locked-in ABI we currently have without causing unacceptable collateral damage? (Example of something to aspire to: Python itself breaks ABI on every release but it's no big deal. Counterpoint: this is a big deal with large costs, hence https://www.python.org/dev/peps/pep-0384/) Should we just break ABI occasionally, ...?
    • Release process -- is it doing what we want? Would something else be easier/more effective/...?

  • Less big picture but still maybe worth talking about:

    • We need to formalize our relationship with NumFOCUS (requires paperwork, organizing a committee, etc.). Would be good to make concrete progress on this while everyone is there and just get it done.
    • Why are our mailing list and archive down every second week, and what can we do about it?!
    • Using Cython in the NumPy core?
    • Status of the "Microsoft problem"
    • The BLAS problem: which BLAS should we be shipping on OS X and Windows? Do we dare use OpenBLAS yet?
    • Making np.random implementation evolvable (Ralf: decided on, can we remove this item?)
    • Possibilities for making vectorized indexing less confusing (e.g. the idea of an .oindex attribute)
    • Future of the matrix class: agree on an official position.
    • Ditto for masked arrays. (Chuck says: Masked arrays are an important part of the scipy stack and we should modernize them and make them more maintainable. I like the idea that masked arrays should be a container class, which probably means that either the current inheritance from ndarray needs to be hidden, or possibly a new implementation needs to be done. This also ties into the __numpy_ufunc__ discussion.)
    • Do we have any plan on what to do with the overlap between numpy.linalg and scipy.linalg? (NB: libflame provides a complete LAPACK implementation with no Fortran, and we can probably track down the main author at the conference if we want.)
    • Same question for the overlap between numpy.fft and scipy.fftpack.
    • Can/should we make numpy less aggressive about creating object arrays? (#5303, #5353)
    • Adding benchmarks (using asv probably, similar to scipy/benchmarks).
    • ...
  • When shall we meet again?
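
To illustrate the vectorized-indexing item in the list above: the sketch below uses only existing NumPy behavior to show why paired ("vectorized") indexing tends to surprise people, and how `np.ix_` gives the outer-product selection instead. The proposed `.oindex` attribute is only an idea on this agenda, so it is not used here; the array and variable names are made up for illustration.

```python
import numpy as np

# A small 3x4 array: [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
a = np.arange(12).reshape(3, 4)
rows = [0, 2]
cols = [1, 3]

# Vectorized indexing pairs the index lists element by element:
# it picks a[0, 1] and a[2, 3], not a 2x2 block.
paired = a[rows, cols]

# Outer ("orthogonal") selection of rows 0 and 2 crossed with columns 1 and 3
# currently needs np.ix_ to build broadcastable index arrays.
outer = a[np.ix_(rows, cols)]

# A commonly cited confusing case: when an integer and an index list are
# separated by a slice, the broadcast dimension moves to the front.
x = np.zeros((2, 3, 4))
mixed = x[0, :, [0, 1]]

print(paired.shape, outer.shape, mixed.shape)
```

Many users expect `a[rows, cols]` to behave like the `np.ix_` form, which is the confusion an outer-indexing attribute would aim to address.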

Downstream feedback

@shoyer (pandas, xray) sez: "Unfortunately I will miss the meeting, but my wishlist can be summarized in two words: better dtypes! And better hooks for array-like classes. But really, what pandas needs is extensible dtypes."

@jreback (pandas) will be around in the afternoon, and in addition to #5329 has a laundry list:

  • missing value support for ints
  • fix datetime display issue (none of this local tz stuff)
  • support for datetimes with tz
  • categoricals
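
As a small illustration of the first wishlist item (missing values for ints), assuming only stock NumPy: an integer array has no slot for a missing value, so introducing NaN forces a float dtype, and `numpy.ma` is one existing workaround. The variable names are illustrative.

```python
import numpy as np

# A plain integer array keeps an integer dtype.
ints = np.array([1, 2, 3])

# Adding a missing value as NaN silently promotes the whole array to float64,
# because NaN is a floating-point value with no integer representation.
with_missing = np.array([1, 2, np.nan])

# One existing workaround: a masked array keeps the integer dtype and
# records missingness in a separate boolean mask.
masked = np.ma.masked_array([1, 2, 3], mask=[False, False, True])

print(ints.dtype, with_missing.dtype, masked.dtype, masked.sum())
```

The masked entry is skipped by reductions such as `sum()`, at the cost of carrying a second array for the mask.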

Mark Wiebe isn't getting in until the evening, so will miss the meeting proper, but sez:

From the point of view of numpy's future, my original goals with joining its development were approximately to push it into becoming a bigger tent project, in particular to eventually enable collaboration between the industry I started in, computer graphics, and the scientific python community. This is something I now see as more practically achievable with dynd, though there that won't come in the short term either. For numpy, I think the main mission should be supporting the needs of the existing user-base and interoperating well with dynd, pandas, and all the other projects experimenting in related but different ways.

For making my life easier with dynd, I'm not sure there's a whole lot more than having the two projects be amicable. One thing that may be of interest is coming up with a mechanism so dynd and numpy dtypes are a bit more interoperable. Right now, constructing a dynd dtype from a numpy dtype works, but the opposite does not.