December 2018 NumPy Developer Weekend

January 10, 2019

On Friday, November 30 and Saturday, December 1, we held a NumPy developer meeting at BIDS. In attendance were the BIDS NumPy team (Stéfan van der Walt, Tyler Reddy, and Matti Picus), core developers Charles Harris, Eric Wieser, and Stephan Hoyer, as well as Dask founder Matthew Rocklin. An outline of the discussion follows; it is also documented in the meeting notes.

Overview of NumPy-like libraries

We opened with a review of NumPy’s position in relation to new and developing low-level NumPy-like libraries and facilities, including:

  • Numba, which provides a JIT decorator that compiles Python functions the first time they run,
  • Dask, which provides a distributed NumPy array,
  • XArray, which provides labelled data,
  • CuPy, which provides an ndarray on the GPU,
  • xtensor, xnd and libndtypes (Stefan Krah & Travis Oliphant, Quansight), which provide C++-based ndarray objects that can be used across languages,
  • PyData/Sparse (Hameer Abbasi, Quansight), which provides sparse multidimensional arrays.

The newly added __array_function__ protocol (see below) will allow better interoperability with Dask and CuPy. It is still unclear how to leverage lower-level libraries such as xtensor, whose power comes from compiling new types.

Ufuncs

Ufuncs are the low-level iterating functions that make NumPy fast. All the basic linear algebra, trigonometric, and mathematical operations between NumPy ndarrays are expressed in terms of ufuncs, or of higher-order operations built upon them. Matmul (the function behind the @ operator added to Python) is a late addition to the family and until now has not been a ufunc, which meant the libraries mentioned above could not override it via the __array_ufunc__ protocol.

Over the weekend, matmul was turned into a fully fledged ufunc. Some additional work is still required to make it easier to pre-allocate temporary memory for use in inner loops. Other functions that still need to be converted to ufuncs include argmax and sort.
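To illustrate what the conversion makes possible, here is a minimal sketch of a wrapper class that intercepts matmul through __array_ufunc__. The names LoggedArray and ufunc_log are hypothetical and used only for illustration, and the sketch assumes a NumPy version in which matmul is a ufunc (1.16 or later). It ignores the out argument and multi-output ufuncs, which a real implementation would need to handle.

```python
import numpy as np
from numpy.lib.mixins import NDArrayOperatorsMixin

# Hypothetical module-level record of the ufuncs that were intercepted.
ufunc_log = []


class LoggedArray(NDArrayOperatorsMixin):
    """Hypothetical array wrapper that records every ufunc applied to it."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Only plain calls are handled in this sketch; reductions,
        # accumulations, and explicit `out=` arguments are declined.
        if method != "__call__" or "out" in kwargs:
            return NotImplemented
        ufunc_log.append(ufunc.__name__)
        # Unwrap our own type, run the ufunc on plain ndarrays, re-wrap.
        raw = [x.data if isinstance(x, LoggedArray) else x for x in inputs]
        return LoggedArray(ufunc(*raw, **kwargs))


a = LoggedArray([[1, 2], [3, 4]])
product = a @ a  # the mixin forwards @ to np.matmul, which dispatches here
```

The mixin maps the Python operators onto the corresponding ufuncs, so once matmul is a ufunc the @ operator reaches __array_ufunc__ in the same way as + or * does.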

Dtypes

One central focus of the meeting was the refactoring of NumPy data types (dtypes). The current system suffers from a number of deficiencies, the most glaring of which is the inability to easily create user-defined dtypes from Python. Travis Oliphant from Quansight joined the discussion via video call. We now have a much clearer concept of how this work should be structured, and will summarize it in the form of a NumPy Enhancement Proposal (NEP).

__array_function__ progress

The implementation of NEP 18, which defines the __array_function__ protocol, is underway and slated for experimental inclusion in the upcoming 1.16 release. We examined the progress made so far, and refined the C implementation so that the added protocol checks now have minimal performance impact.
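As a rough sketch of how the protocol is used, the following class overrides np.sum for a constant-diagonal matrix without ever building the full array. ConstantDiagonal is a hypothetical class written for this illustration. The sketch assumes a NumPy version in which the protocol is active: in 1.16 it is experimental and has to be enabled with the NUMPY_EXPERIMENTAL_ARRAY_FUNCTION=1 environment variable, while later releases enable it by default.

```python
import numpy as np


class ConstantDiagonal:
    """Hypothetical n-by-n matrix with one value repeated on the diagonal."""

    def __init__(self, n, value):
        self.n = n
        self.value = value

    def __array_function__(self, func, types, args, kwargs):
        # Handle only np.sum in this sketch; everything else is declined,
        # which makes NumPy raise a TypeError for the caller.
        if func is np.sum:
            return self.n * self.value
        return NotImplemented


d = ConstantDiagonal(5, 2.0)
total = np.sum(d)  # dispatched to ConstantDiagonal.__array_function__
```

Returning NotImplemented for functions the class does not support keeps failures explicit, rather than silently coercing the object to an ndarray.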

Conclusion

At BIDS, we get to work on NumPy during working hours. However, we appreciate that it is a community-driven project, made possible only by the dedication of those who choose to give their time freely, often over weekends and evenings. We are grateful for their participation and support as we work together towards improving the NumPy ecosystem for the benefit of everyone.

The NumPy team at BIDS is funded by the Gordon and Betty Moore Foundation through Grant GBMF5447 and by the Alfred P. Sloan Foundation through Grant 2017-9960 to the University of California, Berkeley.


