Peter Fackeldey, a postdoctoral research associate at Princeton University, introduced a solution that brings fast, effective computation to the irregular data of high-energy physics: Awkward Array. Presenting his work "Awkward Array: manipulating nested, variable-sized data with NumPy-like idioms" at BIDS on June 6, 2025, Fackeldey showed how the library addresses a challenge found throughout data science: working with ragged data.

With a background in high-energy physics (HEP) and in programming languages such as Python, Fackeldey has always been interested in solving problems with open source software, especially ones related to HEP. He highlighted how far the field has come, noting that in the 1960s, collision events were analyzed manually from 2D images, e.g., at Berkeley. Manual analysis is no longer possible given the data rates at modern HEP experiments, e.g., 40 million collisions per second at the Large Hadron Collider at CERN; efficient software solutions are needed. Each of these collisions produces a different number of resulting particles, creating highly jagged data structures. Traditional NumPy arrays are optimized for rectangular data and cannot represent such structures efficiently. Fackeldey remarked that “Every event produces a certain number of particles… so we have to deal with jagged data.” Ragged or “messy” data like this was simply not easy to work with efficiently in the scientific Python ecosystem.
The solution he presented was Awkward Array, a high-performance array library that enables familiar array operations and fast, vectorized computation on irregular or “awkward” data. Jim Pivarski began work on the project in 2018 while he was at Princeton, and Fackeldey has been a primary maintainer for 1.5 years, ensuring that the open source code stays functional and secure and that new features remain consistent with its original purpose.
Fackeldey continued his presentation with key examples to illustrate the work, including grouping multiple jagged arrays into structs of arrays that have a physical meaning, and then took a closer look at other powerful ways Awkward Array can be used. To simplify working with high-dimensional arrays, Fackeldey described the benefits of the “named axes” feature, which gives data dimensions meaningful labels so that scientists can write more readable code and make fewer errors. Integration with the rest of the scientific Python ecosystem is a key focus of the project, and Awkward Array benefits greatly from it. For example, by connecting with Dask, physicists can seamlessly scale their analyses to terabytes of jagged data.

This slide illustrates how Awkward Array scales to large high-energy physics analyses, including its ability to parallelize each analysis step on a cluster with Dask using the “Map Reduce” paradigm.
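
The “Map Reduce” pattern itself can be sketched without a cluster. The example below deliberately uses Python's standard-library `concurrent.futures` as a stand-in for Dask, and the chunked toy data and the helper names `count_multiplicities` and `merge` are hypothetical; it only illustrates the paradigm, not how the actual analyses are run.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical toy input: the number of particles in each collision event,
# already split into three chunks (partitions) of three events each.
chunks = [[3, 0, 2], [1, 3, 3], [2, 2, 0]]

def count_multiplicities(chunk):
    """Map step: histogram of particles-per-event within one chunk."""
    return Counter(chunk)

def merge(left, right):
    """Reduce step: combine two partial histograms into one."""
    return left + right

# Map: process every chunk independently (here in threads; on a cluster
# each chunk would go to a separate worker).
with ThreadPoolExecutor(max_workers=3) as pool:
    partials = list(pool.map(count_multiplicities, chunks))

# Reduce: merge the partial results into the final histogram.
histogram = reduce(merge, partials, Counter())

print(dict(sorted(histogram.items())))
```

Because the map step never needs data from another chunk, the same analysis code can be applied to as many partitions as the available workers can handle.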

To hear about all the powerful uses that have resulted from Fackeldey’s work on Awkward Array, watch the video below and read through his slides.
The presentation concluded with a Q&A session that showcased Awkward Array’s potential beyond high-energy physics. Responding to an online inquiry, Fackeldey highlighted the library's value for any ragged data structure, such as those found in text analysis and astronomy. An audience member pointed out that data from electric vehicle fleets could be another use case. These diverse examples present Awkward Array as an adaptable tool capable of solving data challenges across a wide range of disciplines.

Kirstie Whitaker, BIDS Executive Director, moderates the Q&A portion of the seminar with Peter Fackeldey
To stay in touch with BIDS and join the conversation, please visit or follow us on Bluesky and LinkedIn, and subscribe to the BIDS newsletter.