Speakers:
Yu Feng, BIDS and the Berkeley Center for Cosmological Physics
Nick Hand, Berkeley Center for Cosmological Physics, University of California, Berkeley
Launching Python Applications on Peta-scale Massively Parallel Systems
Abstract: We introduce a method to launch Python applications at near-native speed on large high performance computing systems. The Python runtime and other dependencies are bundled and delivered to the computing nodes via a broadcast operation. The interpreter is then instructed to use the local copies of the files on each computing node, removing the shared file system as a bottleneck during application start-up. Our method can be added as a preamble to a traditional job script, improving the performance of user applications non-invasively. Furthermore, we find it useful to implement a three-tier system for the supporting components of an application, reducing the overhead of runs during the development phase. The method launches applications on Cray XC30 and Cray XT systems up to full machine capacity with an overhead of typically less than 2 minutes. We expect the method to be portable to similar applications in Julia or R. We also hope the three-tier system for supporting components offers some insight into container-based solutions for launching applications in a development environment. We provide the full source code of an implementation of the method at https://github.com/rainwoodman/python-mpi-bcast. Now that large-scale Python applications can launch extremely efficiently on state-of-the-art supercomputing systems, it is time for the high performance computing community to seriously consider building complicated computational applications at large scale with Python.
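The core idea in the abstract, read the bundle from the shared file system once, broadcast its bytes to every node, and extract to node-local storage, can be sketched in a few lines. This is an illustrative simplification, not the actual python-mpi-bcast interface: the function name `broadcast_bundle` and the `SerialComm` stand-in (used so the sketch runs without an MPI installation) are hypothetical; the real tool passes an mpi4py-style communicator and typically extracts to a node-local path such as /dev/shm.

```python
# Minimal sketch of the bundle-broadcast-extract idea behind
# python-mpi-bcast; names here are illustrative, not the tool's real API.
import io
import os
import sys
import tarfile
import tempfile


def broadcast_bundle(comm, bundle_path, dest_dir):
    """Rank 0 reads the tarball once; every rank receives the bytes via a
    collective broadcast and extracts them to node-local storage, so the
    shared file system is read only once instead of once per process."""
    data = None
    if comm.rank == 0:
        with open(bundle_path, "rb") as f:
            data = f.read()
    data = comm.bcast(data, root=0)
    os.makedirs(dest_dir, exist_ok=True)
    with tarfile.open(fileobj=io.BytesIO(data)) as tar:
        tar.extractall(dest_dir)
    return dest_dir


class SerialComm:
    """Stand-in for an MPI communicator so the sketch runs anywhere."""
    rank = 0

    def bcast(self, obj, root=0):
        return obj


# Demo: bundle a tiny module, "broadcast" it, and import the local copy.
workdir = tempfile.mkdtemp()
module_path = os.path.join(workdir, "bundled_mod.py")
with open(module_path, "w") as f:
    f.write("ANSWER = 42\n")
bundle = os.path.join(workdir, "bundle.tar")
with tarfile.open(bundle, "w") as tar:
    tar.add(module_path, arcname="bundled_mod.py")

local = broadcast_bundle(SerialComm(), bundle, os.path.join(workdir, "local"))
# Analogous to pointing PYTHONPATH/PYTHONHOME at the node-local copy:
sys.path.insert(0, local)
import bundled_mod
print(bundled_mod.ANSWER)  # 42
```

In a real job script the same steps would run as a preamble before the application: the communicator would be a genuine MPI one (e.g. mpi4py's `MPI.COMM_WORLD`), and environment variables such as PYTHONPATH would be redirected to the extracted node-local copy so interpreter start-up never touches the shared file system.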
Event Website: SciPy 2016
Speaker(s)
Yu Feng
My research in cosmology focuses on the formation of galaxies in the large-scale structure of the Universe. Cosmology is a data-driven science. I develop the software and tools needed to handle these data efficiently on platforms from laptops to supercomputers, including (1) massively parallel software to solve gravity and hydrodynamics on tens of thousands of computing nodes, (2) tools to visualize density estimates of large particle datasets with hundreds of billions of particles, and (3) algorithms for understanding the clustering of galaxies. I also contribute to open-source data science software projects as both a user and a developer. I strongly believe in the power of adequate tools in data-driven science research.