Many fields are being transformed by the introduction of data-intensive computational methods. Because these methods evolve rapidly, scientists face substantial challenges in acquiring the knowledge and technical skills their research requires. As a result, there is a tremendous need for training in data-intensive research techniques that crosses the traditional boundaries of fields and institutions. Although a great deal of information about reproducible research methods and modern scientific computing is available online, putting it all together can be challenging even for the most skilled scientists.
To address these challenges in neuroscience, specifically in the analysis of brain measurements, the Berkeley Institute for Data Science (BIDS), Research IT (RIT), the University of Washington’s eScience Institute, and UCSF’s Department of Neurology recently collaborated to develop a workshop. The idea for this workshop was originally hatched at the annual cross-institutional summit of the Moore-Sloan Data Science Environments, where data scientists meet to find new ways to address major challenges in data-intensive research. The course was then developed collaboratively over the following months.
Demand for the workshop among researchers at UCSF was intense, with more than 60 applicants from fields such as neurology, radiology, and psychiatry vying for the 30 available seats.
At the workshop, BIDS fellows Chris Holdgraf and Fatma Deniz, eScience Data Scientist Ariel Rokem, and UCSF postdoctoral fellow Maryana Alegro delivered tutorials on computer vision and machine learning for neuroscience in Genentech Hall at UCSF. The instructors took the audience through detailed, hands-on data analysis pipelines that harness open source software for image processing (http://scikit-image.org/, http://nipy.org/dipy/). They also introduced the participants to machine learning techniques (e.g., deep learning methods using Caffe and TensorFlow, as well as an introduction to scikit-learn). The tutorials drew on data from many domains in neuroscience, covering the analysis and interpretation of imaging data ranging from whole-brain functional magnetic resonance imaging (fMRI) to single-cell microscopy. Other supporters and co-organizers of this event included Lea Grinberg, Dani Ushizima, and Michael Schaffer.
The hands-on tutorials were all designed to operate in a cloud-computing environment, using Jupyter Notebooks running on the XSEDE Jetstream cloud platform, facilitated by our local campus champion, Aaron Culich. Each participant received their own session on the XSEDE cluster, accessed via a unique IP address, which allowed them to perform computationally demanding analyses from their own laptops. This made the learning experience very effective and reduced the difficulty of setting up a programming environment for each student. In addition, these online resources remained available for several days after the event so that students could re-run the code and analyze the data on their own time.
For those who were unable to participate but are interested in learning more, all of the course material is open and freely available at https://github.com/choldgraf/UCSF-Data_Driven_Neuro. The group plans to find additional ways to continue to offer data-intensive training in neuroscience and other fields.