This semester, BIDS Senior Fellow Michael Mahoney will co-organize a program of workshops on the Foundations of Data Science at the Simons Institute. These workshops will bring together researchers to identify and develop a set of core techniques and principles forming the foundational algorithmic, mathematical and statistical aspects of modern Data Science. There will be four week-long workshops, including a "bootcamp" during the week of August 27th. The bootcamp will consist of five days of tutorial presentations that will introduce participants to the key themes of the program and to a broad range of methods in the foundations of data science.
- Aug. 27-31 – Foundations of Data Science Boot Camp
- Sep. 24-28 – Randomized Numerical Linear Algebra and Applications
- Oct. 29-Nov. 2 – Robust and High-Dimensional Statistics
- Nov. 27-30 – Sublinear Algorithms and Nearest-Neighbor Search
Data arising from experimental, observational, and simulational processes in the natural and social sciences, as well as in industrial applications and other domains, have created enormous opportunities for understanding the world in which we live.
The foundations of Data Science consist of core theoretical techniques applicable to data drawn from many different domains. While the foundations of Data Science lie at the intersection of computer science, statistics and applied mathematics, each of these disciplines developed in response to particular long-standing historical problems. Building a foundation for modern Data Science requires rethinking not only how these three research areas interact with data, implementations and applications, but also how each of the areas interacts with the others. Developing the theoretical foundations of Data Science requires paying appropriate attention to the questions and issues of domain scientists who generate and use the data, and to the computational environments and platforms supporting this work.
The semester-long program will emphasize topics such as dimensionality reduction, randomized numerical linear algebra, optimization, probability in high dimensions, sparse recovery, statistics (including inference and causality), and streaming and sublinear algorithms, as well as a variety of application areas that can benefit from these fields and from other techniques for processing massive data sets. Each of these related areas has received attention from a diverse set of research communities. An important goal will be to explore and strengthen connections between methods and problems in these areas, to discover new perspectives on old problems, and to foster interactions between research communities that address similar problems from quite different perspectives.
Members of the campus data science community are encouraged to participate, both to learn foundational techniques and to inform foundational researchers about challenges that arise in particular practical problems.