Machine Learning and Science Forum: Over-fitting in Modern Supervised Learning: Memorization, Interpolation, and Decomposition of Errors

ML&Sci Forum

May 11, 2020
1:30pm to 2:30pm
Virtual Presentation

Participate remotely via Zoom.

Abstract: In modern supervised learning, best practice suggests the use of highly over-parameterized models, which ensures that a model is expressive enough to capture the training data as fully as possible. In contrast, classical supervised learning calls for under-parameterized models, because excessive model complexity can cause the model to over-fit noise in the training data and produce large generalization errors. Here we attempt to reconcile these two phenomena. We identify the causes of over-fitting in both under- and over-parameterized models, and we provide simple, intuitive explanations for new phenomena such as double-descent curves. First, we calculate the bias-variance decomposition of the generalization error for three different models: ridge regression, a linear committee model, and a non-linear committee model. We do this both numerically and analytically, using the cavity method. We identify phase transitions associated with over-fitting and show that the variance depends non-monotonically on model complexity, which is surprising. Next, we argue that over-fitting in these models can be attributed to memorization and interpolation of the training data, and we provide a concise, unified picture for understanding these phenomena. Finally, we show that in the linear case, this intuition gives rise to simple geometric interpretations of over-fitting.
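The central quantity in the abstract is the bias-variance decomposition of the generalization error. One way to estimate it numerically is to refit the same model on many independently drawn training sets. The sketch below is a minimal illustration under assumed conditions. It is not the speaker's setup and not the cavity-method calculation. It makes these assumptions:

- a fixed linear teacher over `n_total` Gaussian features;
- a ridge student (minimum-norm least squares when `lam=0`) that sees only the first `n_used` features;
- Monte Carlo averages taken over training sets.

The names `ridge_fit`, `bias_variance`, and their parameters are hypothetical.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Ridge weights; lam == 0 gives the minimum-norm least-squares (ridgeless) fit."""
    if lam == 0.0:
        return np.linalg.pinv(X) @ y
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)


def bias_variance(n_train, n_used, n_total=200, lam=0.0, noise=0.5,
                  n_trials=200, n_test=500, seed=0):
    """Monte Carlo estimate of (bias^2, variance, mean test error).

    Hypothetical toy setup: a fixed linear teacher over n_total Gaussian
    features; the student sees only the first n_used features (its "model
    complexity"). Averages are over independent draws of the training set
    (inputs and label noise), evaluated on a fixed test set against the
    noiseless teacher output.
    """
    rng = np.random.default_rng(seed)
    beta = rng.normal(size=n_total) / np.sqrt(n_total)  # teacher weights, squared norm about 1
    X_test = rng.normal(size=(n_test, n_total))
    f_test = X_test @ beta                                # noiseless test targets

    preds = np.empty((n_trials, n_test))
    for t in range(n_trials):
        # Draw a fresh training set; label noise is added to training labels only.
        X = rng.normal(size=(n_train, n_total))
        y = X @ beta + noise * rng.normal(size=n_train)
        w = ridge_fit(X[:, :n_used], y, lam)
        preds[t] = X_test[:, :n_used] @ w

    mean_pred = preds.mean(axis=0)                        # average model over training sets
    bias2 = np.mean((mean_pred - f_test) ** 2)            # squared bias of the average model
    variance = np.mean(preds.var(axis=0))                 # spread across training sets
    mse = np.mean((preds - f_test) ** 2)                  # equals bias2 + variance
    return bias2, variance, mse


if __name__ == "__main__":
    n_train = 50
    for n_used in [5, 10, 25, 40, 50, 60, 100, 200]:
        b, v, m = bias_variance(n_train, n_used)
        print(f"n_used={n_used:4d}  bias^2={b:9.3f}  variance={v:9.3f}  error={m:9.3f}")
```

This toy setup uses a fixed `n_train = 50` with `lam=0` and sweeps `n_used`. The variance should peak near the interpolation threshold `n_used == n_train` and fall again in the over-parameterized regime. A positive `lam` (for example `lam=1.0`) suppresses that peak. The toy model illustrates the kind of non-monotonic variance the abstract refers to, but it does not reproduce the speaker's results.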


The Machine Learning and Science Forum (formerly the Berkeley Statistics and Machine Learning Forum) meets biweekly to discuss current applications across a wide variety of research domains in the physical sciences and beyond. Hosted by UC Berkeley Physics Professor and BIDS Senior Fellow Uros Seljak, these active sessions bring together domain scientists, statisticians, and computer scientists who are either developing state-of-the-art methods or are interested in applying these methods in their research. To receive email notifications about upcoming meetings, or to request more information, please contact berkeleymlforum@gmail.com. All interested members of the UC Berkeley and Berkeley Lab communities are welcome and encouraged to attend.


Jason Rocks

Boston University Physics