Large Scale Stochastic Training of Neural Networks

Randomized Numerical Linear Algebra and Applications


September 26, 2018
11:00am to 11:30am
UC Berkeley

This event was recorded on September 26, 2018, as part of the Randomized Numerical Linear Algebra and Applications workshop at the Simons Institute for the Theory of Computing.

The next milestone for machine learning is the ability to train on massively large datasets. The de facto method for training neural networks is stochastic gradient descent (SGD), an inherently sequential algorithm with poor convergence properties. One approach to the challenge of large-scale training is to use large mini-batch sizes, which allow parallel training. However, large-batch training often results in poor generalization performance. The exact reasons for this are still not completely understood, and the methods proposed so far to address it (such as scaling the learning rate or annealing the batch size) are specific to a particular problem and do not generalize.
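To make the learning-rate-scaling heuristic mentioned above concrete, here is a minimal pure-Python sketch on a toy one-dimensional least-squares problem. The objective, data, batch sizes, and learning rates are illustrative choices (not from the talk); the point is only to show the "linear scaling rule": when the batch size grows k-fold, the learning rate is scaled k-fold so that each pass over the data makes comparable progress.

```python
import random

# Toy objective: f(w) = (1/N) * sum_i 0.5 * (w - y_i)^2,
# so the per-sample gradient is simply w - y_i.
random.seed(0)
data = [random.gauss(1.0, 0.5) for _ in range(1024)]

def sgd(batch_size, lr, steps, w=0.0):
    """Plain mini-batch SGD on the toy objective above."""
    for _ in range(steps):
        batch = random.sample(data, batch_size)
        grad = sum(w - y for y in batch) / batch_size
        w -= lr * grad
    return w

# Linear scaling rule (a heuristic, not a guarantee): batch size
# grows 32x from 8 to 256, so the learning rate grows 32x as well.
w_small = sgd(batch_size=8,   lr=0.05, steps=2000)
w_large = sgd(batch_size=256, lr=1.6,  steps=2000)
```

Both runs settle near the data mean (about 1.0 here), but as the abstract notes, on real non-convex networks this kind of scaling is problem-specific and does not reliably recover small-batch generalization.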

In the first part of the talk, I will present results analyzing large-batch training through the lens of the Hessian operator. These results rule out some common theories about large-batch training, such as problems with saddle points. In the second part, I will present our results on a novel Hessian-based method, combined with robust optimization, that avoids many of the issues that first-order methods such as stochastic gradient descent face in large-scale training.
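As a small illustration of what "analyzing training through the lens of the Hessian" can mean in practice, the sketch below estimates the top Hessian eigenvalue using only gradient evaluations: a Hessian-vector product is formed by a finite difference of gradients, and power iteration recovers the dominant eigenvalue. The quadratic loss and the matrix `A` are made-up toy choices (so the Hessian is known exactly), not the method or models from the talk.

```python
import math

# Toy quadratic loss f(w) = 0.5 * w^T A w, whose Hessian is exactly A.
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 0.0],
     [0.0, 0.0, 1.0]]

def grad(w):
    # Gradient of the quadratic: A w.
    return [sum(A[i][j] * w[j] for j in range(3)) for i in range(3)]

def hvp(w, v, eps=1e-4):
    # Hessian-vector product via a finite difference of gradients:
    # H v ~= (g(w + eps*v) - g(w)) / eps.  Needs only gradient access,
    # never the full Hessian matrix.
    g0 = grad(w)
    g1 = grad([wi + eps * vi for wi, vi in zip(w, v)])
    return [(a - b) / eps for a, b in zip(g1, g0)]

def top_eigenvalue(w, iters=100):
    # Power iteration driven entirely by Hessian-vector products.
    v = [1.0, 0.0, 0.0]
    lam = 0.0
    for _ in range(iters):
        hv = hvp(w, v)
        lam = sum(x * y for x, y in zip(v, hv))   # Rayleigh quotient
        norm = math.sqrt(sum(x * x for x in hv))
        v = [x / norm for x in hv]
    return lam

lam_max = top_eigenvalue([0.5, -0.2, 0.1])
# Eigenvalues of A are (7 +/- sqrt(5))/2 and 1, so lam_max ~= 4.618
```

For a quadratic the finite difference is exact up to floating-point error; for a real network the same matrix-free pattern (gradients only, no explicit Hessian) is what makes Hessian spectral analysis feasible at scale.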


Related Articles

Hessian-based Analysis of Large Batch Training and Robustness to Adversaries
February 22, 2018  |
Zhewei Yao, Amir Gholami, Qi Lei, Kurt Keutzer, Michael W. Mahoney

Large batch size training of neural networks with adversarial training and second-order information
October 2, 2018  |
Zhewei Yao, Amir Gholami, Kurt Keutzer, Michael W. Mahoney


Amir Gholami

BIDS Alum – Data Science Fellow

Amir Gholami was a BIDS/FODA Data Science Fellow in 2018-2019, working as a postdoctoral research fellow in the Berkeley AI Research (BAIR) Lab under the supervision of Prof. Kurt Keutzer. He received his PhD in Computational Science, Engineering, and Mathematics from UT Austin, working with Prof. George Biros on bio-physics-based image analysis, research that received UT Austin's best doctoral dissertation award in 2018. Amir has extensive experience in second-order optimization methods, image registration, inverse problems, and large-scale parallel computing, developing codes that have been scaled up to 200K cores. He is a Melosh Medal finalist, a recipient of the best student paper award at SC'17 and a Gold Medal in the ACM Student Research Competition, as well as a best student paper finalist at SC'14. His current research includes large-scale training of neural networks, stochastic second-order optimization methods, and robust optimization.