This event was recorded on September 26, 2018, as part of the Randomized Numerical Linear Algebra and Applications workshop at the Simons Institute for the Theory of Computing.
The next milestone for machine learning is the ability to train on massively large datasets. The de facto method for training neural networks is stochastic gradient descent, a sequential algorithm with poor convergence properties. One approach to addressing the challenge of large-scale training is to use large mini-batch sizes, which allow parallel training. However, large batch size training often results in poor generalization performance. The exact reasons for this are still not completely understood, and all the methods proposed so far for resolving it (such as scaling the learning rate or annealing the batch size) are specific to a particular problem and do not generalize.
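As a point of reference (not from the talk itself), the parallelism argument for large batches can be seen in a minimal minibatch SGD sketch: each step averages per-sample gradients over a batch, and that average is what can be distributed across workers. The least-squares problem, data, and all names here are illustrative.

```python
import numpy as np

# Illustrative sketch: minibatch SGD on a toy least-squares problem.
# The per-batch gradient is an average over `batch_size` samples, and that
# averaging is the work that large-batch training parallelizes across workers.
rng = np.random.default_rng(0)
X = rng.standard_normal((512, 4))
true_w = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ true_w  # noiseless targets, so SGD can recover true_w exactly

def sgd(batch_size, lr=0.1, epochs=200):
    w = np.zeros(4)
    n = len(X)
    for _ in range(epochs):
        idx = rng.permutation(n)
        for start in range(0, n, batch_size):
            b = idx[start:start + batch_size]
            # average gradient over the minibatch (parallelizable part)
            grad = X[b].T @ (X[b] @ w - y[b]) / len(b)
            w -= lr * grad
    return w

w = sgd(batch_size=64)
print(np.allclose(w, true_w, atol=1e-3))  # True
```

A larger `batch_size` reduces gradient noise and increases per-step parallel work, but, as the talk notes, pushing it too far tends to hurt generalization in practice.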
In the first part of the talk, I will show results analyzing large batch size training through the lens of the Hessian operator. The results rule out some of the common theories regarding large batch size training, such as problems with saddle points. In the second part, I will present our results on a novel Hessian-based method, in combination with robust optimization, that avoids many of the issues that first-order methods such as stochastic gradient descent face in large-scale training.
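To give a flavor of what "analyzing training through the lens of the Hessian operator" involves in practice, Hessian spectral information is typically extracted without forming the Hessian, using Hessian-vector products. The sketch below (my illustration, not code from the papers) runs power iteration on finite-difference Hessian-vector products; the loss is a toy quadratic whose Hessian is known, so the estimate can be checked.

```python
import numpy as np

# Illustrative sketch: top Hessian eigenvalue via power iteration on
# Hessian-vector products (HVPs), never forming the Hessian explicitly.
# Toy quadratic loss L(w) = 0.5 * w @ A @ w, whose Hessian is exactly A.

def grad(w, A):
    return A @ w

def hvp(w, v, A, eps=1e-5):
    # Finite-difference HVP: H v ~ (g(w + eps*v) - g(w - eps*v)) / (2*eps).
    # In a real network this would be two gradient evaluations (or autodiff).
    return (grad(w + eps * v, A) - grad(w - eps * v, A)) / (2 * eps)

def top_eigenvalue(w, A, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(w.shape)
    v /= np.linalg.norm(v)
    for _ in range(iters):
        Hv = hvp(w, v, A)
        v = Hv / np.linalg.norm(Hv)
    # Rayleigh quotient at the converged direction
    return v @ hvp(w, v, A)

A = np.diag([1.0, 2.0, 5.0])
w = np.zeros(3)
lam = top_eigenvalue(w, A)
print(round(lam, 3))  # close to 5.0, the largest eigenvalue of A
```

The same HVP machinery scales to neural network losses, where only matrix-free access to the Hessian is feasible.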
Hessian-based Analysis of Large Batch Training and Robustness to Adversaries
February 22, 2018 | arXiv.org
Zhewei Yao, Amir Gholami, Qi Lei, Kurt Keutzer, Michael W. Mahoney
Large batch size training of neural networks with adversarial training and second-order information
October 2, 2018 | arXiv.org
Zhewei Yao, Amir Gholami, Kurt Keutzer, Michael Mahoney
Amir Gholami was a BIDS/FODA Data Science Fellow in 2018-2019, working as a postdoctoral research fellow in the Berkeley AI Research Lab under the supervision of Prof. Kurt Keutzer. He received his PhD in Computational Science and Engineering Mathematics from UT Austin, working with Prof. George Biros on biophysics-based image analysis, research that received UT Austin's best doctoral dissertation award in 2018. Amir has extensive experience in second-order optimization methods, image registration, inverse problems, and large-scale parallel computing, developing codes that have been scaled up to 200K cores. He is a Melosh Medal finalist, recipient of the best student paper award at SC'17 and the Gold Medal in the ACM Student Research Competition, as well as a best student paper finalist at SC'14. His current research includes large-scale training of neural networks, stochastic second-order optimization methods, and robust optimization.