In the last few decades, much effort has been devoted to the development of first-order methods. These methods have a low per-iteration cost, achieve optimal complexity, are easy to implement, and have proven effective for many machine learning applications. First-order methods, however, have significant limitations: (1) they require careful hyper-parameter tuning, (2) they do not incorporate curvature information and are therefore sensitive to ill-conditioning, and (3) they are often unable to fully exploit the power of distributed computing architectures.
Higher-order methods, such as Newton, quasi-Newton and adaptive gradient descent methods, are extensively used in many scientific and engineering domains. These methods possess several appealing features: they exploit local curvature information to mitigate the effects of ill-conditioning, they avoid or diminish the need for hyper-parameter tuning, and they have enough concurrency to take advantage of distributed computing environments.
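As a toy illustration of this contrast (not drawn from any of the talks below), the sketch compares a plain gradient step with a Newton step on an ill-conditioned quadratic; the matrix, step size, and iteration count are arbitrary choices made purely for demonstration.

```python
import numpy as np

# Minimal sketch: minimize the ill-conditioned quadratic f(x) = 0.5 * x^T A x,
# contrasting a first-order (gradient) step with a curvature-aware (Newton) step.
A = np.diag([1.0, 1000.0])          # condition number 1000 (illustrative choice)
x0 = np.array([1.0, 1.0])

grad = lambda x: A @ x              # gradient of the quadratic
hess = lambda x: A                  # constant Hessian

# Gradient descent: the step size is limited by the largest curvature (1/L = 1/1000).
x_gd = x0.copy()
for _ in range(100):
    x_gd = x_gd - (1.0 / 1000.0) * grad(x_gd)

# Newton's method: rescales the gradient by the inverse Hessian; one step suffices here.
x_newton = x0 - np.linalg.solve(hess(x0), grad(x0))

print("gradient descent after 100 steps:", x_gd)      # still far from 0 along the flat direction
print("Newton after 1 step:            ", x_newton)   # exactly the minimizer [0, 0]
```

In this toy problem the Newton step reaches the minimizer immediately, whereas the gradient step, forced to use a step size dictated by the largest curvature, crawls along the flat direction of the quadratic.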
This workshop aims to bring machine learning and optimization researchers closer together, in order to facilitate a discussion of the open questions surrounding these methods.
Previous versions of this workshop appeared at NeurIPS'19 and ICML'20.
Time | Speaker | Title |
---|---|---|
2:15-2:35 | Katya Scheinberg | High probability bound on complexity of adaptive optimization schemes with random and noisy information
2:35-2:55 | Mert Pilanci | Exact Polynomial-Time Convex Formulations for Training Multilayer Neural Networks: The Hidden Convex Optimization Landscape |
2:55-3:15 | Albert S. Berahas | A Symmetric Blockwise Truncated Optimization Algorithm for Machine Learning |
3:15-3:35 | Mert Gurbuzbalaban | DAve-QN: A Distributed Averaged Quasi-Newton Method with Local Superlinear Convergence Rate |
3:35-3:55 | Zhewei Yao | AdaHessian: An Adaptive Second Order Optimizer for Machine Learning
Time | Speaker | Title |
---|---|---|
4:15-4:35 | Quanquan Gu | Stochastic Variance-Reduced Cubic Regularized Newton Methods for Nonconvex Optimization |
4:35-4:55 | Dominique Orban | Proximal-Gradient-Based Trust-Region Methods for Composite Optimization |
4:55-5:15 | Anastasios Kyrillidis | Distributed Learning of Deep Neural Networks using Independent Subnet Training |
5:15-5:35 | Aryan Mokhtari | Exploiting Fast Local Convergence of Second-Order Methods Globally: Adaptive Sample Size Methods |
5:35-5:55 | Michal Derezinski | Overcoming Inversion Bias in Distributed Newton's Method |