Beyond First Order Methods in Machine Learning Minisymposium
SIAM CSE 2021

About

In the last few decades, much effort has been devoted to the development of first-order methods. These methods enjoy a low per-iteration cost, have optimal complexity guarantees, are easy to implement, and have proven effective for many machine learning applications. First-order methods, however, have significant limitations: (1) they typically require careful hyper-parameter tuning, (2) they do not incorporate curvature information and are therefore sensitive to ill-conditioning, and (3) they are often unable to fully exploit the power of distributed computing architectures.

Higher-order methods, such as Newton, quasi-Newton, and adaptive gradient descent methods, are extensively used in many scientific and engineering domains. These methods possess several attractive features: they exploit local curvature information to mitigate the effects of ill-conditioning, they avoid or reduce the need for hyper-parameter tuning, and they offer enough concurrency to take advantage of distributed computing environments. This workshop aims to bring machine learning and optimization researchers closer together in order to facilitate a discussion regarding underlying questions such as the following:

  • Why are higher-order methods important in machine learning?
  • Why are they not omnipresent?
  • What advantages can they offer? What are their limitations and disadvantages?
  • How should (or could) they be implemented in practice?


Previous versions of this workshop were held at NeurIPS 2019 and ICML 2020.
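
To make the curvature point above concrete, the following is a minimal illustrative sketch (not drawn from any of the talks; the quadratic, step size, and iteration count are chosen purely for illustration) contrasting a plain gradient step with a Newton step on an ill-conditioned problem:

    # Minimal sketch: gradient descent vs. Newton's method on an
    # ill-conditioned quadratic f(x) = 0.5 * x^T A x, minimized at x = 0.
    import numpy as np

    A = np.diag([1.0, 1e3])            # Hessian with condition number 1e3
    f = lambda x: 0.5 * x @ A @ x      # objective
    grad = lambda x: A @ x             # gradient

    x_gd = np.array([1.0, 1.0])        # gradient-descent iterate
    x_nt = np.array([1.0, 1.0])        # Newton iterate
    step = 1.0 / 1e3                   # roughly the largest stable step size, 1/L

    for _ in range(100):
        x_gd = x_gd - step * grad(x_gd)               # first-order step: ignores curvature
        x_nt = x_nt - np.linalg.solve(A, grad(x_nt))  # Newton step: rescales by the Hessian

    print(f"after 100 iterations: f(x_gd) = {f(x_gd):.3e}, f(x_nt) = {f(x_nt):.3e}")

On this toy problem the Newton iterate reaches the minimizer in a single step, while the gradient iterate is still far from optimal along the low-curvature direction after 100 steps, illustrating the ill-conditioning effect described above.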

Speakers

Katya Scheinberg

Cornell University

Albert Berahas

University of Michigan

Michal Derezinski

UC Berkeley

Quanquan Gu

UCLA

Mert Gurbuzbalaban

Rutgers University

Anastasios Kyrillidis

Rice University

Aryan Mokhtari

UT Austin

Mert Pilanci

Stanford University

Dominique Orban

Polytechnique Montréal

Zhewei Yao

UC Berkeley

Schedule



Session I: Wednesday, March 3, 2021, 2:15-3:55 PM CST

Time | Speaker | Title
2:15-2:35 | Katya Scheinberg | High Probability Bound on Complexity of Adaptive Optimization Schemes with Random and Noisy Information
2:35-2:55 | Mert Pilanci | Exact Polynomial-Time Convex Formulations for Training Multilayer Neural Networks: The Hidden Convex Optimization Landscape
2:55-3:15 | Albert S. Berahas | A Symmetric Blockwise Truncated Optimization Algorithm for Machine Learning
3:15-3:35 | Mert Gurbuzbalaban | DAve-QN: A Distributed Averaged Quasi-Newton Method with Local Superlinear Convergence Rate
3:35-3:55 | Zhewei Yao | AdaHessian: An Adaptive Second Order Optimizer for Machine Learning


Session II: Wednesday, March 3, 2021, 4:15-5:55 PM CST

Time | Speaker | Title
4:15-4:35 | Quanquan Gu | Stochastic Variance-Reduced Cubic Regularized Newton Methods for Nonconvex Optimization
4:35-4:55 | Dominique Orban | Proximal-Gradient-Based Trust-Region Methods for Composite Optimization
4:55-5:15 | Anastasios Kyrillidis | Distributed Learning of Deep Neural Networks Using Independent Subnet Training
5:15-5:35 | Aryan Mokhtari | Exploiting Fast Local Convergence of Second-Order Methods Globally: Adaptive Sample Size Methods
5:35-5:55 | Michal Derezinski | Overcoming Inversion Bias in Distributed Newton's Method

Workshop Organizers

Amir Gholami

UC Berkeley

Fred Roosta

University of Queensland

Michael Mahoney

UC Berkeley