In the last few decades, much effort has been devoted to the development of first-order methods. These methods have a low per-iteration cost, achieve optimal complexity, are easy to implement, and have proven effective for many machine learning applications. First-order methods, however, have significant limitations: (1) they require careful hyper-parameter tuning, (2) they do not incorporate curvature information and are therefore sensitive to ill-conditioning, and (3) they are often unable to fully exploit the power of distributed computing architectures.
Higher-order methods, such as Newton, quasi-Newton and adaptive gradient descent methods, are extensively used in many scientific and engineering domains. These methods possess several appealing features: they exploit local curvature information to mitigate the effects of ill-conditioning, they avoid or diminish the need for hyper-parameter tuning, and they have enough concurrency to take advantage of distributed computing environments.
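As a toy illustration of this contrast (not drawn from any of the talks below), the sketch compares a plain gradient step with a Newton step on an ill-conditioned quadratic; the matrix, step size, and iteration count are arbitrary choices made purely for demonstration.

```python
import numpy as np

# Minimal sketch: minimize the ill-conditioned quadratic f(x) = 0.5 * x^T A x,
# contrasting a first-order (gradient) step with a curvature-aware (Newton) step.
A = np.diag([1.0, 1000.0])          # condition number 1000 (illustrative choice)
x0 = np.array([1.0, 1.0])

grad = lambda x: A @ x              # gradient of the quadratic
hess = lambda x: A                  # constant Hessian

# Gradient descent: the step size is limited by the largest curvature (1/L = 1/1000).
x_gd = x0.copy()
for _ in range(100):
    x_gd = x_gd - (1.0 / 1000.0) * grad(x_gd)

# Newton's method: rescales the gradient by the inverse Hessian; one step suffices here.
x_newton = x0 - np.linalg.solve(hess(x0), grad(x0))

print("gradient descent after 100 steps:", x_gd)      # still far from 0 along the flat direction
print("Newton after 1 step:            ", x_newton)   # exactly the minimizer [0, 0]
```

In this toy problem the Newton step reaches the minimizer immediately, whereas the gradient step, forced to use a step size dictated by the largest curvature, crawls along the flat direction of the quadratic.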
This workshop aims to bring machine learning and optimization researchers closer together, in order to facilitate a discussion of the open questions surrounding these methods.
Previous versions of this workshop appeared at NeurIPS'19 and ICML'20.
Time | Speaker | Title |
---|---|---|
2:15-2:35 | Katya Scheinberg | High probability bound on complexity of adaptive optimization schemes with random and noisy information
2:35-2:55 | Mert Pilanci | Exact Polynomial-Time Convex Formulations for Training Multilayer Neural Networks: The Hidden Convex Optimization Landscape |
2:55-3:15 | Albert S. Berahas | A Symmetric Blockwise Truncated Optimization Algorithm for Machine Learning |
3:15-3:35 | Mert Gurbuzbalaban | DAve-QN: A Distributed Averaged Quasi-Newton Method with Local Superlinear Convergence Rate |
3:35-3:55 | Zhewei Yao | AdaHessian: An Adaptive Second Order Optimizer for Machine Learning
Time | Speaker | Title |
---|---|---|
4:15-4:35 | Quanquan Gu | Stochastic Variance-Reduced Cubic Regularized Newton Methods for Nonconvex Optimization |
4:35-4:55 | Dominique Orban | Proximal-Gradient-Based Trust-Region Methods for Composite Optimization |
4:55-5:15 | Anastasios Kyrillidis | Distributed Learning of Deep Neural Networks using Independent Subnet Training |
5:15-5:35 | Aryan Mokhtari | Exploiting Fast Local Convergence of Second-Order Methods Globally: Adaptive Sample Size Methods |
5:35-5:55 | Michal Derezinski | Overcoming Inversion Bias in Distributed Newton's Method |