Convex Optimization and Approximation:
Optimization for Modern Data Analysis
EECS 227C/STAT 260 Lec 2

Instructor: Ben Recht
Time:  TuTh 3:30-5:00 PM
Location: 3 LeConte Hall (moved from 521 Cory Hall)

Office Hours: T 2:30-3:30, M 3-4.
Location: 726 Sutardja Dai Hall


Description: This course will explore theory and algorithms for nonlinear optimization. We will focus on problems that arise in machine learning and computational statistics, paying close attention to concerns about complexity, scaling, and implementation in these domains. Whenever possible, methods will be linked to particular application examples in data analysis. Topics will include gradient and accelerated gradient methods, Newton and quasi-Newton methods, stochastic gradient methods, subgradient and proximal methods, duality and decomposition, mirror descent, dual averaging, and the alternating direction method of multipliers.

Required background: The prerequisites are previous coursework in linear algebra, multivariate calculus, and probability and statistics. Some degree of mathematical maturity is also required. Coursework or background in optimization theory, as covered in EE227BT, is highly recommended. Numerical programming will be required for this course, so familiarity with MATLAB, R, numerical Python, or an equivalent will be necessary.

Grading: There will be about four homeworks, which require some basic programming. There will be a take-home midterm and no final. A course project will also be required.


Texts:

Nesterov, Introductory Lectures on Convex Optimization (abbreviated "Nest." in the schedule below).
Nocedal and Wright, Numerical Optimization (abbreviated "NW" in the schedule below).

Recommended references:

Lecture 1 (1/21): Introduction and math review. notes, additional reading: NW Chap. 2.

Lecture 2 (1/23): The Gradient Method. notes, additional reading: NW Chap. 3, Nest. Chap. 1.2.3.
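
To make the iteration concrete, here is a minimal NumPy sketch of the gradient method with a fixed step size. The least-squares objective, problem dimensions, and iteration count are illustrative assumptions, not taken from the lecture notes.

    import numpy as np

    # Illustrative objective (an assumption for this sketch): f(x) = 0.5 * ||A x - b||^2.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((50, 10))
    b = rng.standard_normal(50)

    def grad(x):
        return A.T @ (A @ x - b)

    # Fixed step size 1/L, where L = ||A||_2^2 is the Lipschitz constant of the gradient.
    L = np.linalg.norm(A, 2) ** 2
    x = np.zeros(10)
    for _ in range(500):
        x = x - (1.0 / L) * grad(x)
    print("final objective:", 0.5 * np.linalg.norm(A @ x - b) ** 2)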

Lecture 3 (1/28): More on the Gradient Method. notes

Lecture 4 (1/30): Quick review of convexity. notes

Lecture 5 (2/4): The gradient method and convex functions. notes, additional reading: Nest. Chap. 2.1.1, 2.1.5.

Lecture 6 (2/6): Lower bounds for first-order methods. notes, additional reading: Nest. Chap. 2.1.2, 2.1.4.

Lecture 7 (2/13): Momentum and the Heavy Ball Method. notes
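
For reference, a small sketch of the heavy-ball iteration x_{k+1} = x_k - alpha*grad f(x_k) + beta*(x_k - x_{k-1}) on a least-squares problem. The step-size and momentum values below are the classical choices for a quadratic with curvature between mu and L; the problem instance is an assumption.

    import numpy as np

    # Illustrative strongly convex quadratic: f(x) = 0.5 * ||A x - b||^2 (an assumption).
    rng = np.random.default_rng(0)
    A = rng.standard_normal((50, 10))
    b = rng.standard_normal(50)
    grad = lambda x: A.T @ (A @ x - b)

    # Curvature bounds mu <= eigenvalues of A^T A <= L, estimated from the singular values of A.
    svals = np.linalg.svd(A, compute_uv=False)
    L, mu = svals[0] ** 2, svals[-1] ** 2
    alpha = 4.0 / (np.sqrt(L) + np.sqrt(mu)) ** 2                          # heavy-ball step size
    beta = ((np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))) ** 2  # momentum parameter

    x_prev = x = np.zeros(10)
    for _ in range(300):
        x, x_prev = x - alpha * grad(x) + beta * (x - x_prev), x
    print("final objective:", 0.5 * np.linalg.norm(A @ x - b) ** 2)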

Lecture 8 (2/18): Nesterov's accelerated method. notes, additional reading: Nest. 2.2.
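
A minimal sketch of an accelerated gradient iteration (gradient step at an extrapolated point, with the standard t_k momentum schedule); the objective and parameters are again illustrative assumptions.

    import numpy as np

    # Illustrative smooth convex objective: f(x) = 0.5 * ||A x - b||^2 (an assumption).
    rng = np.random.default_rng(0)
    A = rng.standard_normal((50, 10))
    b = rng.standard_normal(50)
    grad = lambda x: A.T @ (A @ x - b)

    L = np.linalg.norm(A, 2) ** 2      # Lipschitz constant of the gradient
    x = x_prev = np.zeros(10)
    t = 1.0
    for _ in range(300):
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t ** 2))
        y = x + ((t - 1.0) / t_next) * (x - x_prev)   # extrapolation (momentum) step
        x_prev, x = x, y - (1.0 / L) * grad(y)        # gradient step at the extrapolated point
        t = t_next
    print("final objective:", 0.5 * np.linalg.norm(A @ x - b) ** 2)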

Lecture 9 (2/20): Newton and quasi-Newton methods. notes, additional reading: Nest. 1.2.4, NW Chap. 3.3, Chap. 6.
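
As a small illustration of a Newton step, here is a sketch on an l2-regularized logistic regression problem; the data, regularization weight, and the use of undamped full Newton steps are assumptions made for this example.

    import numpy as np

    # Objective (an assumption for this sketch):
    #   f(w) = sum_i log(1 + exp(-y_i * a_i^T w)) + 0.5 * mu * ||w||^2
    rng = np.random.default_rng(0)
    n, d = 200, 5
    A = rng.standard_normal((n, d))
    y = np.sign(A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n))
    mu = 1e-2

    w = np.zeros(d)
    for _ in range(15):
        p = 1.0 / (1.0 + np.exp(y * (A @ w)))                        # p_i = sigmoid(-y_i a_i^T w)
        g = -A.T @ (y * p) + mu * w                                  # gradient
        H = A.T @ (A * (p * (1.0 - p))[:, None]) + mu * np.eye(d)    # Hessian
        w = w - np.linalg.solve(H, g)                                # full Newton step
    print("final objective:", np.sum(np.log1p(np.exp(-y * (A @ w)))) + 0.5 * mu * (w @ w))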

Lecture 10 (2/27): Quasi-Newton loose ends: weak Wolfe line search, L-BFGS, and Barzilai-Borwein; additional reading: NW Chap. 7.2.
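
A sketch of the Barzilai-Borwein step-size rule alpha_k = <s_k, s_k> / <s_k, y_k>, with s_k = x_k - x_{k-1} and y_k = grad f(x_k) - grad f(x_{k-1}); the least-squares instance and the initial step are assumptions.

    import numpy as np

    # Illustrative convex quadratic: f(x) = 0.5 * ||A x - b||^2 (an assumption).
    rng = np.random.default_rng(0)
    A = rng.standard_normal((50, 10))
    b = rng.standard_normal(50)
    grad = lambda x: A.T @ (A @ x - b)

    x_prev = np.zeros(10)
    g_prev = grad(x_prev)
    x = x_prev - 1e-3 * g_prev              # one small initial gradient step (arbitrary choice)
    for _ in range(100):
        g = grad(x)
        s, yk = x - x_prev, g - g_prev
        alpha = (s @ s) / (s @ yk)          # BB1 step size; s @ yk > 0 for a convex quadratic
        x_prev, g_prev = x, g
        x = x - alpha * g
    print("final objective:", 0.5 * np.linalg.norm(A @ x - b) ** 2)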

Lecture 11 (3/4): Hardness of non-convex optimization.

Lecture 12 (3/6): The stochastic gradient method.
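
A minimal sketch of the stochastic gradient method on a least-squares problem written as a finite sum; the sampling scheme (one uniformly random term per iteration) and the decreasing step-size schedule are illustrative assumptions.

    import numpy as np

    # Finite-sum objective (an assumption): f(x) = (1/n) * sum_i 0.5 * (a_i^T x - b_i)^2.
    rng = np.random.default_rng(0)
    n, d = 1000, 10
    A = rng.standard_normal((n, d))
    b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

    x = np.zeros(d)
    for k in range(1, 20001):
        i = rng.integers(n)                      # pick one term of the sum uniformly at random
        g = (A[i] @ x - b[i]) * A[i]             # stochastic gradient of the i-th term
        x = x - (1.0 / (100.0 + 0.1 * k)) * g    # decreasing step size (an arbitrary schedule)
    print("mean squared error:", np.mean((A @ x - b) ** 2))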

Lecture 13 (3/11): Analysis of the stochastic gradient method.

Lecture 14 (3/13): Subgradients.

Lecture 15 (3/18): The subgradient method.
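
A sketch of the subgradient method on the nonsmooth problem min_x ||Ax - b||_1; the problem instance and the diminishing step-size rule are assumptions, and the code tracks the best iterate since the objective need not decrease monotonically.

    import numpy as np

    # Nonsmooth objective (an assumption): f(x) = ||A x - b||_1.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((50, 10))
    b = rng.standard_normal(50)

    x = np.zeros(10)
    best = np.inf
    for k in range(1, 5001):
        g = A.T @ np.sign(A @ x - b)         # a subgradient of ||Ax - b||_1 at x
        x = x - (0.1 / np.sqrt(k)) * g       # diminishing step size
        best = min(best, np.linalg.norm(A @ x - b, 1))
    print("best objective found:", best)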

Lecture 16 (3/20): The projected gradient and proximal point methods.
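
A minimal proximal gradient (ISTA-style) sketch for the lasso, min_x 0.5*||Ax - b||^2 + lam*||x||_1, where the proximal operator of the l1 term is soft-thresholding; the data and lam are illustrative assumptions.

    import numpy as np

    # Lasso instance (an assumption): minimize 0.5 * ||A x - b||^2 + lam * ||x||_1.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((50, 20))
    b = rng.standard_normal(50)
    lam = 0.5

    def soft_threshold(v, t):
        # Proximal operator of t * ||.||_1
        return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

    L = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of the smooth part's gradient
    x = np.zeros(20)
    for _ in range(500):
        x = soft_threshold(x - (1.0 / L) * (A.T @ (A @ x - b)), lam / L)
    print("nonzero entries:", np.count_nonzero(x))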

Lecture 17 (4/1): Duality.

Lecture 18 (4/8): Dual decomposition and the augmented Lagrangian.

Lecture 19 (4/10): Mirror Descent.
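
A sketch of mirror descent with the entropy mirror map on the probability simplex (the exponentiated-gradient update); the linear objective, step size, and iteration count are assumptions.

    import numpy as np

    # Minimize a linear function c^T x over the probability simplex (an illustrative problem).
    rng = np.random.default_rng(0)
    c = rng.standard_normal(10)
    x = np.ones(10) / 10.0                   # start at the uniform distribution
    eta = 0.1
    for _ in range(1000):
        x = x * np.exp(-eta * c)             # multiplicative update from the entropy Bregman divergence
        x /= x.sum()                         # renormalize back onto the simplex
    print("objective:", c @ x, " optimum:", c.min())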

Lecture 20 (4/15): Dual averaging.

Lecture 21 (4/17): The alternating direction method of multipliers.
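
A compact ADMM sketch for the lasso written with a consensus constraint x = z and a scaled dual variable; the problem data, lam, rho, and iteration count are illustrative assumptions.

    import numpy as np

    # Lasso in ADMM form (an assumption for this sketch):
    #   minimize 0.5 * ||A x - b||^2 + lam * ||z||_1   subject to   x = z.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((50, 20))
    b = rng.standard_normal(50)
    lam, rho = 0.5, 1.0

    def soft_threshold(v, t):
        return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

    M = A.T @ A + rho * np.eye(20)           # the x-update's linear system has a fixed matrix
    Atb = A.T @ b
    x = z = u = np.zeros(20)
    for _ in range(200):
        x = np.linalg.solve(M, Atb + rho * (z - u))   # x-update: ridge-like least squares
        z = soft_threshold(x + u, lam / rho)          # z-update: prox of the l1 term
        u = u + x - z                                 # scaled dual ascent step
    print("nonzero entries in z:", np.count_nonzero(z))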


Homeworks

Problem Set 1. Due in class on February 13.

Problem Set 2. Due in class on March 4.

Problem Set 3. Adult data set: [mat] [csv]. Due in class on March 20.


Miscellaneous Readings:

  • James Burke's notes on bisection line search and the Weak Wolfe Conditions.
  • Paul Tseng's analysis of Nesterov's optimal methods.
  • The original paper on the Barzilai-Borwein method.
  • Proof of the hardness of checking local minimality.
  • Notes on the stochastic gradient method.
  • Stephen Boyd's slides and notes on subgradients and their properties.
  • Stephen Boyd's slides and notes on subgradient methods.
  • Notes on the projected gradient and proximal point methods.
  • Monograph on the Alternating Direction Method of Multipliers by Boyd et al. This earlier paper by Eckstein and Bertsekas draws interesting connections between ADMM and the proximal point method.
  • SDP reading: Vandenberghe and Boyd's classic survey of semidefinite programming. Burer and Monteiro's non-convex algorithm for SDP.