Accelerated first-order optimization, in continuous and discrete time
Many optimization algorithms can be viewed as a discretization of continuous-time dynamics. This connection with ODEs often simplifies the design and analysis of optimization algorithms. I work in particular on accelerated methods, and how they can be motivated and studied in continuous-time.
By combining the mirror descent ODE and Nesterov's ODE, it is possible to obtain a general ODE for accelerated constrained optimization. It is motivated using a simple Lyapunov argument.
where \(\nabla \psi^*\) is a Bregman projection that maps \(E^*\) to the feasible set \(\mathcal X\), and \(w(\tau)\) is an increasing weight function. The ODE couples two vriables: a dual variable \(Z\) which accumulates gradients, and a primal variable \(X\) obtained by averaging a Bregman projection of the dual trajectory.
A comparison of accelerated and non-accelerated mirror descent:
Accelerated methods converge faster but often exhibit oscillation of the trajectory around the minimizer. This is illustrated in the video above. Some heuristics have been proposed to alleviate the oscillation. They work by restarting the algorithm (i.e. resetting time to 0 and reinitializing the ODE at the current point) when a certain condition is met. Two restartd conditions are illustrated below:
Gradient condition (restart when the trajectory forms an acute angle with the gradient)
Speed condition (restart when the speed decreases)
A related topic is the development of efficient Bregman projection algorithms (since Bregman projections are an essential component of these algorithms).
In particular, I worked on projections on the simplex (a common problem in optimization and online learning).
In this paper, we develop an efficient algorithm for prjections with Csiszàr divergences (exponential convergence using bisection).
In the sepcial case of exponential divergences (a generalization of the KL divergence), we give two algorithms to compute the exact solution:
A deterministic algorithm with \(O(n \log n)\) complexity, and a randomized algorithm with \(O(n)\) expected complexity.
The corresponding code can be found here: http://github.com/walidk/BregmanProjection
Distributed Learning and Control in Cyber-Physical Systems
Many Cyber-Physical Systems (CPS) can be modeled as sequential Decision Problems, where the decisions of different players are coupled.
Each sequential problem can be studied separately, using online learning theory.
I am interested in understanding the coupling between these problems; in particular:
When can we guarantee that the coupled system converges to an equilibrium?
What are the convergence rates?
Is the convergence robust to stochastic perturbations (e.g. measurement noise, bandit feedback instead of full information feedback, etc.)
I work on the design and analysis of distributed learning algorithms to answer these questions.
I combine techniques from game theory, convex optimization, stochastic approximation, and control theory.
This work is part of the FORCES group (Foundations Of Resilient CybEr-physical Systems).
I have also used distributed learning as a behavioral model for human decision makers. Many cyber-physical systems have human decision makers, and if we can observe sequences of their decisions, we can fit models of their behavioral dynamics, to predict the state of the system, and apply model predictive control. The questions of estimation of learning dynamics and optimal control of systems with human decision makers is explored in these papers:
We aim to develop algorithms for online learning on infinite action sets.
Some results exist when the set is convex or has a particular hierarchical description.
We show that one can, in fact, learn (i.e. achieve sublinear regret) on locally convex sets, either using
This project is done in collaboration with Maximilian Balandat.
Implementation of the learning algorithms can be found on Github
The following video shows an example of online learning on a non-convex (but locally convex) set, with p-norm potentials (special cases of Csiszàr divergences).
Video credit: Maximilian Balandat.
Past Research
I worked on the Mobile Millennium project at the California Center for Innovative Transportation (CCIT), the Connected Corridor project at PATH (Partners for Advanced Transportation Technology), and the ORESTE group (Optimal REroute Strategies for Traffic ManagEment) with INRIA.
As part of the Mobile Millennium project, I worked on travel time estimation on urban traffic networks, using graphical models. Mobile Millennium collects probe vehicle GPS locations at regular time intervals, from which one can infer travel time observations. The problem is to infer the current travel times from sparse, possibly noisy observations, and predict the evolution of travel time during the next time steps. Some of the interesting questions include how to do approximate inference on large networks, for which exact inference is computationally intractable.
In collaboration with the ORESTE group at INRIA and PATH at Berkeley, I worked on a dynamical model of traffic using conservation PDEs.
We use the adjoint method for solving PDE-constrained optimal control problems, such as ramp metering and routing. Some of the papers on the topic: