Sergey Levine

UC Berkeley, Electrical Engineering and Computer Sciences
Post-Doctoral Researcher
Office & Mailing Address:
750 Sutardja Dai Hall
Berkeley, CA 94720
Curriculum Vitae:

I am a post-doctoral researcher in the Department of Electrical Engineering and Computer Sciences at UC Berkeley, working with Pieter Abbeel. I previously completed my Ph.D. at the Department of Computer Science at Stanford University, where I was advised by Vladlen Koltun. In my research, I focus on the intersection between optimal control and machine learning, with the aim of developing algorithms and techniques that can endow machines with the ability to autonomously acquire the skills for executing complex tasks. In particular, I am interested in how learning can be used to acquire behavioral skills for robots and virtual characters, in order to enable greater autonomy, intelligence, and visual realism.

Publications

Sergey Levine, Vladlen Koltun. Learning Complex Neural Network Policies with Trajectory Optimization. ICML 2014. [PDF][Website]

In this work, we present an algorithm for learning complex policies represented by neural networks, by means of a novel constrained trajectory optimization method. The algorithm iteratively optimizes trajectory distributions that minimize a cost function and agree with the current neural network policy, which is then trained to reproduce those trajectories. Demonstrations include bipedal walking on rough terrain and recovery from very strong pushes.

Travis Mandel, Yun-En Liu, Sergey Levine, Emma Brunskill, Zoran Popović. Offline Policy Evaluation Across Representations with Applications to Educational Games. AAMAS 2014. [PDF][Website]

This paper uses an importance sampling-based policy evaluation method to compare a number of different policy representations for a concept ordering task in an educational game. Using data from real human players, we design a representation and learn a corresponding policy that achieves a significant improvement in player retention.
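The core evaluation idea, importance sampling over logged trajectories, can be sketched in a few lines. This is a generic self-normalized textbook estimator, not the paper's exact method; the function and variable names are illustrative:

```python
def importance_sampling_estimate(trajectories, target_policy, behavior_policy):
    """Estimate the expected return of target_policy from trajectories
    gathered under behavior_policy, via per-trajectory importance weights.

    Each trajectory is a list of (state, action, reward) tuples; each policy
    maps (action, state) to the probability of taking that action there."""
    weights, returns = [], []
    for traj in trajectories:
        w, ret = 1.0, 0.0
        for state, action, reward in traj:
            w *= target_policy(action, state) / behavior_policy(action, state)
            ret += reward
        weights.append(w)
        returns.append(ret)
    # Self-normalized (weighted) importance sampling estimate of the return.
    return sum(w * r for w, r in zip(weights, returns)) / sum(weights)
```

With a deterministic target policy, logged trajectories containing actions the target would never take simply receive zero weight, so only the compatible data contributes to the estimate.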

Sergey Levine, Vladlen Koltun. Variational Policy Search via Trajectory Optimization. NIPS 2013. [PDF]

This paper combines policy search with trajectory optimization in a variational framework. The algorithm alternates between optimizing a trajectory to match the current policy and minimize cost, and optimizing a policy to match the current trajectory. Both optimizations are done using standard methods, resulting in a simple algorithm that can solve difficult locomotion problems that are infeasible with only random exploration.
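The alternation is easy to see in a deliberately tiny one-dimensional caricature. The actual algorithm optimizes trajectory distributions and parametric policies; here both steps are scalar and closed-form, purely for illustration:

```python
def alternating_policy_search(goal=3.0, lam=1.0, iters=50):
    """Toy 1-D version of the alternation: a 'trajectory' x minimizes the
    cost (x - goal)^2 plus a penalty for deviating from the current
    'policy' theta, then theta is refit to match the optimized trajectory."""
    theta = 0.0  # initial policy (here just a single scalar)
    for _ in range(iters):
        # Trajectory step: argmin_x (x - goal)^2 + lam * (x - theta)^2.
        x = (goal + lam * theta) / (1.0 + lam)
        # Policy step: fit the policy to reproduce the trajectory.
        theta = x
    return theta
```

Each round moves the pair a fixed fraction of the way toward the cost minimum, so the alternation converges (here, theta approaches the goal geometrically).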

Sergey Levine, Vladlen Koltun. Guided Policy Search. ICML 2013. [PDF][Website]

This paper introduces a guided policy search algorithm that uses trajectory optimization to direct policy learning and avoid poor local optima. Using differential dynamic programming to guide the policy search, this method is able to train general-purpose neural network controllers to execute complex, dynamic behaviors such as running on high-dimensional simulated humanoids.

Sergey Levine, Vladlen Koltun. Continuous Inverse Optimal Control with Locally Optimal Examples. ICML 2012. [PDF][Website]

This paper introduces a new probabilistic inverse optimal control algorithm for learning reward functions in Markov decision processes. The method is suitable for large, continuous domains where even computing a full policy is impractical. By using a local approximation of the reward function, this method can also drop the assumption that the demonstrations are globally optimal, requiring only local optimality. This allows it to learn from examples that are unsuitable for prior methods.

Sergey Levine, Jack M. Wang, Alexis Haraux, Zoran Popović, Vladlen Koltun. Continuous Character Control with Low-Dimensional Embeddings. ACM SIGGRAPH 2012. [PDF][Website]

This work presents a method for animating characters performing user-specified tasks by using a probabilistic motion model, which is trained on a small number of artist-provided animation clips. The method uses a low-dimensional space learned from the example motions to continuously control the character's pose to accomplish the desired task. By controlling the character through a reduced space, our method can discover new transitions, precompute a control policy, and avoid low quality poses.

Sergey Levine, Jovan Popović. Physically Plausible Simulation for Character Animation. SCA 2012. [PDF][Video]

This paper describes a method for generating physically plausible responses for animated characters without requiring their motion to be strictly physical. Given a stream of poses, the method simulates plausible responses to physical disturbances and environmental variations. Since the quasi-physical simulation accounts for the dynamics of the character and surrounding objects without requiring the motion to be physically valid, it is suitable for both realistic and stylized, cartoony motions.

Sergey Levine, Zoran Popović, Vladlen Koltun. Nonlinear Inverse Reinforcement Learning with Gaussian Processes. NIPS 2011. [PDF][Poster][Website]

This paper presents an inverse reinforcement learning algorithm for learning unknown nonlinear reward functions. The algorithm uses Gaussian processes and a probabilistic model of the expert to capture complex behaviors from suboptimal stochastic demonstrations, while automatically balancing the simplicity of the learned reward structure against its consistency with the observed actions.

Sergey Levine, Yongjoon Lee, Vladlen Koltun, Zoran Popović. Space-Time Planning with Parameterized Locomotion Controllers. ACM Transactions on Graphics 30 (3), 2011. [PDF][Video]

In this article, we present a method for efficiently synthesizing animations for characters traversing complex dynamic environments by sequencing parameterized locomotion controllers with space-time planning. The controllers are created from motion capture data, and the space-time planner determines the optimal sequence of controllers for reaching a goal as the environment changes.

Sergey Levine, Zoran Popović, Vladlen Koltun. Feature Construction for Inverse Reinforcement Learning. NIPS 2010. [PDF][Poster][Website]

This paper presents an algorithm for learning an unknown reward function for a Markov decision process when good basis features are not available, using example traces from the MDP's optimal policy. The algorithm constructs reward features from a large collection of component features, by building logical conjunctions of those component features that are relevant to the example policy.

Sergey Levine, Philipp Krähenbühl, Sebastian Thrun, Vladlen Koltun. Gesture Controllers. ACM SIGGRAPH 2010. [PDF][Video]

Gesture controllers learn optimal policies to generate smooth, compelling gesture animations from speech and other optional inputs. The accompanying video presents examples of various controllers, including controllers that recognize key words, admit manual manipulation of gesture style, and even animate a character with a non-humanoid morphology.

Sergey Levine, Christian Theobalt, Vladlen Koltun. Real-Time Prosody-Driven Synthesis of Body Language. ACM SIGGRAPH Asia 2009. [PDF][Video]

This paper presents the body language synthesis system described in my undergraduate thesis. The method automatically synthesizes body language animations directly from the participants' speech signals, without the need for additional input. The animations are synthesized in real time by selecting segments from motion capture data of real people in conversation.

Other Writing & Research Reports

Sergey Levine. Motor Skill Learning with Local Trajectory Methods. PhD thesis, Stanford University Department of Computer Science, 2014. [PDF]

In my PhD thesis, I discuss algorithms for learning cost functions and control policies for humanoid motor skills using example demonstrations and local trajectory methods. Cost functions are learned with a continuous local inverse optimal control algorithm, while control policies are represented with general-purpose neural networks. Results include running and walking on uneven terrain and bipedal push recovery.

Sergey Levine. Exploring Deep and Recurrent Architectures for Optimal Control. NIPS Workshop on Deep Learning, 2013. [PDF]

This paper describes a set of experiments on using deep and recurrent neural networks to build controllers for walking on rough terrain by using the Guided Policy Search algorithm. The results show that deep and recurrent networks have a modest advantage over shallow architectures in terms of generalization, but also suggest that overfitting and local optima are serious problems. Two different types of overfitting are analyzed, and directions for future work are discussed.

Taesung Park, Sergey Levine. Inverse Optimal Control for Humanoid Locomotion. Robotics Science and Systems Workshop on Inverse Optimal Control & Robotic Learning from Demonstration, 2013. [PDF]

In this paper, we learn a reward function for running from motion capture of a human run. The reward is learned using a local inverse optimal control algorithm. We show that it can be used to synthesize realistic running behaviors from scratch, and furthermore can be used to create new running behaviors under novel conditions, such as sloped terrain and strong lateral perturbations.

Sergey Levine. Modeling Body Language from Speech in Natural Conversation. Master's research report, Stanford University Department of Computer Science, 2009. [PDF][Video]

In this report, I describe a new approach for synthesizing body language from prosody using a set of intermediate motion parameters that describe stylistic qualities of gestures independently of their form. The quality of the synthesized motion parameters is compared to the parameters of the original motions accompanying an utterance, to obtain a quantitative measure of the method's performance.

Sergey Levine. Body Language Animation Synthesis from Prosody. Undergraduate thesis, Stanford University Department of Computer Science, 2009. [PDF][Video]

In my undergraduate thesis, I describe a body language synthesis system that generates believable body language animations from live speech input, using only the prosody of the speaker's voice. Since the method works on live speech, it can be used in interactive applications, such as networked virtual worlds.

Other Work

Rendering the Eagle Nebula

[PDF][Video][Class Site]

My project for Pat Hanrahan's CS 348B rendering class, completed in collaboration with Edward Luong, received the Grand Prize in the rendering competition. The image depicts the Eagle Nebula. The rendering used volumetric photon mapping and simulated the excitation of gases in the nebula by ultraviolet radiation, and the resulting emission of light at longer wavelengths.

Airship Combat

[Details][Windows Executable][Class Site]

My project for Marc Levoy's introductory computer graphics course received the "most creative" award in the final project competition. In this game, players control sail-powered airships armed with cannons. The game simulates the physics of the sails with a simple explicit Euler scheme, and uses hierarchical collision detection to register hits.
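For reference, an explicit Euler step of the kind mentioned above looks like the following. This is a generic damped-spring sketch (unit mass, illustrative constants), not the game's actual sail code:

```python
def explicit_euler_spring(x0, v0, k, c, dt, steps):
    """Integrate a unit-mass damped spring with the explicit Euler scheme:
    each step advances position and velocity using only the current state."""
    x, v = x0, v0
    for _ in range(steps):
        a = -k * x - c * v  # spring force plus damping (mass = 1)
        x = x + dt * v      # position update from the old velocity
        v = v + dt * a      # velocity update from the old acceleration
    return x, v
```

Explicit Euler is only conditionally stable, so the time step must be small relative to the spring stiffness to keep the simulation from blowing up.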

Nali Chronicles

[Installer][Mac Patch][Review]

Nali Chronicles is a mod for Unreal Tournament that I developed with a group of friends between 2001 and 2005. The game contains a complete single-player campaign, with original artwork, scripting, items, enemies, and so forth. My part of the project consisted of organizing the group, programming, 3D modeling, and some level design. The game is a little dated by now, but if you have a copy of the original Unreal Tournament, you can download the installer and give it a go.

Research Support

NVIDIA Graduate Fellowship, 2013
National Science Foundation Graduate Research Fellowship, 2010
Stanford School of Engineering Fellowship, 2009
© 2009-2010 Sergey Levine.