"We started doing flying robots somewhat on a lark in 1996," says Shankar Sastry. "Mobile robots on the ground were getting too passé." People said "that’s pie in the sky, you’ll never fly," he recalls.
Shankar Sastry, professor of EECS and dean of the College of Engineering, trapped under a wired sensor network. (Photo by Peg Skorpinski)

Two years later, robots were flying off the roof of Cory Hall, and the campus police booted the project off campus. Undaunted, the team pressed on. Now, much of the technology behind the increasingly common Unmanned Aerial Vehicles (UAVs), including the U.S. military’s Predator aircraft, has come out of Sastry’s lab. Eventually, he envisions, UAVs will become a part of the civilian world, too. "I think it would be great to have a personal UAV controlled by your cell phone that could fly out and get traffic reports," he says.
Keeping a robot in the air presents a host of challenges not faced by vehicles on the ground. An airborne vehicle needs to understand visual images so that it can, for example, judge the distance to the pitching deck of a ship in the dark. It also needs to keep moving constantly for stability and to respond to shifting wind speeds and unexpected obstacles. During complex maneuvers, the robot can find itself in a situation where the next move is not obvious, so it needs to be taught sequences of actions that achieve its end goal. "In flying, everything is much more challenging," Sastry says. "You have to unify all the elements to make them work."
All the elements came together for former Berkeley graduate student Andrew Ng in early 2002, when a helicopter trained by Ng and Jin Kim, then a postdoc in Sastry’s lab and now a professor at Seoul National University, piloted itself through a series of competition-quality aerobatic maneuvers and pirouettes. The robot’s skill exceeded even that of an expert human pilot. "As it was flying, we were all just standing there watching it," says Ng, now an assistant professor at Stanford University.
When Ng and Kim started work on the helicopter, they already had a training system in place. The general-purpose PEGASUS program (Policy Evaluation-of-Goodness And Search Using Scenarios), which had been developed by Ng and his advisor, Michael Jordan, was a reinforcement learning algorithm with the ability to teach controllers—the motion-directing brain of a robot—to deal with environments of arbitrary complexity. The algorithm was designed around a "reward shaping" theorem that Ng had derived earlier with Stuart Russell and then-graduate student Daishi Harada.
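The core idea behind PEGASUS is to draw a fixed set of random scenarios up front and score every candidate controller against that same set, which turns a noisy simulator into a deterministic function that a search procedure can optimize. A minimal sketch of that idea, using a toy one-step task whose names and numbers are purely illustrative:

```python
import random

def draw_scenarios(n, master_seed=0):
    """Fix a set of random seeds ("scenarios") once, before any search."""
    rng = random.Random(master_seed)
    return [rng.randrange(10**9) for _ in range(n)]

def evaluate(policy_action, scenarios):
    """Average a policy's return over the SAME pre-drawn scenarios, so the
    same policy always receives exactly the same score."""
    total = 0.0
    for seed in scenarios:
        rng = random.Random(seed)
        # Toy one-step task (illustrative): act close to a noisy target.
        target = 0.5 + rng.uniform(-0.1, 0.1)
        total += -abs(policy_action - target)
    return total / len(scenarios)

scenarios = draw_scenarios(100)
# Determinism: evaluating the same policy twice gives an identical score,
# so an optimizer can compare candidate policies reliably.
same = evaluate(0.4, scenarios) == evaluate(0.4, scenarios)
```

Because the scores are deterministic, comparing two candidate controllers becomes ordinary function optimization rather than a noisy statistical test.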
Shaping is an old idea from animal training: If you want to train a horse to jump over three fences, you give it a lump of sugar after each fence instead of waiting for it to jump over all three by chance. Sometimes, however, the little extra rewards may guide the horse into doing the wrong thing, such as running around in a circle so that it can jump over the first fence again and again to collect more and more sugar. What Ng, Harada, and Russell proved was that the horse will learn the right behavior if the shaping rewards correspond to the gradient of a conservative potential—that is, if the horse is penalized for going the wrong way as well as rewarded for going the right way. In the case of the helicopter, the shaping rewards guided it through the complex maneuvers and reduced the training time from millions of hours to just 18.
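The condition in the theorem can be stated concretely: the shaping reward for moving from state s to state s' must be a difference of potentials, F(s, s') = γ·Φ(s') − Φ(s). Shaping of this form leaves the optimal behavior unchanged. A minimal sketch of the fence example in that form (the particular potential function here is an illustrative choice, not from the article):

```python
GAMMA = 1.0  # undiscounted, for simplicity

def phi(fences_cleared):
    """Potential: progress toward clearing all three fences."""
    return float(fences_cleared)

def shaping_reward(before, after, gamma=GAMMA):
    """Extra 'lump of sugar' for forward progress, and an equal penalty
    for going backward -- so circling back gains nothing."""
    return gamma * phi(after) - phi(before)

# Clearing a fence earns +1; looping back to re-jump it nets 0 overall:
forward = shaping_reward(0, 1)                       # +1.0
loop = shaping_reward(1, 0) + shaping_reward(0, 1)   # -1.0 + 1.0 = 0.0
```

The loop summing to zero is exactly the "conservative potential" property: any closed path collects no net shaping reward, so the horse cannot farm sugar by running in circles.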
For those 18 hours, the controller was subjected to computer simulations of potential scenarios. When the controller responded well—righting a tipped-over helicopter, for example, or lifting off and hovering at the appropriate height—PEGASUS rewarded the controller. When the controller spun the simulated helicopter out of control by overreacting to an input, it was penalized. As the training progressed, the controller learned just how much it needed to adjust the helicopter’s controls to obtain rewards and minimize punishments, and it carried those skills into the real world.
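A training loop of the kind described above can be sketched as follows, with a toy hover simulator standing in for the real one (the dynamics, thresholds, and reward values here are invented for illustration only):

```python
import random

def simulate(gain, rng, steps=50):
    """Toy hover task: the controller multiplies altitude error by `gain`.
    Too little gain and it never reaches the target; too much and it
    overreacts and oscillates out of control. Not the lab's actual model."""
    altitude, target, score = 0.0, 1.0, 0.0
    for _ in range(steps):
        altitude += gain * (target - altitude) + rng.gauss(0, 0.01)  # gusts
        if abs(altitude - target) < 0.05:
            score += 1.0              # reward: hovering at the right height
        if abs(altitude) > 5.0:
            return score - 100.0      # penalty: spun out of control
    return score

def train(candidates, n_scenarios=20):
    """Pick the candidate gain that collects the most simulated reward,
    scoring every candidate on the same scenario seeds."""
    def avg_score(gain):
        return sum(simulate(gain, random.Random(i))
                   for i in range(n_scenarios)) / n_scenarios
    return max(candidates, key=avg_score)

best_gain = train([0.05, 0.3, 2.5])  # sluggish, well damped, overreacting
```

The overreacting controller is quickly driven to the large penalty, the sluggish one earns almost nothing, and the well-damped one accumulates steady hover rewards, so the search settles on it.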
—Sara Robinson and Katie Greene