Most life processes within an organism are governed by networks of interacting proteins. Some of these protein networks control functions within a single cell, while others control processes of a particular organ or of the organism as a whole. "In order to understand such a system you have to see it as modular; it has to have some coherence of structure," says Richard Karp.
One such "module" is the set of proteins that govern the Krebs cycle, a key step of cellular respiration that involves a series of more than a dozen chemical reactions, each initiated by a particular protein. The Krebs cycle, in turn, is part of a far larger module: the proteins that govern the process by which cells convert sugar and oxygen into carbon dioxide, water, and energy.
Karp, who works on combinatorial algorithms and computational complexity, and Michael Jordan, whose specialties are statistics and artificial intelligence, are working on separate projects that aim to illuminate these protein networks. For Jordan, who also holds an appointment in Berkeley's statistics department, a key tool is statistical inference. "Computation in the biological world is computation under uncertainty," he says.
EECS Professor Michael Jordan. (Photo by Peg Skorpinski)
One of Jordan's recent projects, carried out with Steven Brenner, a professor of biology at Berkeley, and Barbara Engelhardt, an EECS graduate student, is a methodology for predicting the functions of proteins. "The rate of sequencing is growing much faster than our understanding of functionality," Jordan says. "We often know a protein's sequence and structure but not its function." Researchers typically infer a particular protein's function from those of other proteins whose sequences are similar. But this method alone often yields incorrect answers, Jordan says. Data on how a protein evolved, on the other hand, turns out to be a good predictor of function: proteins that stem from a common ancestor are quite likely to play similar roles.
Jordan and his colleagues built a "phylogenetic tree" for the deaminase family, a protein family for which only a small fraction of the proteins' functions are known. Each branch of the tree was labeled with values indicating the probability over a particular time frame that a protein jumped to a new function. Applying sophisticated statistical inference techniques to this model, the researchers were able to predict protein function with 95% accuracy—far better than what other methods have achieved.
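The idea of reasoning over a tree whose branches carry function-change probabilities can be illustrated with a small sketch. This is not the researchers' model. It is a toy version that assumes a function is one of a few discrete states, that a protein keeps its parent's function with probability `1 - p_switch` and otherwise switches uniformly to another state, and that the root has a uniform prior. The tree layout, the names `subtree_likelihood` and `leaf_posterior`, and all the numbers are hypothetical.

```python
def subtree_likelihood(node, state, num_states):
    """Probability of the observed leaves below `node`, given `node` has `state`."""
    children = node.get("children")
    if not children:  # leaf: an unobserved leaf is compatible with any state
        observed = node.get("observed")
        return 1.0 if observed is None or observed == state else 0.0
    likelihood = 1.0
    for child, p_switch in children:
        # Sum over the child's possible states, weighted by the branch model
        # (assumed: keep the state with 1 - p_switch, else switch uniformly).
        message = 0.0
        for child_state in range(num_states):
            if child_state == state:
                p = 1.0 - p_switch
            else:
                p = p_switch / (num_states - 1)
            message += p * subtree_likelihood(child, child_state, num_states)
        likelihood *= message
    return likelihood


def leaf_posterior(root, leaf, num_states):
    """Posterior over the function of an unlabeled leaf, given the labeled leaves."""
    joint = []
    for candidate in range(num_states):
        leaf["observed"] = candidate  # temporarily clamp the query leaf
        evidence = sum(subtree_likelihood(root, s, num_states)
                       for s in range(num_states)) / num_states  # uniform root prior
        joint.append(evidence)
    leaf["observed"] = None  # restore the leaf to "unknown"
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical three-leaf tree: the query protein has a close relative with
# known function 0 and a distant relative with known function 1.
known = {"observed": 0}
query = {"observed": None}
distant = {"observed": 1}
clade = {"children": [(known, 0.05), (query, 0.05)]}
root = {"children": [(clade, 0.1), (distant, 0.4)]}

posterior = leaf_posterior(root, query, num_states=2)
print(posterior)
```

In this toy tree the close relative dominates, so most of the posterior mass lands on function 0, which mirrors the intuition that proteins with a recent common ancestor tend to share a role.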
In another project, Jordan—along with Patrick Flaherty, an EECS graduate student, and Adam Arkin, a Berkeley biology professor—devised a model to help biologists find the most effective ways of exploring the intricate network of proteins involved in calcium signaling.
Calcium signaling is the process by which cells release calcium ions, which then act as signals to initiate various physiological processes. The calcium signaling process begins when a growth hormone (a protein) binds to a receptor (also a protein) on the outside of the cell and triggers a protein cascade within the cell. The process is "a chain with lots of loops and at the end of the day it releases calcium," Jordan says.
Biologists often gain information about the role of a particular protein in a process by removing it and observing the effect. For a process as complex as calcium signaling, however, it's not clear which such experiments will yield the most information. Jordan and his collaborators addressed this problem by building a model—annotated with uncertainty—that determines which perturbations to the network would be most likely to increase overall certainty about its structure.
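One common way to formalize "which experiment would increase certainty the most" is expected information gain: for each candidate perturbation, compute how much the entropy of the belief over network structures is expected to drop once the outcome is seen. The sketch below illustrates that general idea and is not the model Jordan's group built. The hypotheses, the experiment names `knockout_A` and `knockout_B`, and the probabilities are all made up for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def expected_information_gain(prior, likelihood):
    """Expected entropy reduction over hypotheses from running one experiment.

    prior[h]         : current probability of hypothesis h (a candidate structure)
    likelihood[h][o] : assumed probability of outcome o if hypothesis h is true
    """
    num_hypotheses = len(prior)
    num_outcomes = len(likelihood[0])
    gain = entropy(prior)
    for o in range(num_outcomes):
        p_outcome = sum(prior[h] * likelihood[h][o] for h in range(num_hypotheses))
        if p_outcome == 0:
            continue
        # Bayes' rule: belief over hypotheses after observing outcome o
        posterior = [prior[h] * likelihood[h][o] / p_outcome
                     for h in range(num_hypotheses)]
        gain -= p_outcome * entropy(posterior)
    return gain


# Hypothetical setup: three candidate network structures, two knockouts,
# two possible outcomes each (signal observed or not).
prior = [0.5, 0.3, 0.2]
experiments = {
    # Every structure predicts the same outcome, so nothing is learned.
    "knockout_A": [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1]],
    # The structures disagree, so the outcome discriminates between them.
    "knockout_B": [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]],
}

gains = {name: expected_information_gain(prior, lik)
         for name, lik in experiments.items()}
best = max(gains, key=gains.get)
print(best, gains)
```

An experiment whose outcome every candidate structure predicts equally well scores zero, which is why removing a protein at random can be a poor use of bench time.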
EECS Professor Richard Karp. (Photo by Peg Skorpinski)
Meanwhile, Karp has been drawing on his expertise in combinatorial algorithms to devise techniques for discovering protein modules. In one recent project, he came up with a way of searching for clusters of interacting proteins that are conserved across several species. "You would expect that similar organisms have similar protein interactions," says Karp. "Seeing clusters that are conserved across species yields stronger evidence of their functional significance."
Using statistical and algorithmic techniques, he and Roded Sharan (then a postdoctoral researcher, now at Tel Aviv University), together with computer scientists and biologists at the Institute of Genetics in Karlsruhe, Germany, and bioengineers at the University of California, San Diego, found thousands of protein clusters that were common to three related species: yeast, roundworms, and fruit flies. To find the clusters, the researchers first used data from biologists to assign a statistical likelihood to whether a given protein pair interacts and then used this data to weight the edges of a graph of proteins for each organism. Using a measure of similarity for proteins in different organisms, the researchers then matched proteins across the different species.
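The two steps described here, weighting each species' interaction graph and then matching proteins across species, can be sketched as the construction of a combined graph. The version below is a simplified illustration under several assumptions: it handles two species rather than three, it takes the similar protein pairs as given, and it combines the two interaction probabilities by multiplying them. The protein names and the function `build_alignment_graph` are hypothetical and do not come from the study.

```python
from itertools import combinations


def build_alignment_graph(interactions_a, interactions_b, similar_pairs):
    """Combine two weighted interaction graphs into one graph of matched proteins.

    interactions_a, interactions_b:
        dict mapping frozenset({p, q}) -> estimated probability that p and q interact
    similar_pairs:
        list of (protein_in_a, protein_in_b) judged similar across the species
    Returns a dict mapping (node1, node2) -> combined weight, where each node
    is one of the matched pairs.
    """
    edges = {}
    for (a1, b1), (a2, b2) in combinations(similar_pairs, 2):
        if a1 == a2 or b1 == b2:
            continue  # skip pairs of nodes that reuse the same protein
        p_a = interactions_a.get(frozenset((a1, a2)), 0.0)
        p_b = interactions_b.get(frozenset((b1, b2)), 0.0)
        if p_a > 0 and p_b > 0:
            # Assumed scoring rule: the interaction must be plausible in both species.
            edges[((a1, b1), (a2, b2))] = p_a * p_b
    return edges


# Hypothetical data: y* are yeast proteins, w* are roundworm proteins.
yeast = {frozenset(("y1", "y2")): 0.9, frozenset(("y2", "y3")): 0.6}
worm = {frozenset(("w1", "w2")): 0.8}
matches = [("y1", "w1"), ("y2", "w2"), ("y3", "w3")]

graph = build_alignment_graph(yeast, worm, matches)
print(graph)
```

Only the y1/y2 and w1/w2 interaction survives here, because it is the only one with support in both species. Heavily weighted groups of such edges are the candidates for conserved clusters.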
Because finding all high-weight structures on this giant composite graph would be computationally infeasible, Karp and his colleagues resorted to a combination of algorithmic and heuristic techniques to find approximately 200 candidate protein modules conserved across all three species. The model predicted thousands of previously unknown protein-protein interactions, 60 of which were tested by the biologists in the group. The model's predictions were found to be about 50 percent accurate. "This is strong predictive evidence given that the probability that random proteins interact is essentially zero," Karp says.
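When an exhaustive search is out of reach, one family of heuristics grows a cluster outward from a seed node, adding whichever neighbor raises the total score the most. The sketch below shows that generic greedy approach. The article does not specify which heuristics Karp's group used, so this should be read as an illustration only. It assumes edge scores can be negative (for example, log-odds style scores), and the graph and node names are invented.

```python
def greedy_cluster(edges, seed, max_size):
    """Grow a cluster from `seed`, adding the neighbor with the largest positive gain.

    edges: dict mapping frozenset({u, v}) -> score (may be negative)
    """
    # Build an adjacency map for quick neighbor lookups.
    neighbors = {}
    for pair, weight in edges.items():
        u, v = tuple(pair)
        neighbors.setdefault(u, {})[v] = weight
        neighbors.setdefault(v, {})[u] = weight

    cluster = {seed}
    while len(cluster) < max_size:
        frontier = set()
        for node in cluster:
            frontier.update(neighbors.get(node, {}))
        frontier -= cluster

        best_node, best_gain = None, 0.0
        for candidate in sorted(frontier):  # sorted for a deterministic tie-break
            # Gain = total score of the edges linking the candidate to the cluster.
            gain = sum(w for n, w in neighbors[candidate].items() if n in cluster)
            if gain > best_gain:
                best_node, best_gain = candidate, gain
        if best_node is None:
            break  # no remaining neighbor improves the score
        cluster.add(best_node)
    return cluster


# Hypothetical scored graph.
scores = {
    frozenset(("A", "B")): 2.0,
    frozenset(("B", "C")): 1.5,
    frozenset(("A", "C")): 1.0,
    frozenset(("C", "D")): -0.5,
    frozenset(("D", "E")): 3.0,
}
print(greedy_cluster(scores, seed="A", max_size=5))
```

Starting from A, the search adds B and then C but stops at D, whose only link to the cluster has a negative score. It therefore never reaches the strong D/E edge, which shows the kind of trade-off a heuristic accepts in exchange for speed.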