In 2002, Manuel Blum, a Berkeley EECS professor emeritus, addressed an ongoing problem for Yahoo and simultaneously issued a challenge to researchers in artificial intelligence. He created a puzzle, consisting of distorted text against a patterned background, that most humans could read easily but that state-of-the-art text-recognition programs could not decipher. The challenge for researchers was to write a program as adept as humans at recognizing the distorted text.
EECS Professor Jitendra Malik. (Photo by Peg Skorpinski)
The puzzle, which became part of Yahoo's registration process, was created to thwart "spam bots," rogue computer programs that were posting advertisements in Yahoo chat rooms and signing up for thousands of free email accounts. Such puzzles "are useful for companies like Yahoo, but if they're broken it's even more useful for researchers," Blum, now at Carnegie Mellon University, told the New York Times.
Blum's challenge was met by Jitendra Malik, who, together with his then-student Greg Mori, created a pattern-matching program that recognizes distorted letters by comparing their spatial interrelationships to those of a standard alphabet. Now the same program is being put to an altogether different use: to combine 3D images of fly embryos, each stained to reveal spatial variations in gene expression, into a composite map.
The project's goal is to unravel the genetic programming code that governs how embryonic cells differentiate and become, say, a wing cell or a neuron. "It's a research question," says Malik. "We don't know yet how exactly to go about answering it."
Although all of a fly's cells contain identical DNA, the cells eventually differentiate in function depending on whether particular genes are "expressed," or switched on. Which genes are expressed, in turn, depends on the precise makeup of the soup of proteins in which the DNA strand is immersed. Certain proteins influence gene expression by binding to a site on the DNA strand upstream of a particular gene. Some of them act as activators, enhancing the chance that the gene will be transcribed and expressed, while others act as repressors, preventing transcription. "It is generally believed that the rules can be quite complicated," says Charless Fowlkes, a postdoctoral researcher. "For example: transcribe gene A if protein B or C is present but not both." Complicating things further, the proteins that influence gene expression are themselves products of the expression of other genes.
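Fowlkes's example rule is an exclusive-or, and a few lines of Python make that concrete. This is only an illustrative sketch: the function name and the idea of reducing each protein to a present/absent flag are assumptions made here for clarity, whereas the researchers describe measuring concentrations, not booleans.

```python
def gene_a_transcribed(protein_b_present: bool, protein_c_present: bool) -> bool:
    """Hypothetical rule from the quote: transcribe gene A if protein B
    or protein C is present, but not both (an exclusive-or)."""
    return protein_b_present != protein_c_present


# Print the full truth table for the two inputs.
for b in (False, True):
    for c in (False, True):
        print(f"B={b!s:5} C={c!s:5} -> transcribe A: {gene_a_transcribed(b, c)}")
```

Only the two rows in which exactly one protein is present report that gene A is transcribed.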
Malik and Fowlkes, working with biologists at Lawrence Berkeley National Laboratory and the University of California, Davis, are trying to determine how the concentrations of these proteins affect transcription rates of particular genes. As a first step, they are collecting data on how transcription rates vary spatially over an embryo and how that pattern changes over time as the embryo develops. The next step will be to do the same for the concentrations of proteins that influence transcription. "If we can measure the concentrations of different proteins (the inputs) and the concentration of messenger RNA (the output), then hopefully we can infer the 'control logic' for a given gene," Fowlkes says.
The biologists gauge a gene's transcription rate by staining the cells with fluorescent dyes that show the concentration of that gene's products—particular types of messenger RNA. In a given embryo, the biologists can stain for only two or three types of gene products at once. To understand the fly's gene expression code, however, the researchers need to know the concentrations of thousands of gene products simultaneously.
A way around this problem is to run the staining experiment in parallel on many embryos, each stained for a small subset of genes, and then combine the data into a composite map covering thousands of genes. Fruit fly embryos are not identical, however; they differ significantly in shape and number of cells, and their regions do not precisely correspond. This is where pattern matching comes in: Malik and Fowlkes are using their pattern-matching program to morph the individual embryos into a single, composite embryo.
The program, written by Mori (now on the faculty at Simon Fraser University), uses standard techniques to sample points along the boundary of a letter or, in this case, a region of an embryo. It then applies a technique developed by Malik and another former student, Serge Belongie (now at the University of California, San Diego): for each boundary point, it builds a radial chart that groups the other boundary points by their angle and distance from that point. The charts for the boundary points of a particular embryo region are then compared with the charts for the boundary points of each region of a master embryo. The master region whose charts are most similar is likely to be a match.
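The radial-chart idea can be sketched in a few dozen lines of Python. What follows is a simplified illustration, not Mori's program: the bin counts, the log-spaced distance rings, the chi-squared comparison, the nearest-chart averaging, and all function names are assumptions chosen for this example. In the computer vision literature, descriptors of this kind are known as shape contexts, and published versions typically pair points with an optimal assignment rather than the simple nearest-chart shortcut used here.

```python
import math


def shape_context(points, index, n_angle_bins=12, n_dist_bins=5):
    """Normalized histogram of the other points, binned by angle and
    log-distance as seen from points[index]. Needs at least two points."""
    px, py = points[index]
    offsets = [(x - px, y - py) for i, (x, y) in enumerate(points) if i != index]
    dists = [math.hypot(dx, dy) for dx, dy in offsets]
    # Dividing by the mean distance makes the chart independent of overall size.
    scale = (sum(dists) / len(dists)) or 1.0
    hist = [0.0] * (n_angle_bins * n_dist_bins)
    for (dx, dy), d in zip(offsets, dists):
        angle = math.atan2(dy, dx) % (2 * math.pi)
        a_bin = min(int(angle / (2 * math.pi) * n_angle_bins), n_angle_bins - 1)
        # Log-spaced rings for relative distances from 1/8 to 2, clipped outside.
        log_r = math.log2(max(d / scale, 1e-9))
        d_bin = min(max(int((log_r + 3.0) / 4.0 * n_dist_bins), 0), n_dist_bins - 1)
        hist[d_bin * n_angle_bins + a_bin] += 1.0
    total = sum(hist)
    return [h / total for h in hist]


def chi2_distance(h1, h2):
    """Chi-squared distance between two histograms (0 means identical)."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)


def region_cost(region, master_region):
    """Average, over the points of `region`, of the distance to the most
    similar chart in `master_region` (lower means more alike)."""
    master_charts = [shape_context(master_region, j) for j in range(len(master_region))]
    total = 0.0
    for i in range(len(region)):
        chart = shape_context(region, i)
        total += min(chi2_distance(chart, m) for m in master_charts)
    return total / len(region)


def best_region(region, master_regions):
    """Name of the master region whose charts are most similar to `region`."""
    return min(master_regions, key=lambda name: region_cost(region, master_regions[name]))


# Toy data: the boundary of a square, a straight line, and a shifted copy of the square.
square = ([(x, 0) for x in range(4)] + [(4, y) for y in range(4)]
          + [(4 - x, 4) for x in range(4)] + [(0, 4 - y) for y in range(4)])
line = [(x, 0) for x in range(16)]
shifted = [(x + 10, y - 3) for x, y in square]
masters = {"square": square, "line": line}
print(best_region(shifted, masters))
```

Because each chart records only relative angles and distances, the shifted square produces the same charts as the original and is matched to it rather than to the line. Real embryos differ in shape as well as position, so the matching there has to tolerate charts that are similar rather than identical.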
So far, Malik and Fowlkes have applied the pattern-matching program to produce composite maps showing the concentration patterns for 20 gene products. Over the next year, they plan to expand this map to include patterns of both messenger RNA and proteins produced by 37 genes that regulate early development, and messenger RNA produced by hundreds of genes whose expression is controlled by those proteins.