EECS Joint Colloquium Distinguished Lecture Series

Wednesday, January 23, 2002
Hewlett Packard Auditorium, 306 Soda Hall
4:00-5:00 p.m.

Dr. Kimmen Sjölander

Assistant Professor
UC Berkeley, Department of Bioengineering

Computational modeling of protein superfamily evolution

In this talk, I will present a method used to construct phylogenetic trees, identify subfamilies, and predict critical positions in protein molecules. This method employs agglomerative clustering to create the tree structure, and combines Dirichlet mixture priors and relative entropy to estimate the evolutionary relatedness of sequences and subgroups in the input multiple sequence alignment. Minimum description length principles are then employed to obtain a cut of the tree into subtrees to define the subfamilies. This method, Bayesian Evolutionary Tree Estimation (BETE), has been used at Celera Genomics to annotate the human genome with molecular function. BETE can also be used to predict binding pocket positions, with results shown on the SH2 domain family.
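
As a rough illustration of the clustering step described in the abstract (and not the BETE implementation itself), the following Python sketch builds a tree by agglomerative clustering, using relative entropy between smoothed column profiles as the distance between subgroups. Several simplifications are assumptions made here for brevity: a single uniform Dirichlet prior (plain pseudocounts) stands in for the Dirichlet mixture priors, the relative entropy is symmetrized and summed over columns, and the minimum description length cut into subfamilies is omitted. All function names are hypothetical.

```python
import math
from itertools import combinations

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def column_profile(seqs, col, pseudocount=1.0):
    """Posterior mean amino-acid distribution for one alignment column.

    Assumption: a single uniform Dirichlet prior (add-pseudocount smoothing)
    replaces the Dirichlet mixture priors named in the abstract.
    """
    counts = {a: pseudocount for a in ALPHABET}
    for s in seqs:
        if s[col] in counts:  # gaps and unknown characters are skipped
            counts[s[col]] += 1.0
    total = sum(counts.values())
    return {a: c / total for a, c in counts.items()}


def relative_entropy(p, q):
    """Kullback-Leibler divergence D(p || q) in nats."""
    return sum(p[a] * math.log(p[a] / q[a]) for a in ALPHABET)


def group_distance(g1, g2):
    """Symmetrized relative entropy between two subgroups, summed over columns."""
    total = 0.0
    for col in range(len(g1[0])):
        p = column_profile(g1, col)
        q = column_profile(g2, col)
        total += relative_entropy(p, q) + relative_entropy(q, p)
    return total


def agglomerate(alignment):
    """Merge the two closest subgroups until one tree remains.

    `alignment` maps sequence names to aligned sequences of equal length.
    The tree is returned as nested tuples of sequence names.
    """
    # Each key is a tree node (a name or a tuple of child nodes);
    # each value is the list of aligned sequences under that node.
    clusters = {name: [seq] for name, seq in alignment.items()}
    while len(clusters) > 1:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: group_distance(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[(a, b)] = clusters.pop(a) + clusters.pop(b)
    return next(iter(clusters))
```

On a toy alignment such as `{"s1": "ACDA", "s2": "ACDA", "s3": "WWYW", "s4": "WWYF"}`, `agglomerate` groups the two identical sequences first and the two tryptophan-rich sequences next. Recomputing every pairwise distance at each step keeps the sketch short but would be too slow for large alignments.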

Dr. Sjölander completed her Ph.D. in 1997 at U.C. Santa Cruz, where she worked with Professor David Haussler on computational tools for problems in molecular biology. While the Santa Cruz group became best known for the application of hidden Markov models to protein modeling, Dr. Sjölander's work at UCSC also included the development of Dirichlet mixture priors, stochastic context-free grammars for RNA structure prediction, methods for protein fold prediction, and phylogenetic tree construction and subfamily classification (BETE). Following completion of her Ph.D., she joined Molecular Applications Group, a bioinformatics startup company founded by Stanford professor Michael Levitt, and continued work on algorithm development for protein superfamily analysis. As Chief Scientist of MAG, she oversaw the development of the Panther technology, a software suite for large-scale protein classification. In 1999, Celera Genomics acquired the MAG Panther technology and personnel, and used this technology to classify genes, as described in the Science issue devoted to the human genome.