AMAP: Multiple Alignment by Sequence Annealing
Ariel Shaul Schwartz, Lior Pachter and Sudeep Juvekar
Motivation: We introduce a novel approach to multiple alignment that is based on an algorithm for rapidly checking whether single matches are consistent with a partial multiple alignment. This leads to a sequence annealing algorithm, which is an incremental method for building multiple sequence alignments one match at a time. Our approach improves significantly on the standard progressive alignment approach to multiple alignment.
Results: The sequence annealing algorithm performs well on benchmark test sets of protein sequences. It is not only sensitive, but also specific, drastically reducing the number of incorrectly aligned residues in comparison to other programs. The method allows for adjustment of the sensitivity/specificity tradeoff and can be used to reliably identify homologous regions among protein sequences.
Availability: An implementation of the sequence annealing algorithm is available at http://bio.math.berkeley.edu/amap/.
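The incremental idea described under Motivation can be illustrated with a small sketch. It is not the AMAP implementation: it replaces the rapid consistency check with a naive one that rebuilds a graph over alignment columns and runs a full topological sort after every tentative match. The function names `is_consistent` and `anneal`, the `(score, a, b)` match format, and the assumption that match scores are supplied by the caller are all hypothetical choices made for illustration. A residue is written as a `(sequence index, position)` pair.

```python
from collections import defaultdict, deque


def is_consistent(columns, seq_lengths):
    """Naive check: can the columns be ordered left to right?

    columns is a list of sets of (seq, pos) pairs covering every residue.
    """
    col_of = {}
    for c, members in enumerate(columns):
        seqs = [s for s, _ in members]
        if len(seqs) != len(set(seqs)):
            return False  # two residues of one sequence share a column
        for cell in members:
            col_of[cell] = c

    # Each sequence forces column(s, i) to precede column(s, i + 1).
    succ = defaultdict(set)
    indeg = [0] * len(columns)
    for s, n in enumerate(seq_lengths):
        for i in range(n - 1):
            u, v = col_of[(s, i)], col_of[(s, i + 1)]
            if v not in succ[u]:
                succ[u].add(v)
                indeg[v] += 1

    # Kahn's algorithm: the columns form a partial order iff there is no cycle.
    queue = deque(c for c, d in enumerate(indeg) if d == 0)
    seen = 0
    while queue:
        u = queue.popleft()
        seen += 1
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return seen == len(columns)


def anneal(seq_lengths, scored_matches):
    """Add matches one at a time, best score first, skipping inconsistent ones."""
    # Start from the empty alignment: every residue in its own column.
    columns = [{(s, i)} for s, n in enumerate(seq_lengths) for i in range(n)]
    for _, a, b in sorted(scored_matches, key=lambda m: m[0], reverse=True):
        ca = next(c for c in columns if a in c)
        cb = next(c for c in columns if b in c)
        if ca is cb:
            continue  # already aligned together
        trial = [c for c in columns if c is not ca and c is not cb] + [ca | cb]
        if is_consistent(trial, seq_lengths):
            columns = trial
    return columns


# Two sequences of length 2 with two crossing candidate matches.
result = anneal([2, 2], [(0.9, (0, 0), (1, 1)), (0.8, (0, 1), (1, 0))])
```

In this example the higher-scoring match is accepted and the crossing match is rejected, because accepting both would require each of the two merged columns to precede the other. Rechecking the whole graph for every match is what makes this sketch slow; it only shows what consistency means, not how to test it rapidly.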
Figure 1: A set of four sequences, an alignment poset together with a linear extension and a global multiple alignment. The function from the set of sequence elements to the alignment poset that specifies the multiple alignment is not shown, but is fully specified by the diagram on the right.
Figure 2: Comparison of AMAP with other alignment programs.
- A. S. Schwartz, E. Myers, and L. Pachter, "Alignment Metric Accuracy," arXiv: q-bio.QM/0510052, 2006.
- A. S. Schwartz and L. Pachter, "Multiple Alignment by Sequence Annealing," Proc. European Conf. Computational Biology (ECCB), Eilat, Israel, January 2006 (to appear).