Efficient Reconstruction of Haplotype Structure via Perfect Phylogeny

Eleazar Eskin, Eran Halperin and Richard M. Karp

EECS Department
University of California, Berkeley
Technical Report No. UCB/CSD-02-1196
August 2002

http://www.eecs.berkeley.edu/Pubs/TechRpts/2002/CSD-02-1196.pdf

Each person's genome contains two copies of each chromosome, one inherited from the father and the other from the mother. A person's genotype specifies the pair of bases at each site, but does not specify which base occurs on which chromosome. The sequence of each chromosome separately is called a haplotype. The determination of the haplotypes within a population is essential for understanding genetic variation and the inheritance of complex diseases. The haplotype mapping project, a successor to the human genome project, seeks to determine the common haplotypes in the human population.

Since experimental determination of a person's genotype is less expensive than determining its component haplotypes, algorithms are required for computing haplotypes from genotypes. Two observations aid in this process: first, the human genome contains short blocks within which only a few different haplotypes occur; second, as suggested by Gusfield, it is reasonable to assume that the haplotypes observed within a block have evolved according to a perfect phylogeny, in which at most one mutation event has occurred at any site.

We present a simple and efficient polynomial-time algorithm for inferring haplotypes from the genotypes of a set of individuals assuming a perfect phylogeny. Using a reduction to 2-SAT we extend this algorithm to handle constraints that apply when we have genotypes from both parents and child. We also present a hardness result for the problem of removing the minimum number of individuals from a population to ensure that the genotypes of the remaining individuals are consistent with a perfect phylogeny.

Our algorithms have been tested on real data and give biologically meaningful results.


BibTeX citation:

@techreport{Eskin:CSD-02-1196,
    Author = {Eskin, Eleazar and Halperin, Eran and Karp, Richard M.},
    Title = {Efficient Reconstruction of Haplotype Structure via Perfect Phylogeny},
    Institution = {EECS Department, University of California, Berkeley},
    Year = {2002},
    Month = {Aug},
    URL = {http://www.eecs.berkeley.edu/Pubs/TechRpts/2002/5823.html},
    Number = {UCB/CSD-02-1196},
    Abstract = {Each person's genome contains two copies of each chromosome, one inherited from the father and the other from the mother. A person's genotype specifies the pair of bases at each site, but does not specify which base occurs on which chromosome. The sequence of each chromosome separately is called a haplotype. The determination of the haplotypes within a population is essential for understanding genetic variation and the inheritance of complex diseases. The haplotype mapping project, a successor to the human genome project, seeks to determine the common haplotypes in the human population. <p>Since experimental determination of a person's genotype is less expensive than determining its component haplotypes, algorithms are required for computing haplotypes from genotypes. Two observations aid in this process: first, the human genome contains short blocks within which only a few different haplotypes occur; second, as suggested by Gusfield, it is reasonable to assume that the haplotypes observed within a block have evolved according to a perfect phylogeny, in which at most one mutation event has occurred at any site. <p>We present a simple and efficient polynomial-time algorithm for inferring haplotypes from the genotypes of a set of individuals assuming a perfect phylogeny. Using a reduction to 2-SAT we extend this algorithm to handle constraints that apply when we have genotypes from both parents and child. We also present a hardness result for the problem of removing the minimum number of individuals from a population to ensure that the genotypes of the remaining individuals are consistent with a perfect phylogeny. <p>Our algorithms have been tested on real data and give biologically meaningful results.}
}

EndNote citation:

%0 Report
%A Eskin, Eleazar
%A Halperin, Eran
%A Karp, Richard M.
%T Efficient Reconstruction of Haplotype Structure via Perfect Phylogeny
%I EECS Department, University of California, Berkeley
%D 2002
%@ UCB/CSD-02-1196
%U http://www.eecs.berkeley.edu/Pubs/TechRpts/2002/5823.html
%F Eskin:CSD-02-1196