Electrical Engineering
      and Computer Sciences

Electrical Engineering and Computer Sciences

COLLEGE OF ENGINEERING

UC Berkeley

Efficient Reconstruction of Haplotype Structure via Perfect Phylogeny

Eleazar Eskin, Eran Halperin and Richard M. Karp

EECS Department
University of California, Berkeley
Technical Report No. UCB/CSD-02-1196
August 2002

http://www.eecs.berkeley.edu/Pubs/TechRpts/2002/CSD-02-1196.pdf

Each person's genome contains two copies of each chromosome, one inherited from the father and the other from the mother. A person's genotype specifies the pair of bases at each site, but does not specify which base occurs on which chromosome. The sequence of each chromosome separately is called a haplotype. The determination of the haplotypes within a population is essential for understanding genetic variation and the inheritance of complex diseases. The haplotype mapping project, a successor to the human genome project, seeks to determine the common haplotypes in the human population.

Since experimental determination of a person's genotype is less expensive than determining its component haplotypes, algorithms are required for computing haplotypes from genotypes. Two observations aid in this process: first, the human genome contains short blocks within which only a few different haplotypes occur; second, as suggested by Gusfield, it is reasonable to assume that the haplotypes observed within a block have evolved according to a perfect phylogeny, in which at most one mutation event has occurred at any site.

We present a simple and efficient polynomial-time algorithm for inferring haplotypes from the genotypes of a set of individuals assuming a perfect phylogeny. Using a reduction to 2-SAT we extend this algorithm to handle constraints that apply when we have genotypes from both parents and child. We also present a hardness result for the problem of removing the minimum number of individuals from a population to ensure that the genotypes of the remaining individuals are consistent with a perfect phylogeny.

Our algorithms have been tested on real data and give biologically meaningful results.


BibTeX citation:

@techreport{Eskin:CSD-02-1196,
    Author = {Eskin, Eleazar and Halperin, Eran and Karp, Richard M.},
    Title = {Efficient Reconstruction of Haplotype Structure via Perfect Phylogeny},
    Institution = {EECS Department, University of California, Berkeley},
    Year = {2002},
    Month = {Aug},
    URL = {http://www.eecs.berkeley.edu/Pubs/TechRpts/2002/5823.html},
    Number = {UCB/CSD-02-1196},
    Abstract = {Each person's genome contains two copies of each chromosome, one inherited from the father and the other from the mother. A person's genotype specifies the pair of bases at each site, but does not specify which base occurs on which chromosome. The sequence of each chromosome separately is called a haplotype. The determination of the haplotypes within a population is essential for understanding genetic variation and the inheritance of complex diseases. The haplotype mapping project, a successor to the human genome project, seeks to determine the common haplotypes in the human population. <p>Since experimental determination of a person's genotype is less expensive than determining its component haplotypes, algorithms are required for computing haplotypes from genotypes. Two observations aid in this process: first, the human genome contains short blocks within which only a few different haplotypes occur; second, as suggested by Gusfield, it is reasonable to assume that the haplotypes observed within a block have evolved according to a perfect phylogeny, in which at most one mutation event has occurred at any site. <p>We present a simple and efficient polynomial-time algorithm for inferring haplotypes from the genotypes of a set of individuals assuming a perfect phylogeny. Using a reduction to 2-SAT we extend this algorithm to handle constraints that apply when we have genotypes from both parents and child. We also present a hardness result for the problem of removing the minimum number of individuals from a population to ensure that the genotypes of the remaining individuals are consistent with a perfect phylogeny. <p>Our algorithms have been tested on real data and give biologically meaningful results.}
}

EndNote citation:

%0 Report
%A Eskin, Eleazar
%A Halperin, Eran
%A Karp, Richard M.
%T Efficient Reconstruction of Haplotype Structure via Perfect Phylogeny
%I EECS Department, University of California, Berkeley
%D 2002
%@ UCB/CSD-02-1196
%U http://www.eecs.berkeley.edu/Pubs/TechRpts/2002/5823.html
%F Eskin:CSD-02-1196