Electrical Engineering and Computer Sciences

COLLEGE OF ENGINEERING

UC Berkeley

Large Scale Recovery of Haplotypes from Genotype Data using Imperfect Phylogeny

Eran Halperin and Eleazar Eskin

EECS Department
University of California, Berkeley
Technical Report No. UCB/CSD-02-1195
August 2002

http://www.eecs.berkeley.edu/Pubs/TechRpts/2002/CSD-02-1195.pdf

Critical to the understanding of the genetic basis for complex diseases is the modeling of human variation. Most of this variation can be characterized by single nucleotide polymorphisms (SNPs), which are mutations at a single nucleotide position. To characterize an individual's variation, we must determine the individual's haplotypes, that is, which nucleotide base occurs at each of these common SNP positions on each chromosome. In this paper, we present results for a highly accurate method for haplotype resolution from genotype data. Our method leverages a new insight into the underlying structure of haplotypes, which shows that SNPs are organized in highly correlated "blocks". The majority of individuals have one of about four common haplotypes in each block. Our method partitions the SNPs into blocks and, for each block, predicts the common haplotypes and each individual's haplotype. We evaluate our method on biological data. Our method predicts the common haplotypes perfectly and has a very low error rate (0.47%) when the predictions for the uncommon haplotypes are taken into account.
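
To make the haplotype resolution problem in the abstract concrete, the following is a minimal illustrative sketch, not the imperfect phylogeny method of the report. It assumes a common encoding that the abstract does not specify: haplotypes are strings of '0'/'1' alleles, and an unphased genotype uses '0' or '1' for a homozygous site and '2' for a heterozygous site. The function names and the example haplotypes are hypothetical. The sketch lists which pairs of a block's common haplotypes could explain a given genotype, which shows why resolution can be ambiguous.

```python
from itertools import combinations_with_replacement


def genotype_of(h1, h2):
    """Combine two haplotypes into an unphased genotype.

    Assumed encoding (not from the report): '0'/'1' for a homozygous
    site, '2' for a heterozygous site.
    """
    return "".join(a if a == b else "2" for a, b in zip(h1, h2))


def explaining_pairs(genotype, common_haplotypes):
    """Return every unordered pair of common haplotypes yielding the genotype."""
    candidates = sorted(common_haplotypes)
    return [
        (h1, h2)
        for h1, h2 in combinations_with_replacement(candidates, 2)
        if genotype_of(h1, h2) == genotype
    ]


# Hypothetical block with four common haplotypes over four SNPs.
block_haplotypes = {"0000", "0110", "1011", "1101"}

# Only one pair of common haplotypes explains this genotype.
unique = explaining_pairs("2212", block_haplotypes)

# With two heterozygous sites and all four 2-SNP haplotypes available,
# two different pairs explain the same genotype.
ambiguous = explaining_pairs("22", {"00", "01", "10", "11"})
```

In this toy setting, restricting candidates to a small set of common haplotypes per block is what keeps the number of explaining pairs small; how the report actually infers blocks and common haplotypes is not shown here.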


BibTeX citation:

@techreport{Halperin:CSD-02-1195,
    Author = {Halperin, Eran and Eskin, Eleazar},
    Title = {Large Scale Recovery of Haplotypes from Genotype Data using Imperfect Phylogeny},
    Institution = {EECS Department, University of California, Berkeley},
    Year = {2002},
    Month = {Aug},
    URL = {http://www.eecs.berkeley.edu/Pubs/TechRpts/2002/5821.html},
    Number = {UCB/CSD-02-1195},
    Abstract = {Critical to the understanding of the genetic basis for complex diseases is the modeling of human variation. Most of this variation can be characterized by single nucleotide polymorphisms (SNPs), which are mutations at a single nucleotide position. To characterize an individual's variation, we must determine the individual's haplotypes, that is, which nucleotide base occurs at each of these common SNP positions on each chromosome. In this paper, we present results for a highly accurate method for haplotype resolution from genotype data. Our method leverages a new insight into the underlying structure of haplotypes, which shows that SNPs are organized in highly correlated "blocks". The majority of individuals have one of about four common haplotypes in each block. Our method partitions the SNPs into blocks and, for each block, predicts the common haplotypes and each individual's haplotype. We evaluate our method on biological data. Our method predicts the common haplotypes perfectly and has a very low error rate (0.47%) when the predictions for the uncommon haplotypes are taken into account.}
}

EndNote citation:

%0 Report
%A Halperin, Eran
%A Eskin, Eleazar
%T Large Scale Recovery of Haplotypes from Genotype Data using Imperfect Phylogeny
%I EECS Department, University of California, Berkeley
%D 2002
%@ UCB/CSD-02-1195
%U http://www.eecs.berkeley.edu/Pubs/TechRpts/2002/5821.html
%F Halperin:CSD-02-1195