In recent years, high-throughput sequencing technologies have transformed our understanding of human genetic variation by enabling both the sequencing of individual human genomes and population-scale genome sequencing. Short insertions/deletions (indels) are the second most frequent form of variation in the human genome and are functionally important, yet they have received less attention than single nucleotide variants (SNVs) and structural variants. Detecting indels from high-throughput sequence datasets is challenging, and currently available computational methods for indel detection exhibit significantly lower sensitivity and specificity than the methods available for identifying SNVs. Novel computational methods that address the challenges of indel detection and genotyping are urgently needed. We are developing new methods for detecting short indels in both individual and population-scale sequence datasets. These methods will use the sequence-context-specific and sequencing-platform-specific indel error rates observed in large-scale sequence datasets to generate accurate indel calls and genotypes.

Specific goals of this project include:

1. Design and implement a Bayesian method for detection and genotyping of short indels using multi-sample sequence data (whole-exome or whole-genome)

2. Develop algorithms for detection of large deletions and medium-length insertions from whole-exome/whole-genome sequence data using a combination of split-read mapping, gapped alignment and realignment

3. Implement methods for indel calling in individual genomes using cross-site error rate modeling, estimation of context-specific error rates and haplotype information

4. Apply these methods to detect functional indels in exon-targeted and population sequencing studies, and somatic indels in tumor-normal sequencing studies
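
To make the idea behind Bayesian indel genotyping with context-specific error rates concrete, the following is a minimal illustrative sketch, not the method developed in this project. It assumes a simple binomial read model, a Hardy-Weinberg genotype prior, and a single fixed error rate per sequence context. The function name `genotype_posteriors` and all parameter values are hypothetical.

```python
from math import comb


def genotype_posteriors(alt_reads, total_reads, error_rate, allele_freq):
    """Posterior over genotypes 0, 1, 2 (copies of the indel allele) for one sample.

    Assumptions (illustrative only): reads are independent, each read
    reports the wrong allele with probability `error_rate`, and the
    genotype prior follows Hardy-Weinberg proportions at `allele_freq`.
    """
    f = allele_freq
    priors = [(1 - f) ** 2, 2 * f * (1 - f), f ** 2]

    weighted = []
    for copies, prior in zip((0, 1, 2), priors):
        frac = copies / 2  # fraction of chromosomes carrying the indel
        # Probability that a single read supports the indel allele
        p_alt = frac * (1 - error_rate) + (1 - frac) * error_rate
        # Binomial likelihood of the observed read counts
        likelihood = (
            comb(total_reads, alt_reads)
            * p_alt ** alt_reads
            * (1 - p_alt) ** (total_reads - alt_reads)
        )
        weighted.append(prior * likelihood)

    total = sum(weighted)
    return [w / total for w in weighted]


# Same read evidence (3 of 12 reads support the indel), two hypothetical contexts:
# a low-error context and an error-prone one such as a long homopolymer run.
low_error = genotype_posteriors(3, 12, error_rate=0.01, allele_freq=0.1)
high_error = genotype_posteriors(3, 12, error_rate=0.10, allele_freq=0.1)
```

With these hypothetical numbers, the same read counts favor a heterozygous call in the low-error context and a homozygous-reference call in the error-prone context, which is why the error rate assigned to each context matters for genotype accuracy.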

Alastair Kilpatrick and Vikas Bansal. Detection and genotyping of short indels using sequence data from multiple samples. ISMB 2016 poster (also selected for oral presentation).

Bansal V. Accurate genotyping of INDELS from population-scale short read sequence data. Selected talk at HitSeq 2010, Boston (slides).