Variant detection from pooled DNA sequencing data

CRISP is a software program designed to detect SNPs and short indels from high-throughput sequencing of pooled DNA samples. CRISP has been primarily developed to analyze data from "artificial" DNA pools, i.e. pools generated by equi-molar pooling of DNA from multiple individual samples. CRISP leverages sequence data from multiple such pools to detect both rare and common variants. Note that the method is not designed for variant detection from a single pool. CRISP was developed for targeted disease association studies in humans but may work well for other applications.

CRISP has been tested on a number of pooled targeted capture datasets as well as exome sequence data generated using the Illumina sequencing platform.
CRISP works directly with sorted BAM files and outputs a single VCF file with information about the variants and the genotypes (or allele frequencies) for each pool or sample. CRISP can identify both SNVs and short insertions/deletions (indels). CRISP is implemented in C and uses the SAMtools API to read BAM files. The latest version of CRISP should also work well for diploid genomes (pool size = 2).

A statistical method for the detection of variants from next-generation resequencing of DNA pools. Vikas Bansal. Bioinformatics, 2010.


The latest source code (updated 10/2014) can be downloaded from the GitHub repository and compiled.


A binary for CRISP is available for download. Unpack it using the "tar xvzf CRISP-xx.tar.gz" command. This will create a directory containing the CRISP executable, which can be run from the command line. CRISP requires multiple BAM files and an indexed reference FASTA file to run, and outputs the variants and genotypes to a VCF file. Documentation on how to run CRISP and on the input and output files is included in a README file in the tar package.
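The input requirements above can be checked before launching a run. The following is a minimal sketch, not part of CRISP: the function name check_crisp_inputs is hypothetical, and it assumes that the reference was indexed with "samtools faidx", which writes the index next to the FASTA file with a ".fai" suffix. It does not invoke CRISP itself; see the README in the tar package for the actual command line.

```python
import os


def check_crisp_inputs(bam_paths, reference_fasta):
    """Return a list of problems found with the inputs (empty if none).

    Hypothetical helper, not part of CRISP. Assumes a samtools-style
    FASTA index stored as <reference_fasta>.fai.
    """
    problems = []

    # CRISP is designed for multiple pools, so expect more than one BAM file.
    if len(bam_paths) < 2:
        problems.append("need at least two BAM files")

    for path in bam_paths:
        if not os.path.isfile(path):
            problems.append("missing BAM file: " + path)

    if not os.path.isfile(reference_fasta):
        problems.append("missing reference FASTA: " + reference_fasta)
    elif not os.path.isfile(reference_fasta + ".fai"):
        problems.append("reference FASTA has no .fai index: " + reference_fasta)

    return problems
```

An empty list means the basic checks passed; it does not verify that the BAM files are sorted or that they were aligned to the same reference.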


CRISP can call variants from both pooled and unpooled (diploid) multi-sample datasets.


Efficient and cost effective population resequencing by pooling and in-solution hybridization. V. Bansal, R. Tewhey, E.M. Leproust, N.J. Schork. PLoS One, 2011. [sequence data for download]

 
CRISP has been used to perform a pooled-sequencing based association study: Scott-Van Zeeland AA, Bloss C, Tewhey C, Bansal V, et al. "Evidence for the Role of EPHX2 gene variants in Anorexia Nervosa". Molecular Psychiatry, 2013.

Please send bug reports, comments, etc. to vibansal AT cs.ucsd.edu

Updates to CRISP

Dec 27 2013

A new version of CRISP has been uploaded (see downloads). CRISP can call variants from both pooled and unpooled (diploid) multi-sample datasets. The new features of CRISP are:

1. An EM algorithm estimates the pooled genotypes (or allele counts) for each pool and the allele frequency jointly across all pools
2. Overlapping paired-end reads are processed correctly: bases that are read from both directions are counted only once
3. CRISP works for diploid individuals as well as pooled data (multiple samples pooled in equimolar amounts into a single pool)
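
The idea behind the first item can be illustrated with a small, self-contained sketch. This is not CRISP's implementation: the model (a binomial prior on the number of variant alleles in each pool, binomial read sampling with a fixed error rate), the function names, and all parameter values are assumptions chosen for illustration. The E-step computes a posterior over the allele count of each pool given the current allele frequency; the M-step re-estimates the allele frequency from the expected allele counts across all pools.

```python
from math import comb


def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def pool_posterior(alt, depth, n_hap, freq, err):
    """Posterior over the number of variant alleles (0..n_hap) in one pool."""
    weights = []
    for k in range(n_hap + 1):
        frac = k / n_hap
        # Probability that a read shows the variant allele, allowing for errors.
        q = frac * (1 - err) + (1 - frac) * err
        weights.append(binom_pmf(k, n_hap, freq) * binom_pmf(alt, depth, q))
    total = sum(weights)
    return [w / total for w in weights]


def em_pooled(alt_counts, depths, n_hap, err=0.01, iters=50):
    """Jointly estimate the allele frequency and per-pool allele counts.

    alt_counts[i], depths[i]: variant reads and total reads in pool i.
    n_hap: haplotypes per pool (2 x individuals for diploid samples).
    """
    freq = 0.05  # arbitrary starting value (assumption)
    for _ in range(iters):
        # E-step: posterior over the allele count in each pool.
        posts = [pool_posterior(a, d, n_hap, freq, err)
                 for a, d in zip(alt_counts, depths)]
        # M-step: allele frequency from the expected allele counts.
        expected = sum(k * w for post in posts for k, w in enumerate(post))
        freq = expected / (len(depths) * n_hap)
        freq = min(max(freq, 1e-6), 1 - 1e-6)  # keep away from 0 and 1
    posts = [pool_posterior(a, d, n_hap, freq, err)
             for a, d in zip(alt_counts, depths)]
    best = [max(range(n_hap + 1), key=post.__getitem__) for post in posts]
    return freq, best


# Four pools of 5 diploid individuals (10 haplotypes), 100 reads per pool.
freq, counts = em_pooled([0, 10, 0, 0], [100, 100, 100, 100], n_hap=10)
print(freq, counts)
```

With n_hap = 2 the same sketch reduces to ordinary diploid genotypes (0, 1, or 2 variant alleles), which is one way to see how pooled and unpooled data can share a single model.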

Nov 28 2013

CRISP has been extended to estimate pooled genotypes using an EM algorithm, and can handle overlapping paired-end reads and multi-allelic variant sites. It is under active development and a stable version will be released soon. 
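
To illustrate why overlapping paired-end reads need special handling, here is a minimal sketch of counting each sequenced fragment only once at a position. It is not CRISP's code: the data layout (each mate as a start position plus an ungapped base string), the function name count_alleles, and the choice to skip fragments whose two mates disagree are all assumptions made for this example.

```python
from collections import Counter


def count_alleles(fragments, pos):
    """Count alleles at pos, using each fragment at most once.

    fragments: list of fragments; each fragment is a list of one or two
    mates, and each mate is (start, bases) with an ungapped alignment.
    Fragments whose mates disagree at pos are skipped (an assumption).
    """
    counts = Counter()
    for mates in fragments:
        # Collect the distinct bases this fragment shows at pos.
        bases = {seq[pos - start]
                 for start, seq in mates
                 if start <= pos < start + len(seq)}
        if len(bases) == 1:
            counts[bases.pop()] += 1
    return counts


fragments = [
    [(100, "ACGT"), (102, "GTAA")],  # both mates cover 103 and agree (T)
    [(101, "CGA")],                  # single read covering 103 (A)
    [(100, "ACGT"), (103, "CAA")],   # mates disagree at 103 (T vs C)
]
print(count_alleles(fragments, 103))
```

Counting both mates of the first fragment separately would report two reads supporting T even though only one DNA molecule was sequenced, which overstates the evidence for that allele.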

July 18 2012

CRISP can now call variants directly from BAM files. The program (pre-compiled binaries for Linux x86_64) is available for download.

April 24 2012

A new version of CRISP has been implemented in C. This version is significantly faster than the previous Python version, is more accurate for both SNPs and insertions/deletions, and outputs the variants and the allele counts in VCF format.

June 2010

The paper describing CRISP is published in Bioinformatics. The original software was implemented in Python.