SNIP-Seq is a program that uses short-read Illumina sequence data from a population of samples to detect SNPs and assign genotypes. Many methods for multi-sample SNP calling have been proposed that calculate genotype likelihoods using the base quality values from all samples. However, these methods do not use information about sequencing error rates present in the data itself. The key idea underlying the SNIP-Seq method is to use information from both base-quality values and sequence reads from multiple individuals to iteratively estimate the genotypes and sequencing error rates.
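
The iterative idea can be illustrated with a minimal Python sketch. This is not the SNIP-Seq implementation or the published algorithm: it is a simplified toy for a single biallelic site, assuming a binomial read model, a flat genotype prior, and one shared error rate that is initialized from a Phred-scaled base quality. The function names (`genotype_log_likelihoods`, `call_site`) and all parameters are hypothetical.

```python
import math


def genotype_log_likelihoods(ref_count, alt_count, error_rate):
    """Log-likelihood of the read counts under genotypes 0, 1, 2 (alt-allele copies)."""
    # Probability that a single read shows the alt allele under each genotype.
    p_alt = (error_rate, 0.5, 1.0 - error_rate)
    return [ref_count * math.log(1.0 - p) + alt_count * math.log(p) for p in p_alt]


def call_site(counts, initial_quality=20, max_iter=20, tol=1e-6):
    """Alternate between genotype calls and error-rate estimation at one site.

    counts: list of (ref_count, alt_count) tuples, one per sample.
    Returns (genotypes, error_rate).
    """
    # Assumption: start from the error rate implied by a Phred base quality.
    error_rate = 10 ** (-initial_quality / 10)
    genotypes = []
    for _ in range(max_iter):
        # Step 1: most likely genotype per sample, given the current error rate.
        genotypes = []
        for ref, alt in counts:
            ll = genotype_log_likelihoods(ref, alt, error_rate)
            genotypes.append(ll.index(max(ll)))

        # Step 2: re-estimate the error rate from reads that disagree with
        # homozygous calls (heterozygous samples are uninformative here).
        mismatches = total = 0
        for (ref, alt), g in zip(counts, genotypes):
            if g == 0:
                mismatches += alt
                total += ref + alt
            elif g == 2:
                mismatches += ref
                total += ref + alt
        if total == 0:
            break
        # Pseudocounts keep the estimate away from 0; cap it below 0.5.
        new_rate = min((mismatches + 1) / (total + 2), 0.49)
        converged = abs(new_rate - error_rate) < tol
        error_rate = new_rate
        if converged:
            break
    return genotypes, error_rate


if __name__ == "__main__":
    # Four samples: (reference reads, alternate reads) at one site.
    site_counts = [(30, 0), (14, 16), (1, 29), (29, 1)]
    calls, rate = call_site(site_counts)
    print(calls, rate)
```

In this toy version the base quality only seeds the first iteration; after that, the error rate is driven entirely by the reads from all samples, which is the kind of information that quality-only methods leave unused.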

The program, implemented in Python, has been evaluated on several population datasets generated by targeted sequencing of several hundred kilobases of the human genome in 50 to 300 samples. For more information about SNIP-Seq, please see the following publication:

Accurate detection and genotyping of SNPs utilizing population sequencing data. Bansal V, Harismendy O, Tewhey R, et al. Genome Res. 20(4):537-45. 2010 April. PMID: 20150320.

Note: SNIP-Seq is no longer updated. For variant calling from population-scale sequence data, the program CRISP is recommended instead.


Other methods for multi-sample variant calling (in no particular order):

1. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. H Li, Bioinformatics 2011
2. A framework for variation discovery and genotyping using next-generation DNA sequencing data. DePristo et al., Nat Genet. 2011 May;43(5):491-8.
3. SeqEM: an adaptive genotype-calling approach for next-generation sequencing studies. ER Martin et al., Bioinformatics 2010 (one of the few methods that uses sequence reads to estimate error rates)
4. SNP calling, genotype calling, and sample allele frequency estimation from new-generation sequencing data. R Nielsen et al., PLoS ONE 2012
5. Genotype and SNP calling from next-generation sequencing data. R Nielsen et al., Nature Rev. Genetics 2011 (excellent review on SNV calling methods for sequence data and statistical issues)