iAdmix: USING POPULATION ALLELE FREQUENCIES FOR COMPUTING INDIVIDUAL ADMIXTURE ESTIMATES

Inference of ancestry is an important aspect of disease association studies and of understanding population history. We have developed a fast and accurate method for estimating an individual's admixture proportions, that is, the fraction of their ancestry derived from each of a set of parental/reference populations, using genotype or sequence data together with the allele frequencies of those reference populations. Sequence data are supplied as aligned reads in a BAM file and can come from low-coverage whole-genome sequencing, exome sequencing or even targeted sequencing experiments. The method maximizes the likelihood function using the L-BFGS-B code (a limited-memory BFGS algorithm with bound constraints) and is very fast. The source code for iAdmix is available from the GitHub repository.
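
To make the likelihood idea concrete: if q_k is the fraction of an individual's ancestry from reference population k and f_jk is the allele frequency of SNP j in that population, a standard formulation (assumed here; see the paper for the exact model iAdmix uses) gives the individual's expected allele frequency as p_j = sum_k q_k * f_jk, and each genotype then contributes a binomial term. The sketch below computes that log-likelihood, assuming independent SNPs and genotypes coded as 0/1/2 copies of the allele whose frequency is tabulated. iAdmix itself maximizes its likelihood with L-BFGS-B; to keep this sketch dependency-free it swaps in a simple EM update for q instead. All function names are hypothetical.

```python
import math

# Hypothetical helpers illustrating the admixture model; not iAdmix's own code.
EPS = 1e-6


def clip(x, eps=EPS):
    """Keep a frequency strictly inside (0, 1) so logs and divisions are safe."""
    return min(max(x, eps), 1.0 - eps)


def mixture_freq(q, freqs):
    """Expected allele frequency p = sum_k q_k * f_k (q is assumed to sum to 1)."""
    return sum(qk * clip(fk) for qk, fk in zip(q, freqs))


def genotype_loglik(q, genotypes, freq_matrix):
    """Binomial log-likelihood of 0/1/2 genotypes, treating SNPs as independent."""
    ll = 0.0
    for g, freqs in zip(genotypes, freq_matrix):
        p = mixture_freq(q, freqs)
        ll += math.log(math.comb(2, g)) + g * math.log(p) + (2 - g) * math.log(1.0 - p)
    return ll


def estimate_admixture_em(genotypes, freq_matrix, n_iter=200):
    """EM estimate of q with reference frequencies held fixed (used here instead of L-BFGS-B)."""
    k = len(freq_matrix[0])
    q = [1.0 / k] * k  # start from equal ancestry
    for _ in range(n_iter):
        counts = [0.0] * k
        for g, freqs in zip(genotypes, freq_matrix):
            p = mixture_freq(q, freqs)
            for j in range(k):
                f = clip(freqs[j])
                # Expected share of the g counted-allele copies and of the
                # 2 - g other-allele copies that came from population j.
                counts[j] += g * q[j] * f / p + (2 - g) * q[j] * (1.0 - f) / (1.0 - p)
        q = [c / (2 * len(genotypes)) for c in counts]  # stays on the simplex
    return q
```

Usage: q = estimate_admixture_em(genotypes, freq_matrix), where freq_matrix has one row per SNP and one column per reference population. With SciPy available, the negated genotype_loglik could instead be minimized with scipy.optimize.minimize(..., method="L-BFGS-B", bounds=...); because L-BFGS-B only handles box bounds, the sum-to-one constraint on q then needs separate treatment, for example by renormalizing q inside the objective.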

The method is described in the paper: "Fast individual ancestry inference from DNA sequence data leveraging allele frequencies from multiple populations". Vikas Bansal and Ondrej Libiger. BMC Bioinformatics, 2015

INPUT: 

1. A sorted BAM file (sequence data), a simple genotype file (rsid/genotype pairs), or PLINK files (.ped and .map)
2. Population allele frequencies for common SNPs (generated from HapMap3 genotypes or other genotype datasets)
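
With low-coverage sequence data (BAM input) the genotype at each SNP is uncertain, so a natural likelihood sums over the three possible genotypes. The sketch below assumes the reads overlapping each SNP have already been reduced to counts of the two alleles, that reads are independent with a fixed symmetric error rate err, and that genotypes follow Hardy-Weinberg proportions given the individual's admixed allele frequency. These modelling choices and the function names are assumptions for illustration, not a description of iAdmix's internals.

```python
import math

# Hypothetical read-based likelihood for one individual; an illustration, not iAdmix's code.


def read_loglik(p, n_alt, n_ref, err=0.01, eps=1e-6):
    """Log-likelihood of the reads at one SNP given the individual's alt-allele frequency p.

    The genotype (0, 1 or 2 alt copies) is unobserved and summed out with
    Hardy-Weinberg priors. Each read is assumed independent with a symmetric
    base-error rate err (0 < err < 0.5). The binomial coefficient is dropped
    because it does not depend on p.
    """
    p = min(max(p, eps), 1.0 - eps)
    priors = [(1.0 - p) ** 2, 2.0 * p * (1.0 - p), p ** 2]
    terms = []
    for g, prior in enumerate(priors):
        a = (g / 2) * (1.0 - err) + (1.0 - g / 2) * err  # P(read shows alt | genotype g)
        terms.append(math.log(prior) + n_alt * math.log(a) + n_ref * math.log(1.0 - a))
    m = max(terms)  # log-sum-exp keeps this stable at high coverage
    return m + math.log(sum(math.exp(t - m) for t in terms))


def sequence_loglik(q, read_counts, freq_matrix, err=0.01):
    """Sum per-SNP read likelihoods; read_counts holds (n_alt, n_ref) per SNP."""
    total = 0.0
    for (n_alt, n_ref), freqs in zip(read_counts, freq_matrix):
        p = sum(qk * fk for qk, fk in zip(q, freqs))  # alt-allele frequency under admixture q
        total += read_loglik(p, n_alt, n_ref, err)
    return total
```

SNPs with no overlapping reads contribute zero to the total, which is why sparse, low-coverage data can still be used: every covered SNP adds a little information about q.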

OUTPUT:  admixture coefficients for each reference population for each individual 

Running the program: 

Example for a genotype file: python runancestry.py --freq hapmap3.allchroms.shared.matrix --geno HGDP01254.genotypes --out HGDP01254.ancestry
Example for a BAM file: python runancestry.py --freq hapmap3.allchroms.shared.matrix --bam HGDP01254.sorted.bam --out HGDP01254.ancestry
Example for PLINK genotype files (.ped/.map): python runancestry.py --freq hapmap3.allchroms.shared.matrix --plink HGDP01254.genotypes --out HGDP01254.ancestry
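
To process many individuals, the same command line can be generated per sample. The sketch below is a hypothetical batch wrapper: it uses only the --freq, --geno, --bam, --plink and --out options from the examples, assumes runancestry.py sits in the current directory, and derives each output name from the part of the input file name before the first dot (a naming convention chosen here for illustration).

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical batch wrapper around runancestry.py; assumes the script is in the current directory.
INPUT_FLAGS = ("--geno", "--bam", "--plink")


def build_command(freq_file, sample, input_flag="--geno", out_dir="."):
    """Return the argv list for one runancestry.py run."""
    if input_flag not in INPUT_FLAGS:
        raise ValueError(f"input_flag must be one of {INPUT_FLAGS}, got {input_flag!r}")
    sample_id = Path(sample).name.split(".")[0]  # e.g. HGDP01254.sorted.bam -> HGDP01254
    out = Path(out_dir) / f"{sample_id}.ancestry"
    return [sys.executable, "runancestry.py", "--freq", str(freq_file),
            input_flag, str(sample), "--out", str(out)]


def run_all(freq_file, samples, input_flag="--geno", out_dir=".", dry_run=True):
    """Run (or, with dry_run=True, only print) one command per sample."""
    for sample in samples:
        cmd = build_command(freq_file, sample, input_flag, out_dir)
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop on the first failing sample
```

For example, run_all("hapmap3.allchroms.shared.matrix", ["HGDP01254.sorted.bam"], input_flag="--bam") prints the command first; pass dry_run=False to execute it. Using sys.executable runs the script with the same Python interpreter as the wrapper.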

Sample input files and allele frequency files:

hapmap3.8populations.hg18.zip contains the allele frequencies for 8 HapMap populations and was used for most of the analyses in the paper referenced above. Unzip it before use; the extracted file can then be given to iAdmix as the allele frequency input (--freq). The same file is also available with hg19 coordinates for the SNPs.

A sample genotype file (sample.genotypes.bz2) is also available. Decompress it with bunzip2 (or bzip2 -d) before use.
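
Where the bunzip2 or unzip command-line tools are not available, Python's standard library can perform both decompression steps. The two helpers below are hypothetical conveniences, not part of iAdmix.

```python
import bz2
import shutil
import zipfile
from pathlib import Path


def bunzip(src, dst=None):
    """Decompress a .bz2 file (e.g. sample.genotypes.bz2) and return the output path."""
    src = Path(src)
    dst = Path(dst) if dst is not None else src.with_suffix("")  # strip the .bz2 suffix
    with bz2.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)  # streams the data instead of loading it all at once
    return dst


def unzip(src, dest_dir="."):
    """Extract a .zip archive (e.g. hapmap3.8populations.hg18.zip); return the member names."""
    with zipfile.ZipFile(src) as zf:
        zf.extractall(dest_dir)
        return zf.namelist()
```

bunzip("sample.genotypes.bz2") writes sample.genotypes next to the compressed file; unzip returns the extracted file names so the frequency file can be located without guessing its name.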