SNP
Base Calling
- Minimum Read depth
- Based on Phred scores
- 1% error rate
- Alignment (Trade off between accuracy and read depth)
- Recalibration of Phred scores
- Essential
- A base with Phred score Q should have an error probability of 10^(-Q/10) or less. This is checked by aligning the reads with the reference together with the known SNPs
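The relation between a Phred score and its error probability, and the idea behind recalibration, can be sketched as below. This is only an illustration, not the procedure of any particular recalibration tool: the function names are hypothetical, and the +1/+2 smoothing of the empirical error rate is an assumption made here to avoid taking the log of zero.

```python
import math
from collections import defaultdict


def phred_to_error_prob(q: float) -> float:
    """Error probability implied by Phred score q: P = 10^(-q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred score for error probability p: Q = -10 * log10(p)."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return -10 * math.log10(p)


def empirical_quality(observations):
    """Compare reported qualities with observed mismatch rates.

    observations: iterable of (reported_q, is_mismatch) pairs, one per
    aligned base, taken only from positions that are NOT known SNPs
    (a mismatch at a known SNP is not evidence of a sequencing error).
    Returns {reported_q: empirical_q}.
    """
    totals = defaultdict(int)
    mismatches = defaultdict(int)
    for reported_q, is_mismatch in observations:
        totals[reported_q] += 1
        if is_mismatch:
            mismatches[reported_q] += 1
    # Assumption: add-one smoothing so that zero mismatches still gives a finite score.
    return {
        q: error_prob_to_phred((mismatches[q] + 1) / (totals[q] + 2))
        for q in totals
    }
```

With this relation, Q20 corresponds to the 1% error rate listed above. If bases reported as Q30 mismatch the reference far more often than 1 in 1000 at non-SNP positions, their empirical score comes out lower and the reported scores need recalibrating.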
Homozygous and heterozygous SNPs in a diploid
- Homozygous -> If the SNP base (different from the ref) accounts for more than 80% of the reads covering the position
- Heterozygous -> If the SNP base (different from the ref) accounts for less than 80% of the reads covering the position (but not less than the 10% error threshold below)
- Sequencing/alignment error -> If the SNP base accounts for less than 10% of the reads
- These thresholds hold if the read depth is at least 20x
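The thresholds above can be written as a small classifier. This is a minimal sketch: the function and constant names are hypothetical, and treating a fraction of exactly 80% or exactly 10% as heterozygous is an assumption, since the notes do not say which side the boundaries fall on.

```python
MIN_DEPTH = 20          # thresholds are only trusted at 20x or more
HOM_FRACTION = 0.80     # above this: homozygous for the SNP base
ERROR_FRACTION = 0.10   # below this: sequencing/alignment error


def classify_site(alt_count: int, depth: int) -> str:
    """Classify one position from the count of non-reference (SNP) bases.

    alt_count: reads carrying the SNP base at this position
    depth:     total reads covering this position
    """
    if depth < MIN_DEPTH:
        return "low_depth"  # not enough reads to apply the thresholds
    fraction = alt_count / depth
    if fraction > HOM_FRACTION:
        return "homozygous"
    if fraction < ERROR_FRACTION:
        return "error"
    # Assumption: the boundary values (exactly 10% or 80%) count as heterozygous.
    return "heterozygous"
```

For example, 19 SNP reads out of 20 would be called homozygous, 10 out of 20 heterozygous, and 1 out of 20 an error, while any site covered by fewer than 20 reads is left uncalled.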
Accuracy in SNP calling
- Accuracy can be improved by moving from single-sample calling (ref vs. one sample) to multi-sample calling (ref vs. several samples).
- However, false positive SNP calls would also increase as the number of samples grows
- Possible accuracy with read-depth-based SNP calling is 85%
- Possible accuracy with LD (linkage disequilibrium) based SNP calling is >95%
- Possible only when multiple samples are used
- Software that uses LD for SNP calling includes Beagle, IMPUTE2, QCall and MaCH
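To make the LD idea concrete, the sketch below computes the r^2 statistic between two biallelic SNPs from phased haplotypes pooled across samples. It only illustrates what LD measures and why several samples are needed; it is not how the programs listed above work internally, and the function name and input layout are assumptions.

```python
def ld_r2(haplotypes):
    """r^2 between two biallelic SNPs.

    haplotypes: list of (a, b) tuples, one per phased haplotype, where
    a and b are 0 (reference allele) or 1 (SNP allele) at the two sites.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")
    p_a = sum(a for a, _ in haplotypes) / n            # SNP allele frequency at site A
    p_b = sum(b for _, b in haplotypes) / n            # SNP allele frequency at site B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                               # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        # Assumption: r^2 is undefined when a site is monomorphic; report 0.0 here.
        return 0.0
    return d * d / denom
```

When two sites always carry the SNP allele together, r^2 is 1, so a confident call at one site supports a low-coverage call at the other. With a single sample there are too few haplotypes to estimate these frequencies, which is why LD-based calling needs multiple samples.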
Plan for SNP calling
- Assumptions
- Multiple Genotypes instead of Ref vs One
- Right combination of samples (contrasting genotypes for a specific type) vs. ref
- LD based SNP calling
- Cross-check the SNPs against all 18 genotypes vs. the contrasting types
- Filtering
- Use of LD (the software would estimate this)
- HapMap data (review how haplotype frequencies will help in filtering for the best SNPs)
- Deviations from HWE, i.e. Hardy-Weinberg equilibrium (estimate the allele frequencies when HD is estimated, and work out how to identify the deviating SNPs so they can be filtered out)
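One possible way to flag SNPs that deviate from HWE is a chi-square goodness-of-fit test on the genotype counts at each site, sketched below. The function names are hypothetical and the 5% significance level is an assumption. With only 18 genotypes the chi-square approximation is rough, so an exact test may be the better choice in practice.

```python
# Critical value of the chi-square distribution, 1 degree of freedom, alpha = 0.05.
CHI2_CRITICAL_DF1_ALPHA05 = 3.841


def hwe_chi_square(n_ref_ref: int, n_ref_alt: int, n_alt_alt: int) -> float:
    """Chi-square statistic for deviation from Hardy-Weinberg proportions.

    Arguments are the observed counts of the three genotypes at one SNP.
    """
    n = n_ref_ref + n_ref_alt + n_alt_alt
    if n == 0:
        raise ValueError("no genotypes observed")
    # Allele frequencies estimated from the genotype counts.
    p = (2 * n_ref_ref + n_ref_alt) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_ref_ref, n_ref_alt, n_alt_alt)
    # Skip classes with zero expectation (happens only at monomorphic sites).
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def deviates_from_hwe(n_ref_ref: int, n_ref_alt: int, n_alt_alt: int) -> bool:
    """True if the site would be filtered out at the assumed 5% level."""
    return hwe_chi_square(n_ref_ref, n_ref_alt, n_alt_alt) > CHI2_CRITICAL_DF1_ALPHA05
```

A site with counts 25/50/25 matches the expected proportions exactly and passes, while a site with 50/0/50 (no heterozygotes at all) gives a large statistic and would be filtered. A complete lack of heterozygotes like this often points to a calling artifact rather than a real SNP.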