SNP

From HORTS 1993


Base Calling

  • Minimum read depth
  • Based on Phred scores
  • 1% error rate (Phred score of 20)
  • Alignment (trade-off between accuracy and read depth)
  • Recalibration of Phred scores
    • Essential
    • A base with Phred score Q should have an error probability of 10^(-Q/10) or less. Recalibration is done by aligning the reads to the reference together with the known SNPs.
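
The relation between a Phred score and its error probability can be sketched in a few lines of Python. The function names below are illustrative only and are not taken from any base-calling tool.

```python
import math

def phred_to_error_prob(q):
    """Error probability implied by Phred score q: P = 10^(-q/10)."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Phred score implied by error probability p: Q = -10 * log10(p)."""
    return -10 * math.log10(p)

if __name__ == "__main__":
    for q in (10, 20, 30):
        print(q, phred_to_error_prob(q))
```

Under this relation, the 1% error rate listed above corresponds to a Phred score of 20.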

Homozygous and Heterozygous SNPs in a diploid

  • Homozygous -> the SNP base (different from ref) makes up more than 80% of the reads at that position
  • Heterozygous -> the SNP base (different from ref) makes up between 10% and 80% of the reads at that position
  • Sequencing/alignment error -> the SNP base makes up less than 10% of the reads at that position
  • These thresholds hold only if the read depth is at least 20x
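
The thresholds above can be written as a small classifier. This is a minimal sketch: the function name, the parameter names and the "low_depth" label are assumptions made for illustration, and a fraction of exactly 80% (which the notes leave undefined) is treated here as heterozygous.

```python
def classify_site(alt_count, depth, min_depth=20, hom_frac=0.80, err_frac=0.10):
    """Classify one position from the count of non-reference reads.

    alt_count: reads carrying the SNP (non-reference) base
    depth:     total reads covering the position
    """
    if depth < min_depth:
        # The thresholds are only trusted at 20x or more (assumed label).
        return "low_depth"
    frac = alt_count / depth
    if frac > hom_frac:
        return "homozygous"
    if frac < err_frac:
        return "error"
    # 10% <= frac <= 80%; the 80% boundary is an assumption.
    return "heterozygous"

if __name__ == "__main__":
    print(classify_site(19, 20))
    print(classify_site(10, 20))
    print(classify_site(1, 20))
```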

Accuracy in SNP calling

  • Accuracy can be improved by moving from single-sample calling (ref vs. one sample) to multi-sample calling (ref vs. several samples).
    • However, false positive SNP calls would also increase with the number of samples
  • Accuracy achievable with read-depth-based SNP calling is 85%
  • Accuracy achievable with LD (linkage disequilibrium) based SNP calling is >95%
    • Possible only when multiple samples are used
    • Software that uses LD for SNP calling includes Beagle, IMPUTE2, QCall and MaCH
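
As a reminder of what LD measures, the sketch below computes the common r^2 statistic between two biallelic SNPs from phased haplotypes. It only illustrates the quantity; it is not how the tools listed above work internally, and the function name and input layout (a list of (allele_at_snp1, allele_at_snp2) pairs coded 0/1) are assumptions.

```python
def ld_r_squared(haplotypes):
    """r^2 between two biallelic SNPs.

    haplotypes: list of (a, b) tuples, each allele coded 0 (ref) or 1 (alt).
    Returns 0.0 when either SNP is monomorphic (assumed convention).
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n        # alt frequency at SNP 1
    p_b = sum(b for _, b in haplotypes) / n        # alt frequency at SNP 2
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                           # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0
    return d * d / denom

if __name__ == "__main__":
    linked = [(0, 0), (1, 1), (0, 0), (1, 1)]
    unlinked = [(0, 0), (0, 1), (1, 0), (1, 1)]
    print(ld_r_squared(linked), ld_r_squared(unlinked))
```

Estimating these frequencies needs several haplotypes, which is consistent with the note that LD-based calling is only possible with multiple samples.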

Plan for SNP calling

  • Assumptions
    • Multiple genotypes instead of ref vs. one
    • Right combination (contrasting genotypes for a specific type) vs. ref
    • LD-based SNP calling
    • Cross-check the SNPs against all 18 genotypes vs. the contrasting types
  • Filtering
    • Use of LD (Software would estimate this)
    • HapMap data (review how haplotype frequencies will help in filtering the best SNPs)
    • Deviations from HWE (Hardy-Weinberg equilibrium): estimate the allele frequencies when HD is estimated, and work out how to identify the deviating SNPs so they can be filtered out
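
One common way to flag deviations from HWE is a chi-square goodness-of-fit test on genotype counts. The sketch below is an assumption about how such a filter could look, not the method settled on in this plan. All names are hypothetical, and 3.841 is the standard chi-square critical value for 1 degree of freedom at the 5% level.

```python
CHI2_CRIT_DF1_05 = 3.841  # chi-square critical value, df = 1, alpha = 0.05

def hwe_chi_square(n_ref_hom, n_het, n_alt_hom):
    """Chi-square statistic for deviation from Hardy-Weinberg proportions."""
    n = n_ref_hom + n_het + n_alt_hom
    if n == 0:
        return 0.0
    p = (2 * n_ref_hom + n_het) / (2 * n)   # reference allele frequency
    q = 1 - p                               # alternate allele frequency
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_ref_hom, n_het, n_alt_hom)
    # Classes with zero expected count are skipped to avoid division by zero.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

def deviates_from_hwe(n_ref_hom, n_het, n_alt_hom):
    """True if the SNP would be filtered out at the 5% level."""
    return hwe_chi_square(n_ref_hom, n_het, n_alt_hom) > CHI2_CRIT_DF1_05

if __name__ == "__main__":
    print(hwe_chi_square(25, 50, 25), deviates_from_hwe(25, 50, 25))
    print(hwe_chi_square(50, 0, 50), deviates_from_hwe(50, 0, 50))
```

With a small panel such as the 18 genotypes mentioned above, the chi-square approximation is rough, and an exact test may be a better choice.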

Pooled_Mapping