Base Calling

Minimum Read depth
Based on Phred scores
1% error rate (Phred Q20)
Alignment (Trade off between accuracy and read depth)
Recalibration of Phred scores
- Essential
- A base with Phred score Q should have an error probability of 10^(-Q/10) or less. This is checked by aligning the reads against the reference with the known SNPs
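The relation between a Phred score and its error probability can be sketched in a few lines of Python. The function names are made up for this note, and the empirical-quality helper assumes that mismatch counts were taken from aligned bases with known SNP sites left out, which is one common way to recalibrate and not necessarily the only one.

```python
import math

def phred_to_error_prob(q):
    """Error probability implied by Phred score q: 10^(-q/10)."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Phred score implied by error probability p: -10 * log10(p)."""
    return -10 * math.log10(p)

def empirical_quality(mismatches, total_bases):
    """Observed quality of a group of bases (hypothetical helper).

    Assumes mismatches were counted against the reference at sites
    that are not known SNPs, and that mismatches > 0.
    """
    return error_prob_to_phred(mismatches / total_bases)

print(phred_to_error_prob(20))       # Q20 corresponds to a 1% error rate
print(empirical_quality(10, 1000))   # 10 mismatches in 1000 bases
```

If the reported score of a group of bases is much higher than its empirical quality, the reported scores are too optimistic and need recalibrating.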

Homozygous and Heterozygous SNPs in a diploid

Homozygous -> If a SNP base (different from ref) makes up more than 80% of the read depth at the site
Heterozygous -> If a SNP base (different from ref) makes up less than 80% of the read depth (but at least 10%)
Sequencing/alignment error -> If a SNP base makes up less than 10% of the read depth
This holds if the depth is at least 20x
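The thresholds above can be written as a small classifier. This is only a sketch: the function name and return labels are invented here, and how the exact boundaries (a fraction of exactly 80% or exactly 10%) are handled is an assumption, since the notes do not say.

```python
def classify_site(alt_count, depth, min_depth=20, hom_frac=0.80, err_frac=0.10):
    """Classify one site from the count of non-reference bases.

    alt_count: reads carrying the SNP (non-ref) base
    depth:     total reads covering the site
    """
    if depth < min_depth:
        # The rule of thumb is only stated for depth >= 20x
        return "low_depth"
    frac = alt_count / depth
    if frac > hom_frac:
        return "homozygous"
    if frac < err_frac:
        return "error"
    # Assumption: everything from 10% up to and including 80%
    return "heterozygous"

for alt, depth in [(19, 20), (10, 20), (1, 20), (5, 10)]:
    print(alt, depth, classify_site(alt, depth))
```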

Accuracy can be improved by moving from single-sample calling (Ref vs one sample) to multi-sample calling (Ref vs several samples).
- However, false positives (SNP calls) would also increase with a larger number of samples
Possible accuracy of read-depth-based SNP calling is 85%
Possible accuracy of LD (linkage disequilibrium) based SNP calling is >95%
- Possible only when multiple samples are used
- Software that uses LD for SNP calling includes Beagle, IMPUTE2, QCall, MaCH
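As a reminder of what LD measures, the sketch below computes the usual r^2 statistic between two biallelic sites from phased haplotypes. It is not how Beagle, IMPUTE2, QCall or MaCH work internally; the function name and the 0/1 haplotype encoding are assumptions made for this note.

```python
def ld_r2(haplotypes):
    """r^2 between two biallelic sites.

    haplotypes: list of (a, b) tuples, one per phased haplotype,
                where a and b are 0 (ref) or 1 (alt) at site A and site B.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n          # alt frequency at A
    p_b = sum(b for _, b in haplotypes) / n          # alt frequency at B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                             # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        # At least one site is monomorphic, so r^2 is undefined
        return None
    return d * d / denom

linked = [(0, 0), (1, 1), (0, 0), (1, 1)]
unlinked = [(0, 0), (0, 1), (1, 0), (1, 1)]
print(ld_r2(linked), ld_r2(unlinked))
```

Sites in strong LD carry information about each other, which is why a weakly supported call at one site can be corrected from its neighbours when several samples are available.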

Assumptions
- Multiple Genotypes instead of Ref vs One
- Right combination (contrasting genotypes for a specific type) vs. ref.
- LD based SNP calling
- Cross check the SNPs against all the 18 genotypes vs contrasting types
Filtering
- Use of LD (Software would estimate this)
- HapMap data (go through how haplotype frequencies help in filtering the best SNPs)
- Deviations from HWE, Hardy-Weinberg equilibrium (estimate the allele frequencies when HD is estimated, and work out how to identify the deviating SNPs so they can be filtered out)
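One way the HWE filter could work is a chi-square goodness-of-fit test on the genotype counts at each SNP. The sketch below assumes a biallelic site; the function names and the 3.841 cutoff (the 5% critical value for 1 degree of freedom) are choices made for illustration, not something fixed by these notes. With small panels such as 18 genotypes an exact test may be more appropriate.

```python
def hwe_chi_square(n_ref_hom, n_het, n_alt_hom):
    """Chi-square statistic for deviation from Hardy-Weinberg proportions."""
    n = n_ref_hom + n_het + n_alt_hom
    p = (2 * n_ref_hom + n_het) / (2 * n)    # ref allele frequency
    q = 1 - p                                # alt allele frequency
    expected = [p * p * n, 2 * p * q * n, q * q * n]
    observed = [n_ref_hom, n_het, n_alt_hom]
    # Skip classes with zero expected count (monomorphic site)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

def fails_hwe(n_ref_hom, n_het, n_alt_hom, critical=3.841):
    """True if the site deviates from HWE at the assumed 5% level (1 df)."""
    return hwe_chi_square(n_ref_hom, n_het, n_alt_hom) > critical

print(hwe_chi_square(25, 50, 25), fails_hwe(25, 50, 25))
print(hwe_chi_square(50, 0, 50), fails_hwe(50, 0, 50))
```

A site with no heterozygotes but both homozygous classes well represented, as in the second call, deviates strongly and would be a candidate for filtering.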