Identification and analysis of functional elements
This paper visualizes and analyzes the functional landscape of the human genome.
The Encyclopedia of DNA Elements (ENCODE) Project aims to provide a more biologically informative representation of the human genome by using high-throughput methods to identify and catalogue the functional elements it encodes. 30 groups were involved. About 30 Mb of sequence was analysed as 44 regions: 14 known regions totalling about 15 Mb, and 30 unclassified regions totalling another 15 Mb.
Salient findings
- Human genome extensively transcribed
- Many non-coding transcripts identified (miRNA?)
- Many previously unrecognized transcription start sites identified
- Regulatory regions are symmetrically distributed around transcription start sites
- Replication timing is correlated with chromatin structure(?!)
- About 5% of the bases in the genome are under evolutionary constraint in mammals (negative selection)
- Many functional elements appear to be evolutionarily unconstrained, potentially acting like a warehouse for natural selection
- Different functional elements vary greatly in their sequence variability across the human population
Transcription
GENCODE: integrated annotation that combines manual review with experimental testing of cDNA and protein evidence
- Presence of a large number of unannotated transcribed elements
- validated by RT-PCR (40%)
- RACE extensions of TxFrags to GENCODE-annotated genes usually span 50-200 kb
- presence of pseudogenes
- presence of non-protein-coding RNA
- Primary transcripts
- Combining all 3 technologies (GENCODE annotations, RACE and PET tags) covers more of the ENCODE regions than any single technology does.
- Regulation of transcripts
- various methods were used to identify regulatory elements and to build a transcription start site (TSS) catalogue
- different categories of TSS
- Replication
- Chromatin organisation
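The note under Transcription that the three technologies together cover more of the ENCODE regions than any one alone is an interval-union calculation. The following is a minimal sketch of that idea, not the paper's pipeline: the track names are taken from the notes, but the coordinates, the region length and the helper names `merge_intervals` and `covered_fraction` are made up for illustration. Intervals are assumed to be half-open `(start, end)` pairs on a single region.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def covered_fraction(intervals, region_length):
    """Fraction of a region's bases covered by at least one interval."""
    covered = sum(end - start for start, end in merge_intervals(intervals))
    return covered / region_length


# Toy data (hypothetical coordinates, not taken from the paper).
region_length = 1000
tracks = {
    "GENCODE": [(0, 200), (500, 600)],
    "RACE": [(150, 400)],
    "PET": [(550, 800), (900, 950)],
}

# Coverage of each technology on its own.
per_track = {name: covered_fraction(ivs, region_length)
             for name, ivs in tracks.items()}

# Coverage of the union of all three technologies.
all_intervals = [iv for ivs in tracks.values() for iv in ivs]
combined = covered_fraction(all_intervals, region_length)

print(per_track)
print(combined)
```

Because the union can only add bases, the combined fraction is never smaller than the best single track; how much larger it is depends on how little the tracks overlap.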
Evolutionary constraint and population variability
- Data
- 206 Mb of sequence orthologous to the ENCODE regions from 14 mammalian species
- Sequences obtained by targeted isolation and sequencing of individual BACs
- Alignments built with TBA, MAVID and MLAGAN
- GERP, SCONE and BinCons used to identify the sequences under constraint
- Intra-specific variation is assessed from SNP data
- constrained vs non-constrained
- examined measures of human variation (heterozygosity, derived allele-frequency spectra and indel rates) within the sequences of the experimentally identified functional elements
- only a small portion of the sequence is constrained; of the constrained sequence, 32% is coding and 40% is unannotated
- Experimentally identified functional elements and genetic variation
- within constrained sequence, coding regions show excessive polymorphism
- in general, non-coding sequences show excessive polymorphism
- Unexplained constrained sequences
- 40% of the ENCODE-region sequences identified as constrained are not associated with any experimental evidence of function.
- Unconstrained experimentally identified functional elements
- an unexpectedly large fraction of experimentally identified functional elements shows no evidence of evolutionary constraint, ranging from 93% for unannotated TxFrags (Un.TxFrags) to 12% for CDS.
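Two of the variation measures named above, heterozygosity and the derived allele-frequency (DAF) spectrum, can be illustrated with a short sketch. It assumes biallelic SNPs with a known derived-allele frequency p and uses the standard expected heterozygosity 2p(1-p); the frequencies, base counts and function names are hypothetical and are not the paper's data or procedure.

```python
from collections import Counter


def heterozygosity(p):
    """Expected heterozygosity 2p(1-p) at a biallelic site."""
    return 2.0 * p * (1.0 - p)


def mean_heterozygosity(derived_freqs, n_bases):
    """Per-base heterozygosity: sum over SNPs divided by bases examined."""
    return sum(heterozygosity(p) for p in derived_freqs) / n_bases


def daf_spectrum(derived_freqs, n_bins=5):
    """Proportion of SNPs falling in each equal-width DAF bin."""
    bins = Counter(min(int(p * n_bins), n_bins - 1) for p in derived_freqs)
    total = len(derived_freqs)
    return [bins.get(b, 0) / total for b in range(n_bins)]


# Toy derived-allele frequencies (made up for illustration).
constrained = [0.05, 0.1, 0.02]
unconstrained = [0.3, 0.5, 0.1, 0.7, 0.05]
n_bases = 1000  # assumed number of bases examined in each class

h_constrained = mean_heterozygosity(constrained, n_bases)
h_unconstrained = mean_heterozygosity(unconstrained, n_bases)

print(h_constrained, h_unconstrained)
print(daf_spectrum(constrained))
print(daf_spectrum(unconstrained))
```

In this toy data the constrained class has lower per-base heterozygosity and a spectrum concentrated in the lowest-frequency bin, which is the pattern usually read as a signature of negative selection. Real comparisons would also need indel rates and a correction for sample size.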
Hypotheses for the presence of unconstrained functional elements
- Presence of miRNA: the parent transcript of an intronic miRNA harbours the constrained bases
- transcription of intergenic regions or specific factor binding
- general: the presence of neutral (or near-neutral) biochemical elements, of lineage-specific functional elements, and of functionally conserved but non-orthologous elements