Conservation Track
From genomewiki
Conservation Track Implementation Notes
1) Track Components: Tables
* multizNway: scored reference coordinates, indexing into the MAF files (via extFile)
* multizNwaySummary: added to improve performance when the displayed region is larger than 1 million bases
* multizNwayFrames: Mark D's codon frames, Brian R's gap annotation
* phastConsNway: wiggle table with one score per base in the genome; provides the index into the .wib file. Scores are probabilities in the range 0..1.
2) Track Components: Files
* Display: /gbdb/<db>/multizNway/*.maf (used by the multizNway table); /gbdb/<db>/phastConsNway/*.wib (used by the phastConsNway table)
* Downloads: goldenPath/<db>/multizNway/chr*.maf; goldenPath/<db>/multizNway/upstream*.maf; goldenPath/<db>/phastConsNway/* (compressed, per chromosome)
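The multiz tables index MAF files. To illustrate what those files contain, here is a minimal Python sketch of a MAF reader, assuming the standard MAF layout: a block begins with an `a` line (optionally carrying `score=`), followed by `s` lines of the form `s src start size strand srcSize text`, with blank lines between blocks. The class and function names are hypothetical, the example block is made up, and `i`, `e`, and `q` lines are simply skipped.

```python
import io
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, TextIO


@dataclass
class MafComponent:
    src: str       # "assembly.chrom", e.g. "hg18.chr1"
    start: int     # zero-based start on the listed strand
    size: int      # number of non-gap characters in text
    strand: str    # "+" or "-"
    src_size: int  # length of the whole source sequence
    text: str      # aligned bases, "-" marks a gap


@dataclass
class MafBlock:
    score: Optional[float]
    components: List[MafComponent] = field(default_factory=list)


def parse_maf(handle: TextIO) -> Iterator[MafBlock]:
    """Yield alignment blocks from a MAF stream (only 'a' and 's' lines are used)."""
    block: Optional[MafBlock] = None
    for raw in handle:
        line = raw.strip()
        if not line:                 # a blank line ends the current block
            if block is not None:
                yield block
                block = None
            continue
        if line.startswith("#"):     # "##maf version=1" header and comments
            continue
        fields = line.split()
        if fields[0] == "a":
            if block is not None:    # tolerate a missing blank separator
                yield block
            score = None
            for item in fields[1:]:
                key, _, value = item.partition("=")
                if key == "score":
                    score = float(value)
            block = MafBlock(score=score)
        elif fields[0] == "s" and block is not None:
            src, start, size, strand, src_size, text = fields[1:7]
            block.components.append(
                MafComponent(src, int(start), int(size), strand, int(src_size), text))
        # 'i', 'e' and 'q' lines are ignored in this sketch
    if block is not None:
        yield block


# Hypothetical two-species block (coordinates and lengths are made up).
EXAMPLE = """##maf version=1
a score=100.0
s hg18.chr1 1000 7 + 5000000 ACGT-ACG
s mm8.chr4  2000 8 - 6000000 ACGTTACG

"""

blocks = list(parse_maf(io.StringIO(EXAMPLE)))
for comp in blocks[0].components:
    # size should equal the number of non-gap characters in the aligned text
    print(comp.src, comp.start, comp.size, len(comp.text) - comp.text.count("-"))
```

Checking that `size` matches the ungapped length of `text` is a cheap way to catch a malformed or truncated file.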
3) Track Components: TrackDb
* Required: type wigMaf (track type); wiggle (names the phastCons wiggle table)
* Optional: speciesOrder (the order in which species appear on the track control page and in the browser; should follow phylogenetic order); speciesGroups (the groups into which the species are split, e.g. vertebrates, mammals); summary (points to the multizNwaySummary table); frames (points to the multizNwayFrames table)
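Since speciesOrder should follow phylogenetic order, one option is to read the leaf order directly from the species tree. A minimal Python sketch, assuming a Newick tree whose leaf names are simple tokens; the tree and its branch lengths are hypothetical, and the sketch prints every leaf, so drop any name the track should not list.

```python
import re
from typing import List


def newick_leaf_order(newick: str) -> List[str]:
    """Return leaf names in left-to-right order from a Newick string.

    Assumes leaf names consist of letters, digits, '_' or '.'. A leaf name
    always follows '(' or ','; branch lengths follow ':' and internal-node
    labels follow ')', so neither is matched.
    """
    return re.findall(r"[(,]\s*([A-Za-z0-9_.]+)", newick)


# Hypothetical four-species tree with made-up branch lengths.
TREE = "((hg18:0.006,panTro2:0.007):0.1,(mm8:0.08,rn4:0.09):0.2);"

order = newick_leaf_order(TREE)
print("speciesOrder " + " ".join(order))
```

A regular expression is enough here because only leaf order is needed; anything that needs the topology or branch lengths calls for a real tree parser.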
4) Most Conserved Track
* Table: phastConsNwayElements (BED of scored elements)
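A quick sanity check on the Most Conserved elements is the fraction of the genome they cover. A minimal Python sketch, assuming the table has been dumped as BED text (chrom, start, end, then optional name and score columns, 0-based half-open coordinates); the element rows and genome size are made up, and overlapping elements are merged so no base is counted twice.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open


def read_bed(lines: Iterable[str]) -> Iterator[Interval]:
    """Yield (chrom, start, end) from BED lines, skipping headers and blanks."""
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        yield fields[0], int(fields[1]), int(fields[2])


def covered_bases(intervals: Iterable[Interval]) -> int:
    """Count distinct bases covered, merging overlapping intervals per chrom."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in intervals:
        by_chrom.setdefault(chrom, []).append((start, end))
    total = 0
    for spans in by_chrom.values():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start > cur_end:              # disjoint: close the current run
                total += cur_end - cur_start
                cur_start, cur_end = start, end
            else:                            # overlapping or adjacent: extend
                cur_end = max(cur_end, end)
        total += cur_end - cur_start
    return total


# Hypothetical element rows (name and score columns are illustrative).
BED = """chr1\t0\t100\tlod=50\t400
chr1\t50\t150\tlod=40\t380
chr2\t10\t20\tlod=12\t250
"""
GENOME_SIZE = 3200  # hypothetical total genome length in bases

covered = covered_bases(read_bed(BED.splitlines()))
print(f"{covered} bases, {covered / GENOME_SIZE:.1%} of the genome")
```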
* Files: NONE
5) Track Construction: Overview
1. Create single-coverage pairwise alignments (axtNet)
2. Create the multiple alignment
3. Generate conservation scores and conserved elements (phastCons)
4. Add gap annotation to the multiple alignment (Brian R's gap annotation software)
5. Create the multiple alignment summary
6. Create frame tables for the multiple alignment
6) Pairwise Alignments: Procedure
1. Blastz alignment (blastz, lavToPsl). This generates a set of alignments in PSL; these are close enough to symmetric that species1 and species2 can be swapped.
2. Chaining (axtChain, chainMergeSort, chainAntiRepeat)
3. Netting (chainNet, netFilter)
4. Extraction of single-coverage alignments from the net (netToAxt). The net chooses the single best chain at Level 1. Nets cannot simply be swapped the way chains can. The resulting net axt files are fed into multiz.
* All automated by doBlastzChainNet.pl (Thanks, Angie!!)
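The move from chains to single-coverage alignments is what makes the pairwise input suitable for multiz. To show only the idea, here is a deliberately simplified Python sketch: each chain is reduced to one reference interval, and chains are processed from highest to lowest score, each keeping only the reference bases not already claimed. This is a greedy stand-in, not chainNet's algorithm (the real tool works on aligned blocks rather than whole spans and records gap-filling chains as deeper levels of the net); chain names and scores are hypothetical.

```python
from typing import List, Tuple

Chain = Tuple[str, float, int, int]   # (name, score, ref_start, ref_end), half-open
Piece = Tuple[str, int, int]          # (chain name, ref_start, ref_end)


def greedy_single_coverage(chains: List[Chain]) -> List[Piece]:
    """Give each reference base to at most one chain, highest score first."""
    taken: List[Tuple[int, int]] = []  # sorted, disjoint intervals already claimed
    pieces: List[Piece] = []
    for name, _score, start, end in sorted(chains, key=lambda c: c[1], reverse=True):
        new: List[Piece] = []
        cursor = start                 # first reference base not yet examined
        for t_start, t_end in taken:
            if t_end <= cursor:
                continue               # claimed interval lies left of the cursor
            if t_start >= end:
                break                  # sorted, so nothing further overlaps
            if t_start > cursor:
                new.append((name, cursor, t_start))  # free gap before the claim
            cursor = max(cursor, t_end)
        if cursor < end:
            new.append((name, cursor, end))
        pieces.extend(new)
        taken.extend((s, e) for _, s, e in new)
        taken.sort()
    return sorted(pieces, key=lambda p: p[1])


# Hypothetical chains: chainC is lower scoring but fills the hole between A and B.
CHAINS = [("chainA", 5000.0, 0, 100),
          ("chainB", 4000.0, 200, 300),
          ("chainC", 1000.0, 50, 250)]

for piece in greedy_single_coverage(CHAINS):
    print(piece)
```

Here chainC keeps only reference bases 100..200, the stretch the two higher-scoring chains leave uncovered, so every reference base ends up in at most one alignment.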
7) Pairwise Alignments: Parameters
* Blastz scoring matrix (this is the $matrix that shows up on the chain track description page)
* Blastz gap penalties, misc.
* Lineage-specific repeat abridging (BLASTZ is given masked sequence; it avoids starting an alignment in a repeat but will continue through one)
* Chaining: min score, linear gap
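A sketch of how a linear-gap chaining cost can work, assuming the "linear gap" parameter denotes a piecewise-linear gap cost table (costs listed at a few gap sizes, interpolated in between). The table values are made up, and axtChain's own gap model is richer than this single-column version; the chain score here is simply aligned-block scores minus gap costs.

```python
from bisect import bisect_left
from typing import Sequence

# Hypothetical gap-cost table: gap sizes (ascending) and their costs.
SIZES = [1, 2, 3, 11, 111, 2111]
COSTS = [350.0, 425.0, 450.0, 600.0, 900.0, 1200.0]


def gap_cost(gap: int, sizes: Sequence[int] = SIZES,
             costs: Sequence[float] = COSTS) -> float:
    """Piecewise-linear gap cost.

    Gaps at or below the smallest size get the first cost; gaps between two
    table sizes are linearly interpolated; gaps beyond the largest size are
    extrapolated along the last segment.
    """
    if gap <= sizes[0]:
        return float(costs[0])
    i = bisect_left(sizes, gap)       # sizes[i-1] < gap <= sizes[i]
    if i == len(sizes):
        i = len(sizes) - 1            # extrapolate with the last segment
    x0, x1 = sizes[i - 1], sizes[i]
    y0, y1 = costs[i - 1], costs[i]
    return y0 + (y1 - y0) * (gap - x0) / (x1 - x0)


def chain_score(block_scores: Sequence[float], gaps: Sequence[int]) -> float:
    """Score of a chain: sum of aligned-block scores minus the cost of each gap."""
    return sum(block_scores) - sum(gap_cost(g) for g in gaps)


print(gap_cost(7))                          # 525.0 (between sizes 3 and 11)
print(chain_score([5000.0, 3000.0], [7]))   # 7475.0
```

A min-score filter would then discard chains whose `chain_score` falls below the chosen threshold.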
8) Multiple Alignment
* Inputs: (1) single-coverage pairwise alignments; (2) species tree (phastCons "make tree")
* Aligner: multiz, run through the autoMZ driver (given the tree, it performs the multiple alignment), or TBA (Threaded Blockset Aligner), which ENCODE uses
9) Conservation Scoring with PhastCons (Adam S's phylogenetic HMM)
* Inputs: multiple alignment; species tree with branch lengths (optionally two trees, one for the conserved and one for the nonconserved model)
* Parameters: rho, expected-len, target-coverage
* Output: per-base probabilities; conserved elements
(Our goal is to get 5% of the genome into conserved elements; the parameters are tuned until we hit this target.)
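The three parameters can be read as describing a two-state (conserved / nonconserved) HMM. A minimal Python sketch of that reading, assuming the parameterization from the phastCons paper: expected-len ω sets the conserved-to-nonconserved transition μ = 1/ω, and target-coverage γ sets the reverse transition ν = γμ/(1 - γ), so the stationary fraction of conserved sites is γ. rho, which scales the branch lengths of the conserved tree, only affects the emission probabilities; here those are supplied as made-up per-column log-likelihoods instead of being computed from phylogenetic models. The sketch runs forward-backward to get per-base posterior probabilities of the conserved state and does not predict elements.

```python
import math
from typing import List, Sequence, Tuple


def transition_probs(expected_len: float, target_coverage: float) -> Tuple[float, float]:
    """Return (mu, nu): P(conserved -> nonconserved) and P(nonconserved -> conserved)."""
    mu = 1.0 / expected_len
    nu = target_coverage * mu / (1.0 - target_coverage)
    return mu, nu


def logsumexp(a: float, b: float) -> float:
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def posterior_conserved(loglik_c: Sequence[float], loglik_n: Sequence[float],
                        expected_len: float, target_coverage: float) -> List[float]:
    """Forward-backward posterior P(conserved) per column of a 2-state HMM.

    loglik_c / loglik_n are per-column log-likelihoods under the conserved and
    nonconserved models (in phastCons these come from phylogenetic models).
    """
    mu, nu = transition_probs(expected_len, target_coverage)
    C, N = 0, 1
    log_trans = [[math.log(1 - mu), math.log(mu)],     # from conserved
                 [math.log(nu), math.log(1 - nu)]]     # from nonconserved
    # start in the stationary distribution: P(conserved) = target_coverage
    log_init = [math.log(target_coverage), math.log(1 - target_coverage)]
    emit = list(zip(loglik_c, loglik_n))
    n_cols = len(emit)
    fwd = [[0.0, 0.0] for _ in range(n_cols)]
    bwd = [[0.0, 0.0] for _ in range(n_cols)]   # log(1) = 0 at the last column

    for s in (C, N):
        fwd[0][s] = log_init[s] + emit[0][s]
    for i in range(1, n_cols):
        for s in (C, N):
            fwd[i][s] = emit[i][s] + logsumexp(fwd[i - 1][C] + log_trans[C][s],
                                               fwd[i - 1][N] + log_trans[N][s])
    for i in range(n_cols - 2, -1, -1):
        for s in (C, N):
            bwd[i][s] = logsumexp(log_trans[s][C] + emit[i + 1][C] + bwd[i + 1][C],
                                  log_trans[s][N] + emit[i + 1][N] + bwd[i + 1][N])

    post = []
    for i in range(n_cols):
        lc = fwd[i][C] + bwd[i][C]
        ln = fwd[i][N] + bwd[i][N]
        post.append(math.exp(lc - logsumexp(lc, ln)))   # normalize in log space
    return post


# Made-up log-likelihoods: columns 2-5 fit the conserved model much better.
LOGLIK_C = [-3.0, -3.0, -0.5, -0.5, -0.5, -0.5, -3.0, -3.0]
LOGLIK_N = [-1.5, -1.5, -3.0, -3.0, -3.0, -3.0, -1.5, -1.5]

post = posterior_conserved(LOGLIK_C, LOGLIK_N, expected_len=10, target_coverage=0.25)
print(" ".join(f"{p:.2f}" for p in post))
```

Raising target-coverage or expected-len makes the conserved state cheaper to enter or to stay in, which is why tuning these two moves the genome-wide coverage of the resulting elements.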
10) Multiple Alignment Summary and Annotations
* Gap annotation (mafAddIRows)
* Summary table (hgLoadMafSummary)
* Coding frames (getFrames, etc.)
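A summary table lets wide views avoid reading the full MAF. As a hedged illustration of that kind of precomputation, here is a Python sketch that reduces alignment blocks to per-bin, per-species coverage fractions along the reference. The input shape (reference start, end, species present), the bin size, and the assumption that blocks do not overlap on the reference are all illustrative; this is not hgLoadMafSummary's actual algorithm or table schema.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, Set, Tuple

Block = Tuple[int, int, Set[str]]  # (ref_start, ref_end, species aligned), half-open


def summarize(blocks: Iterable[Block], bin_size: int) -> Dict[int, Dict[str, float]]:
    """Fraction of each reference bin covered by blocks containing each species.

    Fractions are relative to the full bin size, so a final partial bin at the
    end of a chromosome is under-counted in this sketch.
    """
    covered: DefaultDict[int, DefaultDict[str, int]] = defaultdict(lambda: defaultdict(int))
    for start, end, species in blocks:
        pos = start
        while pos < end:                      # split the block at bin boundaries
            b = pos // bin_size
            piece_end = min(end, (b + 1) * bin_size)
            for sp in species:
                covered[b][sp] += piece_end - pos
            pos = piece_end
    return {b: {sp: n / bin_size for sp, n in per.items()}
            for b, per in sorted(covered.items())}


# Hypothetical blocks on one reference chromosome.
BLOCKS = [(0, 150, {"panTro2", "mm8"}), (180, 250, {"mm8"})]
BIN_SIZE = 100

summary = summarize(BLOCKS, bin_size=BIN_SIZE)
for b, fractions in summary.items():
    print(b * BIN_SIZE, dict(sorted(fractions.items())))
```

The browser could then draw a zoomed-out view from a handful of rows per bin instead of scanning every alignment block.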