Conservation Track: Difference between revisions
(added my notes from Kate's talk)
Revision as of 19:15, 1 August 2006
Conservation Track Implementation Notes
1) Track Components: Tables
    multizNway: scored ref, index into maf files (via extFile)
    multizNwaySummary: added to improve performance when the display is > 1 million bases
    multizNwayFrames: Mark D's codon frames, Brian R's gap annotation
    phastConsNway: wiggle, one score per base in genome. provides index into wib file. based on percent (0..1)
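The role of the summary table can be sketched as a choice made when a window is drawn. This is only an illustration: the function name `pick_alignment_table` and the exact comparison are assumptions; the notes say only that the summary table helps when the display is wider than 1 million bases.

```python
# Hypothetical sketch: choose which table backs the display for a window.
# The 1,000,000-base threshold comes from the notes; everything else is assumed.
SUMMARY_THRESHOLD = 1_000_000


def pick_alignment_table(win_start: int, win_end: int, track: str = "multizNway") -> str:
    """Return the table to query for the half-open window [win_start, win_end)."""
    if win_end <= win_start:
        raise ValueError("window must have positive width")
    width = win_end - win_start
    # Wide windows read the precomputed summary instead of individual maf blocks.
    return f"{track}Summary" if width > SUMMARY_THRESHOLD else track


if __name__ == "__main__":
    print(pick_alignment_table(0, 50_000))
    print(pick_alignment_table(0, 5_000_000))
```

A window of exactly 1 million bases still uses the full table here, since the notes say "greater than".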
2) Track Components: Files
* Display:
    /gbdb/<db>/multizNway/*.maf (multiz table uses this file)
    /gbdb/<db>/phastConsNway/*.wib (phastCons table uses this file)
* Downloads:
    goldenPath/<db>/multizNway/chr*.maf
    goldenPath/<db>/multizNway/upstream*.maf
    goldenPath/<db>/phastConsNway/* (compressed, per chrom)
3) Track Components: TrackDb
* Required:
    type wigMaf (track type)
    wiggle (wiggle table)
* Optional:
    speciesOrder (the order in which the species appear on the track control page and in the browser; should be in phylo order)
    speciesGroups (the groups into which the species are split, e.g. vertebrate, mammals)
    summary (points to multizNwaySummary table)
    frames (points to multizNwayFrames table)
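The settings above can be sketched as a small helper that assembles a stanza. This is a rough illustration, not the real trackDb syntax in full: the leading `track` line, the space-separated value layout, and the example assembly and group names are assumptions, and an actual stanza carries more settings than the ones listed in these notes.

```python
# Hypothetical sketch: render the wigMaf settings named in the notes as text.
def render_wigmaf_stanza(track, wiggle, species_order=None, species_groups=None,
                         summary=None, frames=None):
    """Build a stanza string; required settings first, then optional ones if given."""
    lines = [
        f"track {track}",      # assumed: stanzas are keyed by a track name
        "type wigMaf",         # required (track type)
        f"wiggle {wiggle}",    # required (wiggle table)
    ]
    if species_order:
        # Should already be in phylogenetic order.
        lines.append("speciesOrder " + " ".join(species_order))
    if species_groups:
        lines.append("speciesGroups " + " ".join(species_groups))
    if summary:
        lines.append(f"summary {summary}")
    if frames:
        lines.append(f"frames {frames}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Table and assembly names below are illustrative only.
    print(render_wigmaf_stanza(
        "multiz17way", "phastCons17way",
        species_order=["panTro1", "mm8", "rn4"],
        species_groups=["mammal", "vertebrate"],
        summary="multiz17waySummary",
        frames="multiz17wayFrames",
    ))
```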
4) Most Conserved Track
* Table: phastConsNwayElements (BED of scored elements)
* Files: NONE
5) Track Construction: Overview
    1. Create single-coverage pairwise alignments (axtNet)
    2. Create multiple alignment
    3. Generate conservation scores and conserved elements (phastCons)
    4. Add gap annotation to multiple alignment (Brian R's gap annotation software)
    5. Create multiple alignment summary
    6. Create frame tables for multiple alignment
6) Pairwise Alignments: Procedure
    1. Blastz Alignment (blastz, lavToPsl)
       (generates a set of alignments in psl; these are close enough that you can swap species1 <-> species2)
    2. Chaining (axtChain, chainMergeSort, chainAntiRepeat)
    3. Netting (chainNet, netFilter)
    4. Extraction of single-coverage alignments from the net (netToAxt)
       (net chooses single best chain for Level 1; can't simply swap nets like you can chains; feed the axtNet into MULTIZ)
* All automated by doBlastzChainNet.pl (Thanks, Angie!!)
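The idea of single coverage at the top level of the net can be illustrated with a toy greedy selection. This is a simplified sketch, not what chainNet does in full: it looks only at reference coordinates, ignores the query side and the nested lower levels that fill gaps, and the tuple layout is an assumption.

```python
from typing import List, Tuple

Chain = Tuple[int, int, int]  # (score, ref_start, ref_end), half-open, assumed layout


def merge_intervals(intervals: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Merge overlapping or touching intervals into a sorted, disjoint list."""
    merged: List[Tuple[int, int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def single_coverage(chains: List[Chain]) -> List[Tuple[int, int, int]]:
    """Take chains best score first; keep only reference bases not yet covered.

    Returns sorted (ref_start, ref_end, score) pieces that never overlap.
    """
    covered: List[Tuple[int, int]] = []
    kept: List[Tuple[int, int, int]] = []
    for score, start, end in sorted(chains, key=lambda c: c[0], reverse=True):
        pos = start
        for c_start, c_end in covered:  # covered is sorted and disjoint
            if c_end <= pos:
                continue
            if c_start >= end:
                break
            if c_start > pos:
                kept.append((pos, c_start, score))  # uncovered stretch before this interval
            pos = c_end
            if pos >= end:
                break
        if pos < end:
            kept.append((pos, end, score))
        covered = merge_intervals(covered + [(start, end)])
    return sorted(kept)


if __name__ == "__main__":
    chains = [(100, 0, 50), (80, 40, 90), (10, 45, 60)]
    for piece in single_coverage(chains):
        print(piece)
```

Because the selection is made relative to one genome's coordinates, the result is single-coverage only on that side, which is one way to see why a net cannot simply be swapped the way chains can.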
7) Pairwise Alignments: Parameters
    Blastz scoring matrix (this is the $matrix that shows up on the chain description page)
    Blastz gap penalties, misc
    Lineage-specific repeat abridging (give BLASTZ masked sequence; BLASTZ avoids starting in a repeat, but will continue through one)
    Chaining min score, linear gap
8) Multiple Alignment
* Inputs:
    1. Single-coverage pairwise alignments
    2. Species tree (phastCons "make tree")
* Aligner:
    multiz (with autoMZ driver) (feed it the tree, and it does the multiple alignment)
    or
    TBA (Threaded Blockset Aligner) (ENCODE uses this)
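Since the species tree drives both the aligner and the phylogenetic order wanted for speciesOrder, a small helper for reading leaf names off a tree may be useful. This sketch assumes the tree is available as a Newick string; the example tree, its assembly names, and its branch lengths are made up for illustration, and the aligner's own tree syntax may differ.

```python
import re


def newick_leaf_names(tree: str) -> list:
    """Return leaf names in left-to-right order from a Newick string.

    A leaf name directly follows '(' or ','; labels after ')' belong to
    internal nodes and are skipped. A bare single-leaf tree is not handled.
    """
    return re.findall(r"[(,]\s*([^\s(),:;]+)", tree)


if __name__ == "__main__":
    # Hypothetical tree; branch lengths are placeholders, not real estimates.
    tree = "((hg18:0.1,panTro1:0.1):0.2,(mm8:0.3,rn4:0.3):0.2,galGal2:0.9);"
    print(" ".join(newick_leaf_names(tree)))
```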
9) Conservation Scoring with PhastCons (Adam S's phylogenetic HMM)
* Inputs:
    Multiple alignment
    Species tree with branch lengths (optionally two trees)
* Parameters: rho, expected-len, target-coverage
* Output:
    Per-base probability
    Conserved elements
(our goal is to get 5% of genome in conserved elements -- the params are tweaked until we hit this)
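The 5% check can be sketched as a coverage calculation over the conserved elements. This is a minimal illustration under stated assumptions: elements are (chrom, start, end) tuples with half-open coordinates that do not overlap, and the tolerance value is a hypothetical choice, not something from the talk.

```python
# Hypothetical sketch: measure how much of the genome the elements cover.
def conserved_coverage(elements, genome_size):
    """Fraction of the genome covered by non-overlapping (chrom, start, end) elements."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    covered = sum(end - start for _chrom, start, end in elements)
    return covered / genome_size


def within_target(coverage, target=0.05, tolerance=0.005):
    """True when coverage is close enough to the target (tolerance is assumed)."""
    return abs(coverage - target) <= tolerance


if __name__ == "__main__":
    elements = [("chr1", 0, 30), ("chr1", 100, 120)]
    cov = conserved_coverage(elements, genome_size=1000)
    print(cov, within_target(cov))
```

In practice one would rerun phastCons with adjusted parameters and repeat this check until it passes.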
10) Multiple Alignment Summary and Annotations
    Gap Annotation (mafAddIRows)
    Summary table (hgLoadMafSummary)
    Coding frames (getFrames, etc.)