Mm9 multiple alignment
From genomewiki
Revision as of 21:53, 17 August 2007
To avoid artifacts in downstream processing of the UCSC multiple alignments, the parameters used in the blastz processing pipeline must be chosen with care. The pipeline has a number of steps and a variety of tunable parameters. This page tracks the parameters used in the alignments as they proceed toward completion of a multiple alignment conservation track on the mm9 mouse (NCBI build 37) assembly.
axtChain parameters and end results
sequence | tree distance | genome size | axtChain minScore | axtChain linearGap | % of mm9 matched | % of other matched by mm9 | done |
---|---|---|---|---|---|---|---|
rat rn4 | 0.1587 | 2,702 Mb | 3000 | medium | 68.357 | 69.541 | 16 August |
human hg18 | 0.4667 | 2,963 Mb | 3000 | medium | 38.499 | 35.201 | 16 August |
blastz alignment parameters details
target | query | abridged repeats | target size (overlap) | query size (overlap) | H | M |
---|---|---|---|---|---|---|
mm9 | rat rn4 | yes B=0 | 10M (10K) | 10M (0) | 2000 | 40M |
human hg18 | mm9 | yes B=0 | 10M (0) | 10M (10K) | 2000 | 40M |
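The "size (overlap)" columns describe how each sequence is cut into pieces before alignment. The following is a minimal sketch of one way such windows could be laid out, assuming a new window starts every `chunk` bases and extends `lap` bases past the start of the next one. The function name `chunk_windows` and this exact layout rule are illustrative assumptions, not taken from the UCSC pipeline scripts.

```python
def chunk_windows(seq_len, chunk, lap):
    """Yield half-open (start, end) windows covering a sequence.

    Assumption for illustration: windows start every `chunk` bases and
    each one runs `lap` bases into the following window, clipped to the
    end of the sequence.
    """
    if chunk <= 0 or lap < 0:
        raise ValueError("chunk must be positive and lap non-negative")
    start = 0
    while start < seq_len:
        end = min(start + chunk + lap, seq_len)
        yield (start, end)
        start += chunk


# A hypothetical 25 Mb sequence cut with the 10M (10K) setting from the table.
for window in chunk_windows(25_000_000, 10_000_000, 10_000):
    print(window)
```

With an overlap of 0, as used for the non-mm9 side in both rows, the windows simply tile the sequence end to end.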
default blastz parameters
m=80 v=0 B=2 C=0 E=30 G=0 H=0 K=3000 L=K M=0 O=400 P=1 R=0 T=1 W=8 X=10*(A-to-A match score) Y=O+300*E Z=1

From the blastz usage message:

Default values are given in parentheses.
m(80M) bytes of space for trace-back information
v(0) 0: quiet; 1: verbose progress reports to stderr
B(2) 0: single strand; >0: both strands
C(0) 0: no chaining; 1: just output chain; 2: chain and extend; 3: just output HSPs
E(30) gap-extension penalty.
G(0) diagonal chaining penalty.
H(0) interpolate between alignments at threshold K = argument.
K(3000) threshold for MSPs
L(K) threshold for gapped alignments
M(0) mask any base in seq1 hit this many times; 0 = no dynamic masking
O(400) gap-open penalty.
P(1) 0: entropy not used; 1: entropy used; >1 entropy with feedback.
Q load the scoring matrix from a file.
R(0) antidiagonal chaining penalty.
T(1) 0: W-bp words; 1: 12of19; 2: 12of19 without transitions. 3: 14of22; 4: 14of22 without transitions.
W(8) word size (unused unless T=0)
X(10*(A-to-A match score)) X-drop parameter for ungapped extension.
Y(O+300E) X-drop parameter for gapped extension.
Z(1) increment between successive words in sequence 1.
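Several of the defaults above are defined in terms of other parameters (L=K, Y=O+300*E, X=10 times the A-to-A match score). As a small worked example, the sketch below evaluates those formulas. The function and argument names are hypothetical, and the A-to-A match score is left as an input because the usage message does not state its value.

```python
def derived_blastz_defaults(gap_open=400, gap_extend=30, k_threshold=3000,
                            a_to_a_score=None):
    """Evaluate the dependent defaults listed in the blastz usage message.

    gap_open is O, gap_extend is E, k_threshold is K. a_to_a_score is the
    A-to-A entry of the scoring matrix; it is not given on this page, so X
    is only computed when the caller supplies it.
    """
    derived = {
        "L": k_threshold,                  # L defaults to K
        "Y": gap_open + 300 * gap_extend,  # Y defaults to O + 300*E
    }
    if a_to_a_score is not None:
        derived["X"] = 10 * a_to_a_score   # X defaults to 10 * (A-to-A score)
    return derived


print(derived_blastz_defaults())
```

With the stock values O=400 and E=30, the gapped X-drop default works out to Y = 400 + 300*30 = 9400.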
matrix parameters
The "medium" gap score matrix, tuned for the mouse-human distance, is:

tableSize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
tGap 350 425 450 600 900 2900 22900 57900 117900 217900 317900
bothGap 750 825 850 1000 1300 3300 23300 58300 118300 218300 318300
The "loose" gap score matrix, tuned for the chicken-human distance, is:

tablesize 11
smallSize 111
position 1 2 3 11 111 2111 12111 32111 72111 152111 252111
qGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
tGap 325 360 400 450 600 1100 3600 7600 15600 31600 56600
bothGap 625 660 700 750 900 1400 4000 8000 16000 32000 57000
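Each matrix gives gap costs at a handful of gap lengths (the `position` row), which suggests a piecewise linear cost function. The sketch below shows one plausible reading, assuming straight-line interpolation between listed positions and extrapolation along the last segment for longer gaps. It uses the qGap rows of both matrices. The function `gap_cost` is illustrative and may not match how axtChain evaluates the table internally.

```python
from bisect import bisect_right

# Values copied from the "position" and "qGap" rows of the two matrices.
POSITION = [1, 2, 3, 11, 111, 2111, 12111, 32111, 72111, 152111, 252111]
MEDIUM_QGAP = [350, 425, 450, 600, 900, 2900, 22900, 57900, 117900, 217900, 317900]
LOOSE_QGAP = [325, 360, 400, 450, 600, 1100, 3600, 7600, 15600, 31600, 56600]


def gap_cost(size, positions, costs):
    """Piecewise linear gap cost for a gap of `size` bases.

    Assumption for illustration: interpolate linearly between the listed
    positions, and extend the final segment's slope beyond the last one.
    """
    if size < positions[0]:
        raise ValueError("gap size is below the first listed position")
    # Index of the segment containing `size`, clamped to the last segment
    # so that larger gaps are extrapolated along it.
    i = min(bisect_right(positions, size) - 1, len(positions) - 2)
    x0, x1 = positions[i], positions[i + 1]
    y0, y1 = costs[i], costs[i + 1]
    return y0 + (y1 - y0) * (size - x0) / (x1 - x0)


# Compare the two matrices for a few gap lengths.
for size in (1, 61, 2111, 50000):
    print(size, gap_cost(size, POSITION, MEDIUM_QGAP), gap_cost(size, POSITION, LOOSE_QGAP))
```

Under this reading the "loose" matrix charges far less than "medium" for long gaps, which is consistent with it being tuned for the more distant chicken-human comparison.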