Chains Nets: Difference between revisions
From genomewiki
Revision as of 23:04, 7 April 2006
Chains and nets are Jim Kent's brainchild, published here: [[1]]
They used to be generated by a long manual process documented in some of our older make*.doc files, but are now generated by the script kent/src/utils/doBlastzChainNet.pl.
Here are some musings on chains and nets -- these are from Angie's mental model of chains and nets and represent opinions which may be outdated or plain old incorrect. The source code, and the results that we get by running these programs on real data, are the ultimate source of truth about chains and nets.
chains in a nutshell:
- a chain is a sequence of gapless aligned blocks; no two blocks in the same chain may overlap in either target or query coords. Within a chain, target and query coords are monotonically non-decreasing (i.e. always increasing or flat).
- double-sided gaps (a gap in both target and query at the same time) are a new capability (blastz can't do that) that allows extremely long chains to be constructed.
- not just orthologs, but paralogs too, can result in good chains. but that's useful!
- chains should be symmetrical: e.g. swap human-mouse chains to get mouse-human chains, and you should get approximately the same chains as if you had chained the swapped (mouse-human) blastz alignments.
- chained blastz alignments are not single-coverage in either target or query unless some subsequent filtering (like netting) is done.
- chain tracks can contain massive pileups when a piece of the target aligns well to many places in the query. Common causes of this include insufficient masking of repeats and high-copy-number genes (or paralogs).
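The block rules in the list above can be sketched in a few lines of Python. This is only an illustration of the mental model, not code from the kent source tree: the `Block` tuple, `is_valid_chain` and `gap_sizes` are hypothetical names, and coordinates are assumed to be 0-based, half-open, on a single strand.

```python
from typing import List, NamedTuple, Tuple


class Block(NamedTuple):
    """One gapless aligned block (hypothetical layout, for illustration only)."""
    t_start: int  # start in target coords
    q_start: int  # start in query coords
    size: int     # gapless, so the length is the same in target and query


def is_valid_chain(blocks: List[Block]) -> bool:
    """True if no block overlaps the previous one in target or query,
    i.e. both coordinate series are monotonically non-decreasing."""
    if any(b.size <= 0 for b in blocks):
        return False
    for prev, cur in zip(blocks, blocks[1:]):
        if cur.t_start < prev.t_start + prev.size:
            return False  # overlaps or goes backwards in target
        if cur.q_start < prev.q_start + prev.size:
            return False  # overlaps or goes backwards in query
    return True


def gap_sizes(blocks: List[Block]) -> List[Tuple[int, int]]:
    """(target_gap, query_gap) between each pair of consecutive blocks."""
    return [
        (cur.t_start - (prev.t_start + prev.size),
         cur.q_start - (prev.q_start + prev.size))
        for prev, cur in zip(blocks, blocks[1:])
    ]


good = [Block(0, 0, 10), Block(15, 10, 5), Block(30, 40, 8)]
bad = [Block(0, 0, 10), Block(5, 20, 5)]  # second block overlaps in target

print(is_valid_chain(good))  # True
print(gap_sizes(good))       # [(5, 0), (10, 25)]
print(is_valid_chain(bad))   # False
```

In `good`, the first gap is single-sided (5 bases in target, none in query, the "flat" case) and the second is double-sided (10 in target, 25 in query).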
And nets:
- a net is a hierarchical collection of chains, with the highest-scoring non-overlapping chains on top, and their gaps filled in where possible by lower-scoring chains, for several levels. I think a chain's qName also helps to determine which level it lands in, i.e. it makes a difference whether a chain's qName is the same as the top-level chain's qName or not, because the levels have meanings associated with them -- see details page.
- a net is single-coverage for target but not for query.
- because it's single-coverage in the target, it's no longer symmetrical.
- the netter has two outputs, one of which we usually ignore: the target-centric net in query coordinates. The reciprocal-best process uses that output: the query-referenced (but target-centric, i.e. single-coverage in target) net is turned back into its component chains, and those chains are then netted again to get single coverage in the query too. The two outputs of that second netting are the reciprocal-best nets in query and target coords. Reciprocal-best nets are symmetrical again.
- nets do a good job of filtering out massive pileups by collapsing them down to (usually) a single level.
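As a rough illustration of why a net is single-coverage in the target but not in the query, here is a toy Python sketch of the top level only. It is a deliberate simplification and not the real netter: it greedily keeps the highest-scoring chains that do not overlap in target, and it does not fill gaps with lower levels or trim chains. The `Chain` tuple, `top_level` and the example coordinates are all made up for this sketch.

```python
from typing import List, NamedTuple


class Chain(NamedTuple):
    """Whole-chain extent only (hypothetical layout, for illustration)."""
    score: int
    t_start: int
    t_end: int
    q_name: str
    q_start: int
    q_end: int


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """Half-open interval overlap test."""
    return a_start < b_end and b_start < a_end


def top_level(chains: List[Chain]) -> List[Chain]:
    """Greedy toy version of the top net level: best score first,
    skipping any chain that overlaps an already-kept chain in target."""
    kept: List[Chain] = []
    for c in sorted(chains, key=lambda c: c.score, reverse=True):
        if not any(overlaps(c.t_start, c.t_end, k.t_start, k.t_end) for k in kept):
            kept.append(c)
    return kept


chains = [
    # a small "pileup": three chains over roughly the same target region
    Chain(900, 100, 200, "chrA", 1000, 1100),
    Chain(500, 120, 210, "chrB", 5000, 5090),
    Chain(300, 90, 190, "chrC", 700, 800),
    # a different target region that aligns to nearly the same query region
    Chain(400, 300, 400, "chrA", 1050, 1150),
]

kept = top_level(chains)
print([c.score for c in kept])  # [900, 400]

# Target is covered at most once, but the two kept chains share query bases.
a, b = kept
query_overlap = a.q_name == b.q_name and overlaps(a.q_start, a.q_end, b.q_start, b.q_end)
print(query_overlap)  # True
```

The pileup over target 100-200 collapses to its best-scoring chain, while the two surviving chains still overlap in query coords, which is the asymmetry that the reciprocal-best process removes.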
Navigation: back to Implementation_Notes