Chains Nets
Revision as of 23:02, 7 April 2006
Chains and nets are Jim Kent's brainchild, published here: [[1]]
They used to be generated by a long manual process documented in some of our older make*.doc files, but are now generated by the script kent/src/utils/doBlastzChainNet.pl.
Here are some musings on chains and nets -- these are from Angie's mental model of chains and nets and represent opinions which may be outdated or plain old incorrect. The source code, and the results that we get by running these programs on real data, are the ultimate source of truth about chains and nets.
Chains in a nutshell:
- a chain is a sequence of gapless aligned blocks, where no two blocks within the chain may overlap in target or query coords. Within a chain, target and query coords are monotonically non-decreasing (i.e. always increasing or flat).
- double-sided gaps (unaligned sequence in both target and query between two consecutive blocks) are a new capability (blastz can't do that) that allows extremely long chains to be constructed.
- not just orthologs, but paralogs too, can result in good chains -- but that's useful!
- chains should be symmetrical -- e.g. if you swap human-mouse chains to get mouse-human chains, you should get approx. the same chains as if you had chained the swapped (mouse-human) blastz alignments.
- chained blastz alignments are not single-coverage in either target or query unless some subsequent filtering (like netting) is done.
- chain tracks can contain massive pileups when a piece of the target aligns well to many places in the query. Common causes of this include insufficient masking of repeats and high-copy-number genes (or paralogs).
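The chain properties above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not code from the kent source tree: the names `Block`, `is_valid_chain`, `gaps` and `swap` are made up for this example, coordinates are assumed to be 0-based half-open, and the swap ignores strand (real chain swapping also has to deal with reverse-strand query coordinates).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Block:
    """One gapless aligned block (hypothetical type for this sketch)."""
    t_start: int  # target start, 0-based
    q_start: int  # query start, 0-based
    size: int     # gapless, so the same length on both sides


def is_valid_chain(blocks: List[Block]) -> bool:
    """True if every block is non-empty and no block starts before the
    previous block ends, in target or in query."""
    if any(b.size <= 0 for b in blocks):
        return False
    for prev, cur in zip(blocks, blocks[1:]):
        if cur.t_start < prev.t_start + prev.size:
            return False
        if cur.q_start < prev.q_start + prev.size:
            return False
    return True


def gaps(blocks: List[Block]) -> List[Tuple[int, int]]:
    """(target_gap, query_gap) between consecutive blocks.
    A pair with both values > 0 is a double-sided gap."""
    return [
        (cur.t_start - (prev.t_start + prev.size),
         cur.q_start - (prev.q_start + prev.size))
        for prev, cur in zip(blocks, blocks[1:])
    ]


def swap(blocks: List[Block]) -> List[Block]:
    """Exchange target and query (plus strand only, an assumption of this sketch)."""
    return [Block(b.q_start, b.t_start, b.size) for b in blocks]


chain = [Block(0, 0, 10), Block(15, 10, 5), Block(30, 40, 20)]
print(is_valid_chain(chain))        # ordering / no-overlap check
print(gaps(chain))                  # second gap is double-sided
print(swap(swap(chain)) == chain)   # swapping twice gives the chain back
```

In this toy chain the first gap is single-sided (target only) and the second is double-sided. Swapping keeps the chain valid because the no-overlap rule is the same for target and query, which is the symmetry described above.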
And nets:
- a net is a hierarchical collection of chains, with the highest-scoring non-overlapping chains on top, and their gaps filled in where possible by lower-scoring chains, for several levels. I think a chain's qName also helps to determine which level it lands in, i.e. it makes a difference whether a chain's qName is the same as the top-level chain's qName or not, because the levels have meanings associated with them -- see details page.
- a net is single-coverage for target but not for query.
- because a net is single-coverage in the target only, it's no longer symmetrical.
- the netter has two outputs, one of which we usually ignore: the target-centric net in query coordinates. The reciprocal-best process uses that output: the query-referenced (but target-centric / target single-coverage) net is turned back into its component chains, and those chains are then netted again to get single coverage in the query too. The two outputs of that second netting are the reciprocal-best nets in query and target coords. Reciprocal-best nets are symmetrical again.
- nets do a good job of filtering out massive pileups by collapsing them down to (usually) a single level.
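The idea of "best chains on top, gaps filled by lower-scoring chains" can be illustrated with a toy netter. This is a deliberately simplified sketch and not the algorithm the real netter uses: it looks only at target coordinates, ignores qName, minimum sizes and score thresholds, and numbers levels 1, 2, 3... by nesting depth. The names `Chain`, `Fill`, `toy_net` and `max_target_coverage` are hypothetical.

```python
from typing import List, NamedTuple, Tuple

Interval = Tuple[int, int]  # (start, end) on the target, 0-based half-open


class Chain(NamedTuple):
    chain_id: str
    score: int
    t_blocks: List[Interval]  # sorted, non-overlapping target blocks


class Fill(NamedTuple):
    chain_id: str
    level: int                # 1 = top, 2 = inside a gap of a level-1 chain, ...
    t_blocks: List[Interval]  # the part of the chain that was placed


def clip(blocks: List[Interval], start: int, end: int) -> List[Interval]:
    """Trim blocks to the window [start, end), dropping empty results."""
    out = []
    for s, e in blocks:
        s, e = max(s, start), min(e, end)
        if s < e:
            out.append((s, e))
    return out


def toy_net(chains: List[Chain], t_size: int) -> List[Fill]:
    """Greedy toy netting: best-scoring chains claim free target space first;
    the gaps inside a placed chain become free space one level down."""
    free = [(0, t_size, 1)]  # (start, end, level) of unclaimed target space
    fills: List[Fill] = []
    for chain in sorted(chains, key=lambda c: c.score, reverse=True):
        next_free = []
        for start, end, level in free:
            kept = clip(chain.t_blocks, start, end)
            if not kept:
                next_free.append((start, end, level))
                continue
            fills.append(Fill(chain.chain_id, level, kept))
            if start < kept[0][0]:   # left flank stays free at the same level
                next_free.append((start, kept[0][0], level))
            for (_, a_end), (b_start, _) in zip(kept, kept[1:]):
                if a_end < b_start:  # gap inside the chain: next level down
                    next_free.append((a_end, b_start, level + 1))
            if kept[-1][1] < end:    # right flank stays free at the same level
                next_free.append((kept[-1][1], end, level))
        free = next_free
    return fills


def max_target_coverage(fills: List[Fill], t_size: int) -> int:
    """Deepest pileup of placed blocks over any target base."""
    depth = [0] * t_size
    for fill in fills:
        for s, e in fill.t_blocks:
            for i in range(s, e):
                depth[i] += 1
    return max(depth, default=0)


chains = [
    Chain("C", 100, [(50, 150)]),             # overlaps A's first block
    Chain("A", 1000, [(0, 100), (200, 300)]),
    Chain("B", 500, [(120, 180)]),            # sits inside A's gap
]
net = toy_net(chains, 400)
for fill in net:
    print(fill)
print(max_target_coverage(net, 400))
```

Chain A lands on top, B fills part of A's gap one level down, and C only keeps the piece that does not collide with anything already placed. No target base ends up covered more than once, which is the single-coverage property; nothing in this procedure limits how often a query base is used, which is why the result is not symmetrical.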
Navigation: back to Implementation_Notes