Chains Nets: Difference between revisions

(Added references to the new Blastz page; updated some kent/src paths.)

Revision as of 20:19, 14 November 2007

Chains and nets are Jim Kent's brainchild, published here: [http://www.pnas.org/cgi/content/full/100/20/11484]. They are generated from genomic local alignments computed by Blastz.

They used to be generated by a long manual process documented in some of our older makeDb/doc/*.txt files, but are now generated by the script kent/src/hg/utils/automation/doBlastzChainNet.pl.

Here are some musings on the fine points of chains and nets -- these are from Angie's mental model of chains and nets and represent opinions which may be outdated or plain old incorrect. The source code, and the results that we get by running these programs on real data, are the ultimate source of truth about chains and nets.

Chains in a nutshell:

  • a chain is a sequence of gapless aligned blocks, where there must be no overlaps of blocks' target or query coords within the chain. Within a chain, target and query coords are monotonically non-decreasing (i.e. always increasing or flat).
  • double-sided gaps are a new capability (blastz can't do that) that allows extremely long chains to be constructed.
  • not just orthologs, but paralogs too, can result in good chains. But that's useful!
  • chains should be symmetrical -- e.g. if you swap human-mouse chains to get mouse-human chains, you should get approximately the same chains as if you had chained swapped (mouse-human) blastz alignments. However, Blastz's dynamic masking is asymmetrical, so in practice those results are not exactly symmetrical. Also, dynamic masking in conjunction with changed chunk sizes can cause differences in results from one run to the next.
  • chained blastz alignments are not single-coverage in either target or query unless some subsequent filtering (like netting) is done.
  • chain tracks can contain massive pileups when a piece of the target aligns well to many places in the query. Common causes of this include insufficient masking of repeats and high-copy-number genes (or paralogs).
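
To make the block rules above concrete, here is a minimal Python sketch. It is not taken from the kent source tree: the Block class and the functions is_valid_chain, gap_kinds and swap are hypothetical names invented for illustration, and the sketch ignores strand entirely. It checks the "no overlaps, coords never go backwards" rule, labels each gap between consecutive blocks as one-sided or double-sided, and shows why swapping target and query is trivially symmetrical at the block level.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Block:
    """One gapless aligned block (hypothetical structure, same-strand only)."""
    t_start: int  # start in target coordinates
    q_start: int  # start in query coordinates
    size: int     # number of aligned bases (identical in target and query)

    @property
    def t_end(self) -> int:
        return self.t_start + self.size

    @property
    def q_end(self) -> int:
        return self.q_start + self.size


def is_valid_chain(blocks: List[Block]) -> bool:
    """True if blocks never overlap or go backwards in target or in query."""
    if any(b.size <= 0 for b in blocks):
        return False
    for prev, cur in zip(blocks, blocks[1:]):
        if cur.t_start < prev.t_end or cur.q_start < prev.q_end:
            return False
    return True


def gap_kinds(blocks: List[Block]) -> List[str]:
    """Label the gap between each pair of consecutive blocks."""
    kinds = []
    for prev, cur in zip(blocks, blocks[1:]):
        dt = cur.t_start - prev.t_end  # unaligned target bases in the gap
        dq = cur.q_start - prev.q_end  # unaligned query bases in the gap
        if dt > 0 and dq > 0:
            kinds.append("double-sided")
        elif dt > 0:
            kinds.append("target-only")
        elif dq > 0:
            kinds.append("query-only")
        else:
            kinds.append("none")
    return kinds


def swap(blocks: List[Block]) -> List[Block]:
    """Exchange the target and query roles of every block."""
    return [Block(b.q_start, b.t_start, b.size) for b in blocks]


# Made-up coordinates: a query-only gap, then a double-sided gap.
example_chain = [Block(0, 100, 50), Block(50, 160, 30), Block(200, 400, 20)]
```

Swapping the swapped chain gives back the original blocks, which is the idealized symmetry described above. The asymmetries in practice come from the upstream alignment step, which this sketch does not model.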

And nets:

  • a net is a hierarchical collection of chains, with the highest-scoring non-overlapping chains on top, and their gaps filled in where possible by lower-scoring chains, for several levels. I think a chain's qName also helps to determine which level it lands in, i.e. it makes a difference whether a chain's qName is the same as the top-level chain's qName or not, because the levels have meanings associated with them -- see details page.
  • a net is single-coverage for target but not for query.
  • because it's single-coverage in the target, it's no longer symmetrical.
  • the netter has two outputs, one of which we usually ignore: the target-centric net in query coordinates. The reciprocal-best process uses that output: the query-referenced (but target-centric, i.e. single-coverage in the target) net is turned back into its component chains, and those are netted again to get single coverage in the query too. The two outputs of that second netting are reciprocal-best nets in query and target coords. Reciprocal-best nets are symmetrical again.
  • nets do a good job of filtering out massive pileups by collapsing them down to (usually) a single level.
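
The coverage properties above can be illustrated with a toy greedy "netter" in Python. This is only a sketch under heavy assumptions and is not the algorithm used by the real netting programs: it is flat (no hierarchy levels, no qName logic, no minimum sizes), and the Chain tuple, subtract and toy_net are hypothetical names with made-up coordinates. Higher-scoring chains claim target bases first, and lower-scoring chains keep only the target bases that are still unclaimed.

```python
from typing import List, NamedTuple, Tuple

Interval = Tuple[int, int]  # half-open [start, end)


class Chain(NamedTuple):
    """Hypothetical, simplified chain record used only for this sketch."""
    name: str
    score: int
    t_blocks: List[Interval]      # aligned target intervals
    q_span: Tuple[str, int, int]  # query name, start, end of the whole chain


def subtract(span: Interval, covered: List[Interval]) -> List[Interval]:
    """Return the parts of span not covered by the (disjoint) covered intervals."""
    start, end = span
    pieces = []
    for c_start, c_end in sorted(covered):
        if c_end <= start or c_start >= end:
            continue  # no overlap with what is left of span
        if c_start > start:
            pieces.append((start, c_start))
        start = max(start, c_end)
        if start >= end:
            break
    if start < end:
        pieces.append((start, end))
    return pieces


def toy_net(chains: List[Chain]) -> List[Tuple[int, int, str]]:
    """Greedy target single-coverage: best-scoring chains claim bases first."""
    covered: List[Interval] = []
    placed = []
    for chain in sorted(chains, key=lambda c: c.score, reverse=True):
        for block in chain.t_blocks:
            for piece in subtract(block, covered):
                covered.append(piece)
                placed.append((piece[0], piece[1], chain.name))
    return sorted(placed)


example_chains = [
    Chain("A", 1000, [(0, 100), (200, 300)], ("chr1", 0, 200)),
    Chain("B", 500, [(120, 180)], ("chr2", 50, 110)),  # fills the gap inside A
    Chain("C", 300, [(0, 100)], ("chr3", 0, 100)),     # piles up on A's first block
    Chain("D", 200, [(400, 450)], ("chr1", 0, 50)),    # reuses query bases of A
]
example_net = toy_net(example_chains)
```

In this example, chain C disappears because A already claimed the same target bases (the pileup collapses), B lands in the gap between A's blocks, and D survives even though its query span overlaps A's. So the result is single-coverage in the target but not in the query.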

"LiftOver chains" are actually chains extracted from nets, or chains filtered by the netting process. Same-species liftOver chains use blat -fastMap as the alignment method, and are generated by kent/src/hg/utils/automation/doSameSpeciesLiftOver.pl, based on a series of scripts that Kate wrote in kent/src/hg/makeDb/makeLoChain/. Cross-species liftOver chains are generated by doBlastzChainNet.pl.
