Same species lift over construction
Latest revision as of 15:06, 26 April 2018
2018 UPDATE NOTE
This page is an interesting historical discussion and well worth the read.
HOWEVER, please note that the UCSC tool chain command DoSameSpeciesLiftOver.pl can now perform this entire sequence of steps in your own environment with your selected genome sequences.
Same Species Lift Over
This is the same species lift over procedure. Lift over construction between different species uses a different procedure: Whole_genome_alignment_howto. See also: Minimal_Steps_For_LiftOver.
Prerequisites
These procedures are Unix command-line, shell-scripted operations. Users should be familiar with Unix shell programming and have the kent source tree available to build the necessary kent programs.
Step 1 - blat
Decide on a workDirectory where all of this will be performed. See the script File:SameSpeciesBlatSetup.sh.txt and follow the comments in it to set up the workDirectory and your two genome sequences. That script also uses File:BlatJob.csh.txt.
The sameSpeciesBlatSetup.sh script partitions each genome sequence into parts. It expects small genomes, with each chromosome less than 10,000,000 bases, so that no chromosome is broken into separate parts. The two resulting parts lists are then matched so that every part of one genome is paired with every part of the other genome. This generates a job-list shell script for each of those pairings. The BlatJob.csh script runs blat on each pair to produce PSL files, which are used in the chain/net step.
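The partition-and-pair logic described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the contents of sameSpeciesBlatSetup.sh: the chrom.sizes-style input, the one-part-per-chromosome layout, the function names, and the job-line format (including the psl/ output path) are all hypothetical. Only the 10,000,000-base limit and the all-against-all pairing come from this page.

```python
"""Hypothetical sketch of the Step 1 partition-and-pair logic.

Assumptions (NOT taken from sameSpeciesBlatSetup.sh): each genome is described
by "name<TAB>length" lines, each chromosome becomes one part, and the job list
holds one line per (target part, query part) pair.
"""
from itertools import product

# Limit quoted on this page: chromosomes must be less than 10,000,000 bases.
MAX_PART_SIZE = 10_000_000


def read_sizes(lines):
    """Parse 'name<TAB>length' lines into a list of (name, length) tuples."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, length = line.split("\t")[:2]
        parts.append((name, int(length)))
    return parts


def check_small_genome(parts, label):
    """Raise ValueError if any chromosome would have to be split into separate parts."""
    too_big = [name for name, length in parts if length >= MAX_PART_SIZE]
    if too_big:
        raise ValueError(f"{label}: chromosomes too large for single parts: {too_big}")


def make_job_list(target_parts, query_parts, job_script="BlatJob.csh"):
    """Pair every target part with every query part, one job line per pair."""
    jobs = []
    for (t_name, _), (q_name, _) in product(target_parts, query_parts):
        # Hypothetical job-line layout: script, target part, query part, output PSL path.
        jobs.append(f"{job_script} {t_name} {q_name} psl/{t_name}.{q_name}.psl")
    return jobs


if __name__ == "__main__":
    # Made-up sizes for illustration only.
    target = read_sizes(["chr1\t1500000", "chr2\t900000"])
    query = read_sizes(["chr1\t1490000", "chr2\t910000", "chrM\t16000"])
    check_small_genome(target, "target")
    check_small_genome(query, "query")
    for job in make_job_list(target, query):
        print(job)
```

Because every part of one genome is paired with every part of the other, the number of jobs is the product of the two part counts: six jobs for the made-up two- and three-chromosome inputs above.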
Step 2 - chain/net
The second step is encapsulated in the script File:SameSpeciesChainNet.sh.txt. This script takes the PSL files from the blat runs and chains and nets them to obtain a single-coverage lift over file.
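To make "single coverage" more concrete, here is a simplified Python sketch that reads PSL records and greedily keeps the best-scoring alignments that do not overlap one another on the target genome, so that, in this sketch, each target base is covered at most once. It is an assumption-laden stand-in, not the chaining and netting performed by SameSpeciesChainNet.sh: real chains join many PSL blocks with gap scoring. The matches-minus-mismatches score, the Alignment class, and the function names are hypothetical; the column positions follow the standard 21-column PSL format.

```python
"""Hypothetical greedy illustration of single coverage on the target.

NOT the chaining/netting algorithm: alignments are scored as matches minus
mismatches (a made-up score) and kept best-first only if they do not overlap
an already kept alignment on the same target sequence.
"""
from dataclasses import dataclass


@dataclass
class Alignment:
    t_name: str
    t_start: int
    t_end: int
    q_name: str
    q_start: int
    q_end: int
    strand: str
    score: int


def parse_psl_line(line):
    """Extract the needed fields from one PSL record (21 tab-separated columns)."""
    f = line.rstrip("\n").split("\t")
    if len(f) < 21:
        raise ValueError(f"expected 21 PSL columns, got {len(f)}")
    matches, mismatches = int(f[0]), int(f[1])
    return Alignment(
        t_name=f[13], t_start=int(f[15]), t_end=int(f[16]),
        q_name=f[9], q_start=int(f[11]), q_end=int(f[12]),
        strand=f[8], score=matches - mismatches,
    )


def single_coverage(alignments):
    """Greedily keep the best-scoring alignments that never overlap on the target."""
    kept = []
    for aln in sorted(alignments, key=lambda a: a.score, reverse=True):
        # Half-open intervals [start, end) overlap when each starts before the other ends.
        overlaps = any(
            k.t_name == aln.t_name and aln.t_start < k.t_end and k.t_start < aln.t_end
            for k in kept
        )
        if not overlaps:
            kept.append(aln)
    # Return in target coordinate order for readability.
    return sorted(kept, key=lambda a: (a.t_name, a.t_start))
```

The greedy best-first choice is only meant to show why the result covers each target position once; the actual script's output is a chain file built from the chained and netted alignments.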
With a setup similar to Step 1, the script works in the run.chain directory to construct the result file. Read the script for information on how to use it and where the result is written.