Minimal Steps For LiftOver: Difference between revisions
(added link to doc file) |
(corrected doBlastzChainNet.pl) |
||
(9 intermediate revisions by 4 users not shown) | |||
Line 1: | Line 1: | ||
This doc is a stripped down version of [http://hgwdev.cse.ucsc.edu/~kent/src/unzipped/hg/doc/liftOver.txt | ==2018 UPDATE NOTE== | ||
This page is an interesting historical discussion and well worth the read. | |||
<b><span style="color:#bb0000"> | |||
HOWEVER, please note</span></b>, | |||
the UCSC tool chain commands: [[DoSameSpeciesLiftOver.pl]] and [[DoBlastzChainNet.pl]] can now completely perform | |||
the sequence of events in your environment with your selected genome sequences. | |||
==Discussion== | |||
See also: [[Whole_genome_alignment_howto]], [[LiftOver_Howto]] and [[Same_species_lift_over_construction]]. | |||
This doc is a stripped down version of [http://hgwdev.cse.ucsc.edu/~kent/src/unzipped/hg/doc/liftOver.txt /kent/src/hg/doc/liftOver.txt]. | |||
The following describes the minimal steps needed to create a liftOver chain file that converts annotations between bacterial genome builds (genome size ~4Mb). Because they are the same species, I used BLAT to align the two builds. More divergent sequences should be aligned using blastz. Steps [2] to [4] can be modified to accept the blastz output file format. | The following describes the minimal steps needed to create a liftOver chain file that converts annotations between bacterial genome builds (genome size ~4Mb). Because they are the same species, I used BLAT to align the two builds. More divergent sequences should be aligned using blastz. Steps [2] to [4] can be modified to accept the blastz output file format. | ||
Line 20: | Line 33: | ||
(See also [[LiftUp format]]) | (See also [[LiftUp format]]) | ||
Step [6] requires chrom.sizes files that contain the sequence lengths of the FASTA records in the builds: | |||
<code> | <code> | ||
Line 38: | Line 51: | ||
</pre></code> | </pre></code> | ||
'''NOTES:''' Not using ooc file | '''NOTES:''' Not using ooc file because genome build is small. Probably a good idea to use a ooc file for larger genomes i.e. > X(?) Mb. | ||
[3] Use liftUp to change the coordinate system. Requires .lft files created in step [1]: | [3] Use liftUp to change the coordinate system. Requires .lft files created in step [1]: | ||
Line 60: | Line 73: | ||
$ chainMergeSort *.chain | chainSplit chain stdin | $ chainMergeSort *.chain | chainSplit chain stdin | ||
</pre></code> | </pre></code> | ||
'''NOTES''': This creates a directory 'chain' | '''NOTES''': This creates a directory 'chain' which contains a chr.chain file. The OLD build use here only contained one chromosome. Not sure if axtChain will create one chain file for each target chromosome. | ||
[6] Make alignment nets from chains in [5]: | [6] Make alignment nets from chains in [5]: | ||
Line 66: | Line 79: | ||
$ cd chain | $ cd chain | ||
$ mkdir ../net | $ mkdir ../net | ||
$ chainNet chr.chain path_to_OLD_build/chrom.sizes path_to_NEW_build/chrom.sizes | $ chainNet chr.chain path_to_OLD_build/chrom.sizes path_to_NEW_build/chrom.sizes ../net/chr.net /dev/null | ||
</pre></code> | </pre></code> | ||
'''NOTES''': This step requires the chrom.sizes files from step [1]. | '''NOTES''': This step requires the chrom.sizes files from step [1]. |
Latest revision as of 15:18, 26 April 2018
2018 UPDATE NOTE
This page is an interesting historical discussion and well worth the read.
HOWEVER, please note, the UCSC tool chain commands: DoSameSpeciesLiftOver.pl and DoBlastzChainNet.pl can now completely perform the sequence of events in your environment with your selected genome sequences.
Discussion
See also: Whole_genome_alignment_howto, LiftOver_Howto and Same_species_lift_over_construction.
This doc is a stripped down version of /kent/src/hg/doc/liftOver.txt.
The following describes the minimal steps needed to create a liftOver chain file that converts annotations between bacterial genome builds (genome size ~4Mb). Because they are the same species, I used BLAT to align the two builds. More divergent sequences should be aligned using blastz. Steps [2] to [4] can be modified to accept the blastz output file format.
[1] Split the query (NEW) genome build by FASTA record using faSplit:
$ faSplit sequence NEW.build 2 chr
NOTES: There are 2 fasta records in the NEW build, therefore, I used the argument '2' to split the build into two files. Each file contains one FASTA record, chr0 and chr1. Also need to create .lft files (for step [3]) that describe the two sequences. If breaking the sequences into chunks using the 'size' parameter, just use the -lift option. Otherwise, need to make your own .lft files. For example, chr0.lft contains:
0 chr 3061531 chr 3061531
where columns are:
start seq_name size seq_name size
(See also LiftUp format)
Step [6] requires chrom.sizes files that contain the sequence lengths of the FASTA records in the builds:
$ cd path_to_OLD_build
$ twoBitInfo OLD.2bit chrom.sizes
$ cd path_to_NEW_build
$ twoBitInfo NEW.2bit chrom.sizes
[2] BLAT query sequences from [1] against the OLD build:
$ blat path_to_OLD_build/OLD.2bit path_to_NEW_build/chr0.fa OLD.chr0.psl -tileSize=12 -minScore=100 -minIdentity=98 -fastMap
$ blat path_to_OLD_build/OLD.2bit path_to_NEW_build/chr1.fa OLD.chr1.psl -tileSize=12 -minScore=100 -minIdentity=98 -fastMap
NOTES: Not using ooc file because genome build is small. Probably a good idea to use a ooc file for larger genomes i.e. > X(?) Mb.
[3] Use liftUp to change the coordinate system. Requires .lft files created in step [1]:
$ liftUp -pslQ chr1.psl chr1.lft warn OLD.chr1.psl
$ liftUp -pslQ chr0.psl chr0.lft warn OLD.chr0.psl
[4] Chain together alignments from [3] using axtChain:
$ axtChain -linearGap=medium -psl chr0.psl path_to_OLD_build/OLD.2bit path_to_NEW_build/NEW.2bit chr0.chain
$ axtChain -linearGap=medium -psl chr1.psl path_to_OLD_build/OLD.2bit path_to_NEW_build/NEW.2bit chr1.chain
NOTES: Note the -psl argument. This allows axtChain to accept psl as input. I haven't tested this using blastz instead of BLAT. I figure you can convert the lav output from blastz using lavToAxt then use axtChain, ignoring the -psl option.
[5] Combine and sort chain files from [4]:
$ chainMergeSort *.chain | chainSplit chain stdin
NOTES: This creates a directory 'chain' which contains a chr.chain file. The OLD build use here only contained one chromosome. Not sure if axtChain will create one chain file for each target chromosome.
[6] Make alignment nets from chains in [5]:
$ cd chain
$ mkdir ../net
$ chainNet chr.chain path_to_OLD_build/chrom.sizes path_to_NEW_build/chrom.sizes ../net/chr.net /dev/null
NOTES: This step requires the chrom.sizes files from step [1].
[7] Create liftOver chain file:
$ netChainSubset ../net/chr.net chr.chain ../over/chr.chain
[8] Use your new liftOver chain file:
$ liftOver to_be_converted.bed ../over/chr.chain conversions.bed unmapped