Minimal Steps For LiftOver: Difference between revisions

From genomewiki
Jump to navigationJump to search
mNo edit summary
(corrected doBlastzChainNet.pl)
 
(14 intermediate revisions by 5 users not shown)
Line 1: Line 1:
This doc is a stripped down version of /kent/src/hg/doc/liftOver.txt.
==2018 UPDATE NOTE==


The following describes the minimal steps needed to create a liftOver chain file  that converts annotations between bacterial genome builds (genome size ~4Mb).
This page is an interesting historical discussion and well worth the read.
 
<b><span style="color:#bb0000">
HOWEVER, please note</span></b>,
the UCSC tool chain commands: [[DoSameSpeciesLiftOver.pl]] and [[DoBlastzChainNet.pl]] can now completely perform
the sequence of events in your environment with your selected genome sequences.
 
==Discussion==
 
See also: [[Whole_genome_alignment_howto]], [[LiftOver_Howto]] and [[Same_species_lift_over_construction]].
 
This doc is a stripped down version of [http://hgwdev.cse.ucsc.edu/~kent/src/unzipped/hg/doc/liftOver.txt /kent/src/hg/doc/liftOver.txt].
 
The following describes the minimal steps needed to create a liftOver chain file  that converts annotations between bacterial genome builds (genome size ~4Mb). Because they are the same species, I used BLAT to align the two builds.  More divergent sequences should be aligned using blastz. Steps [2] to [4] can be modified to accept the blastz output file format.


[1] Split the query (NEW) genome build by FASTA record using faSplit:
[1] Split the query (NEW) genome build by FASTA record using faSplit:
Line 18: Line 31:
</pre></code>
</pre></code>


Setp [6] requires chrom.sizes files that contain the sequence lengths of the FASTA records in the builds:
(See also [[LiftUp format]])
 
Step [6] requires chrom.sizes files that contain the sequence lengths of the FASTA records in the builds:


<code>
<code>
Line 36: Line 51:
</pre></code>
</pre></code>


'''NOTES:''' Not using ooc file becuase genome build is small.  Probably a good idea to use a ooc file for larger genomes i.e. > X(?) Mb.
'''NOTES:''' Not using ooc file because genome build is small.  Probably a good idea to use a ooc file for larger genomes i.e. > X(?) Mb.


[3] Use liftUp to change the coordinate system.  Requires .lft files created in step [1]:
[3] Use liftUp to change the coordinate system.  Requires .lft files created in step [1]:
Line 58: Line 73:
$ chainMergeSort *.chain | chainSplit chain stdin
$ chainMergeSort *.chain | chainSplit chain stdin
</pre></code>
</pre></code>
'''NOTES''': This creates a directory 'chain' whicn contains a chr.chain file.  The OLD build use here only contained one chromosome.  Not sure if axtChain will create one chain file for each target chromosome.
'''NOTES''': This creates a directory 'chain' which contains a chr.chain file.  The OLD build use here only contained one chromosome.  Not sure if axtChain will create one chain file for each target chromosome.


[6] Make alignment nets from chains in [5]:
[6] Make alignment nets from chains in [5]:
Line 64: Line 79:
$ cd chain
$ cd chain
$ mkdir ../net
$ mkdir ../net
$ chainNet chr.chain path_to_OLD_build/chrom.sizes path_to_NEW_build/chrom.sizes $ ../net/chr.net /dev/null
$ chainNet chr.chain path_to_OLD_build/chrom.sizes path_to_NEW_build/chrom.sizes ../net/chr.net /dev/null
</pre></code>
</pre></code>
'''NOTES''': This step requires the chrom.sizes files from step [1].
'''NOTES''': This step requires the chrom.sizes files from step [1].
Line 79: Line 94:
$ liftOver to_be_converted.bed ../over/chr.chain conversions.bed unmapped
$ liftOver to_be_converted.bed ../over/chr.chain conversions.bed unmapped
</pre></code>
</pre></code>
[[Category:User Developed Scripts]]

Latest revision as of 15:18, 26 April 2018

2018 UPDATE NOTE

This page is an interesting historical discussion and well worth the read.

HOWEVER, please note, the UCSC tool chain commands: DoSameSpeciesLiftOver.pl and DoBlastzChainNet.pl can now completely perform the sequence of events in your environment with your selected genome sequences.

Discussion

See also: Whole_genome_alignment_howto, LiftOver_Howto and Same_species_lift_over_construction.

This doc is a stripped down version of /kent/src/hg/doc/liftOver.txt.

The following describes the minimal steps needed to create a liftOver chain file that converts annotations between bacterial genome builds (genome size ~4Mb). Because they are the same species, I used BLAT to align the two builds. More divergent sequences should be aligned using blastz. Steps [2] to [4] can be modified to accept the blastz output file format.

[1] Split the query (NEW) genome build by FASTA record using faSplit:

$ faSplit sequence NEW.build 2 chr

NOTES: There are 2 fasta records in the NEW build, therefore, I used the argument '2' to split the build into two files. Each file contains one FASTA record, chr0 and chr1. Also need to create .lft files (for step [3]) that describe the two sequences. If breaking the sequences into chunks using the 'size' parameter, just use the -lift option. Otherwise, need to make your own .lft files. For example, chr0.lft contains:

0       chr 3061531 chr      3061531

where columns are:

start seq_name size seq_name size

(See also LiftUp format)

Step [6] requires chrom.sizes files that contain the sequence lengths of the FASTA records in the builds:

$ cd path_to_OLD_build
$ twoBitInfo OLD.2bit chrom.sizes
$ cd path_to_NEW_build
$ twoBitInfo NEW.2bit chrom.sizes

[2] BLAT query sequences from [1] against the OLD build:

$ blat path_to_OLD_build/OLD.2bit path_to_NEW_build/chr0.fa OLD.chr0.psl -tileSize=12 -minScore=100 -minIdentity=98 -fastMap
$ blat path_to_OLD_build/OLD.2bit path_to_NEW_build/chr1.fa OLD.chr1.psl -tileSize=12 -minScore=100 -minIdentity=98 -fastMap

NOTES: Not using ooc file because genome build is small. Probably a good idea to use a ooc file for larger genomes i.e. > X(?) Mb.

[3] Use liftUp to change the coordinate system. Requires .lft files created in step [1]:

$ liftUp -pslQ chr1.psl chr1.lft warn OLD.chr1.psl 
$ liftUp -pslQ chr0.psl chr0.lft warn OLD.chr0.psl 

[4] Chain together alignments from [3] using axtChain:

$ axtChain -linearGap=medium -psl chr0.psl path_to_OLD_build/OLD.2bit path_to_NEW_build/NEW.2bit chr0.chain
$ axtChain -linearGap=medium -psl chr1.psl path_to_OLD_build/OLD.2bit path_to_NEW_build/NEW.2bit chr1.chain

NOTES: Note the -psl argument. This allows axtChain to accept psl as input. I haven't tested this using blastz instead of BLAT. I figure you can convert the lav output from blastz using lavToAxt then use axtChain, ignoring the -psl option.

[5] Combine and sort chain files from [4]:

$ chainMergeSort *.chain | chainSplit chain stdin

NOTES: This creates a directory 'chain' which contains a chr.chain file. The OLD build use here only contained one chromosome. Not sure if axtChain will create one chain file for each target chromosome.

[6] Make alignment nets from chains in [5]:

$ cd chain
$ mkdir ../net
$ chainNet chr.chain path_to_OLD_build/chrom.sizes path_to_NEW_build/chrom.sizes ../net/chr.net /dev/null

NOTES: This step requires the chrom.sizes files from step [1].

[7] Create liftOver chain file:

$ netChainSubset ../net/chr.net chr.chain ../over/chr.chain

[8] Use your new liftOver chain file:

$ liftOver to_be_converted.bed ../over/chr.chain conversions.bed unmapped