Minimal Steps For LiftOver: Difference between revisions
No edit summary |
mNo edit summary |
||
Line 1: | Line 1: | ||
The following | This doc is a stripped down version of /kent/src/hg/doc/liftOver.txt. | ||
The following describes the minimal steps needed to create a liftOver chain file that converts annotations between bacterial genome builds (genome size ~4Mb). | |||
[1] Split the query (NEW) genome build by FASTA record using faSplit: | [1] Split the query (NEW) genome build by FASTA record using faSplit: | ||
Line 5: | Line 7: | ||
<code><pre>$ faSplit sequence NEW.build 2 chr</pre></code> | <code><pre>$ faSplit sequence NEW.build 2 chr</pre></code> | ||
NOTES: There are 2 fasta records in the NEW build | '''NOTES''': There are 2 fasta records in the NEW build, therefore, I used the argument '2' to split the build into two files. Each file contains one FASTA record, chr0 and chr1. Also need to create .lft files (for step [3]) that describe the two sequences. If breaking the sequences into chunks using the 'size' parameter, just use the -lift option. Otherwise, need to make your own .lft files. For example, chr0.lft contains: | ||
<code><pre> | <code><pre> | ||
Line 16: | Line 18: | ||
</pre></code> | </pre></code> | ||
Setp [6] requires chrom.sizes files that contain the sequence lengths of the FASTA records in the builds: | |||
<code> | <code> | ||
Line 34: | Line 36: | ||
</pre></code> | </pre></code> | ||
NOTES: Not using ooc file becuase genome build is small. Probably a good idea to use a ooc file for larger genomes i.e. > X | '''NOTES:''' Not using ooc file becuase genome build is small. Probably a good idea to use a ooc file for larger genomes i.e. > X(?) Mb. | ||
[3] Use liftUp to change the coordinate system. Requires .lft files created in step [1]: | [3] Use liftUp to change the coordinate system. Requires .lft files created in step [1]: | ||
Line 50: | Line 52: | ||
</pre></code> | </pre></code> | ||
NOTES: Note the -psl argument. This allows axtChain to accept psl as input. I haven't tested this using blastz instead of BLAT. I figure you can convert the lav output from blastz using lavToAxt then use axtChain, ignoring the -psl option. | '''NOTES''': Note the -psl argument. This allows axtChain to accept psl as input. I haven't tested this using blastz instead of BLAT. I figure you can convert the lav output from blastz using lavToAxt then use axtChain, ignoring the -psl option. | ||
[5] Combine and sort chain files from [4]: | [5] Combine and sort chain files from [4]: | ||
Line 56: | Line 58: | ||
$ chainMergeSort *.chain | chainSplit chain stdin | $ chainMergeSort *.chain | chainSplit chain stdin | ||
</pre></code> | </pre></code> | ||
NOTES: This creates a directory 'chain' whicn contains a chr.chain file. The OLD build use here only contained one chromosome. Not sure if axtChain will create one chain file for each target chromosome. | '''NOTES''': This creates a directory 'chain' whicn contains a chr.chain file. The OLD build use here only contained one chromosome. Not sure if axtChain will create one chain file for each target chromosome. | ||
[6] Make alignment nets from chains in [5]: | [6] Make alignment nets from chains in [5]: | ||
Line 64: | Line 66: | ||
$ chainNet chr.chain path_to_OLD_build/chrom.sizes path_to_NEW_build/chrom.sizes $ ../net/chr.net /dev/null | $ chainNet chr.chain path_to_OLD_build/chrom.sizes path_to_NEW_build/chrom.sizes $ ../net/chr.net /dev/null | ||
</pre></code> | </pre></code> | ||
NOTES: This step requires the chrom.sizes files from step [1]. | '''NOTES''': This step requires the chrom.sizes files from step [1]. | ||
[7] Create liftOver chain file: | [7] Create liftOver chain file: |
Revision as of 15:26, 10 June 2007
This doc is a stripped down version of /kent/src/hg/doc/liftOver.txt.
The following describes the minimal steps needed to create a liftOver chain file that converts annotations between bacterial genome builds (genome size ~4Mb).
[1] Split the query (NEW) genome build by FASTA record using faSplit:
$ faSplit sequence NEW.build 2 chr
NOTES: There are 2 fasta records in the NEW build, therefore, I used the argument '2' to split the build into two files. Each file contains one FASTA record, chr0 and chr1. Also need to create .lft files (for step [3]) that describe the two sequences. If breaking the sequences into chunks using the 'size' parameter, just use the -lift option. Otherwise, need to make your own .lft files. For example, chr0.lft contains:
0 chr 3061531 chr 3061531
where columns are:
start seq_name size seq_name size
Setp [6] requires chrom.sizes files that contain the sequence lengths of the FASTA records in the builds:
$ cd path_to_OLD_build
$ twoBitInfo OLD.2bit chrom.sizes
$ cd path_to_NEW_build
$ twoBitInfo NEW.2bit chrom.sizes
[2] BLAT query sequences from [1] against the OLD build:
$ blat path_to_OLD_build/OLD.2bit path_to_NEW_build/chr0.fa OLD.chr0.psl -tileSize=12 -minScore=100 -minIdentity=98 -fastMap
$ blat path_to_OLD_build/OLD.2bit path_to_NEW_build/chr1.fa OLD.chr1.psl -tileSize=12 -minScore=100 -minIdentity=98 -fastMap
NOTES: Not using ooc file becuase genome build is small. Probably a good idea to use a ooc file for larger genomes i.e. > X(?) Mb.
[3] Use liftUp to change the coordinate system. Requires .lft files created in step [1]:
$ liftUp -pslQ chr1.psl chr1.lft warn OLD.chr1.psl
$ liftUp -pslQ chr0.psl chr0.lft warn OLD.chr0.psl
[4] Chain together alignments from [3] using axtChain:
$ axtChain -linearGap=medium -psl chr0.psl path_to_OLD_build/OLD.2bit path_to_NEW_build/NEW.2bit chr0.chain
$ axtChain -linearGap=medium -psl chr1.psl path_to_OLD_build/OLD.2bit path_to_NEW_build/NEW.2bit chr1.chain
NOTES: Note the -psl argument. This allows axtChain to accept psl as input. I haven't tested this using blastz instead of BLAT. I figure you can convert the lav output from blastz using lavToAxt then use axtChain, ignoring the -psl option.
[5] Combine and sort chain files from [4]:
$ chainMergeSort *.chain | chainSplit chain stdin
NOTES: This creates a directory 'chain' whicn contains a chr.chain file. The OLD build use here only contained one chromosome. Not sure if axtChain will create one chain file for each target chromosome.
[6] Make alignment nets from chains in [5]:
$ cd chain
$ mkdir ../net
$ chainNet chr.chain path_to_OLD_build/chrom.sizes path_to_NEW_build/chrom.sizes $ ../net/chr.net /dev/null
NOTES: This step requires the chrom.sizes files from step [1].
[7] Create liftOver chain file:
$ netChainSubset ../net/chr.net chr.chain ../over/chr.chain
[8] Use your new liftOver chain file:
$ liftOver to_be_converted.bed ../over/chr.chain conversions.bed unmapped