| The following outlines the minimal steps I undertook to make a liftOver chain file to convert annotations between bacterial genome builds (genome size ~4Mb).
| | |
| [1] Split the query (NEW) genome build by FASTA record using faSplit:
| | |
| <code>$ faSplit sequence NEW.build 2 chr</code>
| |
| | |
| NOTES: There are 2 FASTA records in the NEW build, so I used the argument '2' to split the build into two files, chr0.fa and chr1.fa, each containing one FASTA record. You also need .lft files that describe the two sequences. If you break the sequences into chunks using the 'size' parameter, just use the -lift option. Otherwise, you need to make your own .lft files, e.g. chr0.lft contains:
| |
| | |
| <code><pre>
| |
| 0 chr 3061531 chr 3061531
| |
| </pre>
| |
| </code>
| |
| where columns are:
| |
| <code>
| |
| start seq_name size seq_name size
| |
| </code>
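The hand-written .lft file above can also be generated from the split FASTA file. This is a minimal sketch, not part of the original steps: the function name <code>make_lft</code> is hypothetical, and it assumes an identity lift (offset 0, same name and size on both sides), as in the chr0.lft example.

```shell
# Hypothetical helper: print one identity .lft line (offset 0) per FASTA record.
# Columns: start, seq_name, size, seq_name, size (tab-separated).
make_lft() {
  awk -v OFS='\t' '
    /^>/ {
      # Flush the previous record before starting a new one.
      if (name != "") print 0, name, len, name, len
      name = substr($1, 2)   # record name = first word after ">"
      len = 0
      next
    }
    { len += length($0) }    # sum sequence line lengths
    END { if (name != "") print 0, name, len, name, len }
  ' "$1"
}
```

Usage would be something like <code>make_lft chr0.fa > chr0.lft</code>.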
| |
| | |
| You also need to make chrom.sizes files that contain the sequence lengths of the FASTA records in the builds:
| |
| | |
| <code>
| |
| <pre>
| |
| $ cd path_to_OLD_build
| |
| $ twoBitInfo OLD.2bit chrom.sizes
| |
| $ cd path_to_NEW_build
| |
| $ twoBitInfo NEW.2bit chrom.sizes
| |
| </pre>
| |
| </code>
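Because liftUp and chainNet both depend on sequence sizes, it may be worth checking that the sizes in the .lft files agree with the NEW chrom.sizes file. A rough sketch, assuming the sequence names in the .lft files match the names reported by twoBitInfo; the function name <code>check_lft_sizes</code> is hypothetical.

```shell
# Hypothetical check: every (name, size) pair in the .lft files must
# also appear in chrom.sizes. Exits non-zero on the first kind of mismatch.
check_lft_sizes() {
  sizes=$1
  shift
  awk '
    NR == FNR { size[$1] = $2; next }   # first file: chrom.sizes (name, size)
    size[$4] != $5 {                    # .lft columns 4 and 5: name, size
      print FILENAME ": " $4 " size " $5 " not in chrom.sizes"
      bad = 1
    }
    END { exit bad }
  ' "$sizes" "$@"
}
```

For example: <code>check_lft_sizes path_to_NEW_build/chrom.sizes chr0.lft chr1.lft</code>.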
| |
| | |
| [2] BLAT query sequences from [1] against the OLD build:
| |
| | |
| <code><pre>
| |
| $ blat path_to_OLD_build/OLD.2bit path_to_NEW_build/chr0.fa OLD.chr0.psl -tileSize=12 -minScore=100 -minIdentity=98 -fastMap
| |
| $ blat path_to_OLD_build/OLD.2bit path_to_NEW_build/chr1.fa OLD.chr1.psl -tileSize=12 -minScore=100 -minIdentity=98 -fastMap
| |
| </pre></code>
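With more than two query files, the per-file BLAT commands can be generated in a loop. A minimal sketch, assuming the split files match chr*.fa in one directory; the function name <code>blat_cmds</code> is hypothetical, and it only prints the commands (using the same options as above) rather than running them.

```shell
# Hypothetical helper: print one blat command per split query file.
blat_cmds() {
  old_2bit=$1   # path to OLD.2bit
  new_dir=$2    # directory holding the split NEW records (chr*.fa)
  for fa in "$new_dir"/chr*.fa; do
    [ -e "$fa" ] || continue          # skip if the glob matched nothing
    base=$(basename "$fa" .fa)        # e.g. chr0
    printf 'blat %s %s OLD.%s.psl -tileSize=12 -minScore=100 -minIdentity=98 -fastMap\n' \
      "$old_2bit" "$fa" "$base"
  done
}
```

Review the printed commands first, then pipe them to a shell to run them, e.g. <code>blat_cmds path_to_OLD_build/OLD.2bit path_to_NEW_build | sh</code>.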
| |
| | |
| NOTES: I did not use an ooc file because the genome build is small. It is probably a good idea to use an ooc file for larger genomes, i.e. > X Mb (?).
| |
| | |
| [3] Use liftUp to change the coordinate system. Requires .lft files created in step [1]:
| |
| | |
| <code><pre>
| |
| $ liftUp -pslQ chr1.psl chr1.lft warn OLD.chr1.psl
| |
| $ liftUp -pslQ chr0.psl chr0.lft warn OLD.chr0.psl
| |
| </pre></code>
| |
| | |
| [4] Chain together alignments from [3] using axtChain:
| |
| | |
| <code><pre>
| |
| $ axtChain -linearGap=medium -psl chr0.psl path_to_OLD_build/OLD.2bit path_to_NEW_build/NEW.2bit chr0.chain
| |
| $ axtChain -linearGap=medium -psl chr1.psl path_to_OLD_build/OLD.2bit path_to_NEW_build/NEW.2bit chr1.chain
| |
| </pre></code>
| |
| | |
| NOTES: The -psl argument allows axtChain to accept psl as input. I haven't tested this using blastz instead of BLAT. I figure you can convert the lav output from blastz using lavToAxt, then run axtChain without the -psl option.
| |
| | |
| [5] Combine and sort chain files from [4]:
| |
| <code><pre>
| |
| $ chainMergeSort *.chain | chainSplit chain stdin
| |
| </pre></code>
| |
| NOTES: This creates a directory 'chain' which contains a chr.chain file. The OLD build used here only contained one chromosome. Not sure if axtChain will create one chain file for each target chromosome.
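To see which target sequences the chains actually cover, the chain header lines can be tallied. A small sketch, relying on the chain format's header layout (the third field of each "chain" line is the target name); the function name <code>chain_targets</code> is hypothetical. As far as I know, chainSplit splits by target sequence by default, so this should list one entry per file it will write.

```shell
# Hypothetical helper: count chains per target sequence.
# Header layout: chain score tName tSize tStrand tStart tEnd qName ...
chain_targets() {
  awk '$1 == "chain" { n[$3]++ }
       END { for (t in n) print t "\t" n[t] }' "$@" | sort
}
```

For example: <code>chain_targets chr0.chain chr1.chain</code>.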
| |
| | |
| [6] Make alignment nets from chains in [5]:
| |
| <code><pre>
| |
| $ cd chain
| |
| $ mkdir ../net
| |
| $ chainNet chr.chain path_to_OLD_build/chrom.sizes path_to_NEW_build/chrom.sizes ../net/chr.net /dev/null
| |
| </pre></code>
| |
| NOTES: This step requires the chrom.sizes files from step [1].
| |
| | |
| [7] Create liftOver chain file:
| | |
| <code>
| |
| $ mkdir ../over && netChainSubset ../net/chr.net chr.chain ../over/chr.chain
| |
| </code>
| |
| | |
| [8] Use your new liftOver chain file to convert annotations from OLD-build to NEW-build coordinates:
| |
| | |
| <code>
| |
| $ liftOver to_be_converted.bed ../over/chr.chain conversions.bed unmapped
| |
| </code>
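After running liftOver it is worth checking how many records were converted. A minimal sketch, assuming BED input and that comment lines in the unmapped file (lines starting with '#', which give the reason a record failed) should not be counted; the function name <code>liftover_summary</code> is hypothetical.

```shell
# Hypothetical helper: report converted vs. unmapped record counts.
liftover_summary() {
  # Count non-empty lines that are not '#' comments.
  mapped=$(awk '!/^#/ && NF { n++ } END { print n + 0 }' "$1")
  failed=$(awk '!/^#/ && NF { n++ } END { print n + 0 }' "$2")
  echo "mapped=$mapped unmapped=$failed"
}
```

For example: <code>liftover_summary conversions.bed unmapped</code>.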
| |