Genscan: Difference between revisions
From genomewiki
Jump to navigationJump to search
(initial contents) |
(the genscan cluster run) |
||
Line 53: | Line 53: | ||
Longest finished job: 6s 0.10m 0.00h 0.00d | Longest finished job: 6s 0.10m 0.00h 0.00d | ||
Submission to last job: 259s 4.32m 0.07h 0.00d | Submission to last job: 259s 4.32m 0.07h 0.00d | ||
==Genscan cluster run== | |||
The gensub2 '''template''': | |||
#LOOP | |||
./runGsBig.csh $(root1) $(lastDir1) {check out exists gtf/$(lastDir1)/$(root1).gtf} {check out exists pep/$(lastDir1)/$(root1).pep} {check out exists subopt/$(lastDir1)/$(root1).bed} | |||
#ENDLOOP | |||
The '''runGsBig.csh''' script: | |||
#!/bin/csh -ef | |||
set chrom = $1 | |||
set dir = $2 | |||
set resultGtf = $3 | |||
set resultPep = $4 | |||
set resultSubopt = $5 | |||
mkdir -p gtf/$dir pep/$dir subopt/$dir | |||
set seqFile = hardMaskedFa/$dir/$chrom.fa | |||
gsBig $seqFile $resultGtf -trans=$resultPep -subopt=$resultSubopt -exe=/scratch/data/genscan/genscan -par=/scratch/data/genscan/HumanIso.smat -tmp=/tmp -window=2400000 | |||
The typical parasol cluster run: | |||
cd /data/genomes/ricCom1/bed/genscan | |||
rm -f file.list | |||
find ./hardMaskedFa -type f > file.list | |||
gensub2 file.list single template jobList | |||
para make jobList | |||
para check | |||
para time > run.time | |||
cat run.time | |||
Completed: 25763 of 25763 jobs | |||
CPU time in finished jobs: 5327s 88.78m 1.48h 0.06d 0.000 y | |||
IO & Wait Time: 70708s 1178.47m 19.64h 0.82d 0.002 y | |||
Average job time: 3s 0.05m 0.00h 0.00d | |||
Longest finished job: 145s 2.42m 0.04h 0.00d | |||
Submission to last job: 478s 7.97m 0.13h 0.01d | |||
Collect results together into single files: | |||
cd /data/genomes/ricCom1/bed/genscan | |||
catDir -r gtf > genscan.gtf | |||
catDir -r pep > genscan.pep | |||
catDir -r subopt > genscanSubopt.bed | |||
==Big Bed track hub files== | |||
cd /data/genomes/ricCom1/bed/genscan | |||
gtfToGenePred genscan.gtf genscan.gp | |||
genePredCheck genscan.gp | |||
genePredToBed genscan.gp genscan.bed | |||
bedToBigBed genscan.bed ../../chrom.sizes genscan.bb | |||
bedToBigBed genscanSubopt.bed ../../chrom.sizes genscanSubopt.bb | |||
==trackDb.txt entry== | |||
Your track hub trackDb.txt stanzas: | |||
track genscan_ | |||
shortLabel Genscan Genes | |||
longLabel Genscan Gene Predictions | |||
group genes | |||
priority 50 | |||
visibility pack | |||
color 170,100,0 | |||
type bigBed 12 . | |||
bigDataUrl bbi/ricCom1.genscan.bb | |||
html ../trackDescriptions/genscan | |||
track genscanSubopt_ | |||
shortLabel Genscan Subopt | |||
longLabel Genscan Suboptimal Exon Predictions | |||
group genes | |||
priority 51 | |||
visibility hide | |||
color 180,90,0 | |||
type bigBed 6 . | |||
bigDataUrl bbi/ricCom1.genscanSubopt.bb |
Revision as of 20:20, 23 April 2013
Hard mask
Genscan works with hard masked sequence. All masked sequence is converted to N's. UCSC runs this as a cluster job, although it is a simple loop if you are willing to wait. The gensub2 template is:
#LOOP ./runOne.csh $(root1) ../hardMaskedFa/$(lastDir1)/$(file1) #ENDLOOP
The runOne.csh script:
#!/bin/csh -ef set chrom = $1 set result = $2 twoBitToFa /data/genomes/ricCom1/ricCom1.2bit:$chrom stdout \ | maskOutFa stdin hard $result
C-shell script syntax here to manually construct a directory hierarchy to contain all the resulting hard masked fasta files and the chr.list file to use with gensub2:
mkdir -p /data/genomes/ricCom1/bed/genscan/run.hardMask cd /data/genomes/ricCom1/bed/genscan/run.hardMask mkdir -p ../hardMaskedFa set perDirLimit = 4000 set ctgCount = `twoBitInfo /data/genomes/ricCom1/ricCom1.2bit stdout | wc -l` set subDirCount = `echo $ctgCount | awk '{printf "%d", 1+$1/4000}'` @ dirCount = 0 set dirName = `echo $dirCount | awk '{printf "%03d", $1}'` @ perDirCount = 0 mkdir ../hardMaskedFa/$dirName /bin/rm -f chr.list /bin/touch chr.list foreach chrom ( `twoBitInfo /data/genomes/ricCom1/ricCom1.2bit stdout | cut -f1` ) if ($perDirCount < $perDirLimit) then @ perDirCount += 1 else @ dirCount += 1 set dirName = `echo $dirCount | awk '{printf "%03d", $1}'` set perDirCount = 1 mkdir ../hardMaskedFa/$dirName endif echo $dirName/$chrom.fa >> chr.list end
The gensub2 constructs the jobList:
gensub2 chr.list single template jobList
Typical parasol cluster run:
para make jobList para check para time > run.time cat run.time Completed: 25763 of 25763 jobs CPU time in finished jobs: 90s 1.50m 0.02h 0.00d 0.000 y IO & Wait Time: 65875s 1097.92m 18.30h 0.76d 0.002 y Average job time: 3s 0.04m 0.00h 0.00d Longest finished job: 6s 0.10m 0.00h 0.00d Submission to last job: 259s 4.32m 0.07h 0.00d
Genscan cluster run
The gensub2 template:
#LOOP ./runGsBig.csh $(root1) $(lastDir1) {check out exists gtf/$(lastDir1)/$(root1).gtf} {check out exists pep/$(lastDir1)/$(root1).pep} {check out exists subopt/$(lastDir1)/$(root1).bed} #ENDLOOP
The runGsBig.csh script:
#!/bin/csh -ef set chrom = $1 set dir = $2 set resultGtf = $3 set resultPep = $4 set resultSubopt = $5 mkdir -p gtf/$dir pep/$dir subopt/$dir set seqFile = hardMaskedFa/$dir/$chrom.fa gsBig $seqFile $resultGtf -trans=$resultPep -subopt=$resultSubopt -exe=/scratch/data/genscan/genscan -par=/scratch/data/genscan/HumanIso.smat -tmp=/tmp -window=2400000
The typical parasol cluster run:
cd /data/genomes/ricCom1/bed/genscan rm -f file.list find ./hardMaskedFa -type f > file.list gensub2 file.list single template jobList para make jobList para check para time > run.time cat run.time Completed: 25763 of 25763 jobs CPU time in finished jobs: 5327s 88.78m 1.48h 0.06d 0.000 y IO & Wait Time: 70708s 1178.47m 19.64h 0.82d 0.002 y Average job time: 3s 0.05m 0.00h 0.00d Longest finished job: 145s 2.42m 0.04h 0.00d Submission to last job: 478s 7.97m 0.13h 0.01d
Collect results together into single files:
cd /data/genomes/ricCom1/bed/genscan catDir -r gtf > genscan.gtf catDir -r pep > genscan.pep catDir -r subopt > genscanSubopt.bed
Big Bed track hub files
cd /data/genomes/ricCom1/bed/genscan gtfToGenePred genscan.gtf genscan.gp genePredCheck genscan.gp genePredToBed genscan.gp genscan.bed bedToBigBed genscan.bed ../../chrom.sizes genscan.bb bedToBigBed genscanSubopt.bed ../../chrom.sizes genscanSubopt.bb
trackDb.txt entry
Your track hub trackDb.txt stanzas:
track genscan_ shortLabel Genscan Genes longLabel Genscan Gene Predictions group genes priority 50 visibility pack color 170,100,0 type bigBed 12 . bigDataUrl bbi/ricCom1.genscan.bb html ../trackDescriptions/genscan
track genscanSubopt_ shortLabel Genscan Subopt longLabel Genscan Suboptimal Exon Predictions group genes priority 51 visibility hide color 180,90,0 type bigBed 6 . bigDataUrl bbi/ricCom1.genscanSubopt.bb