Blat Scripts: Difference between revisions

From genomewiki
(parseBlatOutput.pl was broken because it uses screen scraping - since someone fixed a tag-order bug in hgBlat, it couldn't find the end alignment lines in the html page fetched by botBlat.pl.)
Revision as of 20:56, 13 September 2011

Here is a collection of Blat-related Perl scripts that perform functions frequently requested on the genome mailing list. If you find a problem with these scripts, please notify me by selecting the e-mail user link in the side menu bar at: User:Hartera

BlatBot.pl: This script takes a file of FASTA-format sequences as input and submits them to the web-based Blat on the UCSC Genome Browser web site. It obeys the site rules for the number and frequency of hits, i.e. program-driven use of the Genome Browser software is limited to a maximum of one hit every 15 seconds, and it submits sequences to Blat in batches of 25 at a time.
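The batching and rate limiting described above can be sketched in Python. This is only an illustration of the idea, not the code in BlatBot.pl: the function names (`read_fasta`, `batches`, `submit_all`) are hypothetical, and `submit` stands in for whatever actually sends one batch to the web-based Blat.

```python
import time
from typing import Callable, Iterable, Iterator, List, Tuple

BATCH_SIZE = 25        # sequences per Blat submission, as stated above
MIN_INTERVAL_S = 15.0  # at most one hit every 15 seconds, as stated above

FastaRecord = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[FastaRecord]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def batches(records: Iterable[FastaRecord],
            size: int = BATCH_SIZE) -> Iterator[List[FastaRecord]]:
    """Group records into lists of at most `size` items."""
    batch: List[FastaRecord] = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def submit_all(records: Iterable[FastaRecord],
               submit: Callable[[List[FastaRecord]], object],
               interval: float = MIN_INTERVAL_S,
               sleep: Callable[[float], object] = time.sleep,
               clock: Callable[[], float] = time.monotonic) -> List[object]:
    """Call the hypothetical `submit` once per batch, spacing calls by `interval` seconds."""
    results = []
    last_start = None
    for batch in batches(records):
        if last_start is not None:
            wait = interval - (clock() - last_start)
            if wait > 0:
                sleep(wait)
        last_start = clock()
        results.append(submit(batch))
    return results
```

Passing `sleep` and `clock` as parameters is a design choice for this sketch: it lets the spacing logic be checked without actually waiting 15 seconds between batches.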

The script usage is: BlatBot.pl <organism> <db> <searchType> <sortOrder> <input FASTA> <outputType> <output file>

       Specify organism using the common name with first letter capitalized.
       e.g. Human, Mouse, Rat etc.
       Db is database or assembly name e.g hg17, mm5, rn3 etc.
       searchType can be BLATGuess, DNA, RNA, transDNA or transRNA
       sortOrder can be query,score; query,start; chrom,score;
       chrom,start; score.
       outputType can be pslNoHeader, psl or hyperlink.
       blats will be run in groups of 25 sequences, all
       output going to the specified output file.
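As a rough aid, the allowed argument values listed above can be checked before BlatBot.pl is launched. The Python sketch below is an assumption-laden helper, not part of the scripts: `check_blatbot_args` and `blatbot_command` are hypothetical names, and running the script as `perl BlatBot.pl` from the current directory is assumed.

```python
from typing import List

# Allowed values copied from the usage message above.
SEARCH_TYPES = {"BLATGuess", "DNA", "RNA", "transDNA", "transRNA"}
SORT_ORDERS = {"query,score", "query,start", "chrom,score", "chrom,start", "score"}
OUTPUT_TYPES = {"pslNoHeader", "psl", "hyperlink"}


def check_blatbot_args(organism: str, db: str, search_type: str,
                       sort_order: str, output_type: str) -> List[str]:
    """Return a list of problems found; an empty list means the values look valid."""
    problems = []
    if not organism or not organism[0].isupper():
        problems.append("organism should be a common name with the first letter capitalized")
    if not db:
        problems.append("db (database or assembly name) is empty")
    if search_type not in SEARCH_TYPES:
        problems.append("searchType must be one of: " + ", ".join(sorted(SEARCH_TYPES)))
    if sort_order not in SORT_ORDERS:
        problems.append("sortOrder must be one of: " + "; ".join(sorted(SORT_ORDERS)))
    if output_type not in OUTPUT_TYPES:
        problems.append("outputType must be one of: " + ", ".join(sorted(OUTPUT_TYPES)))
    return problems


def blatbot_command(organism: str, db: str, search_type: str, sort_order: str,
                    fasta_path: str, output_type: str, output_path: str) -> List[str]:
    """Build the argument list in the order given by the usage message."""
    problems = check_blatbot_args(organism, db, search_type, sort_order, output_type)
    if problems:
        raise ValueError("; ".join(problems))
    # Assumption: the script is in the current directory and is run with perl.
    return ["perl", "BlatBot.pl", organism, db, search_type, sort_order,
            fasta_path, output_type, output_path]
```

The resulting list could be handed to `subprocess.run`, for example `subprocess.run(blatbot_command("Human", "hg17", "DNA", "query,score", "in.fa", "psl", "out.html"), check=True)`, where the file names are placeholders.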


ParseBlatOutput.zip: Download this file and unzip it. This script parses HTML output from the BlatBot.pl script and produces either psl output or hyperlinks, depending on the BlatBot output type.

The script usage is: parseBlatOutput.pl <output type> <html output> [other html outputs...]

       output type is psl or hyperlink
       <html output> - file with html returned from blat request
       [other html outputs...] - more html file results
       output is to stdout
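To illustrate the kind of filtering involved in recovering psl rows from an HTML page, here is a minimal Python sketch. It is not how parseBlatOutput.pl works. It assumes that each alignment appears on its own line as the 21 tab-separated fields of the PSL format with a numeric first field, and that any HTML tags on the line can simply be stripped. The name `extract_psl_lines` is hypothetical.

```python
import re
from typing import List

PSL_FIELD_COUNT = 21                 # number of columns in a PSL alignment row
TAG_RE = re.compile(r"<[^>]+>")      # crude HTML tag matcher, adequate for a sketch


def extract_psl_lines(html: str) -> List[str]:
    """Return lines that look like PSL rows after HTML tags are stripped."""
    rows = []
    for line in html.splitlines():
        text = TAG_RE.sub("", line)
        fields = text.split("\t")
        # Keep only lines with 21 fields whose first field (matches) is an integer;
        # this also skips PSL header lines and ordinary page text.
        if len(fields) == PSL_FIELD_COUNT and fields[0].strip().isdigit():
            rows.append(text)
    return rows
```

Filtering on the shape of each line, rather than on the position of particular tags, is one way to make such a parser less sensitive to changes in the page markup, though it still depends on the assumptions stated above.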