GBiB: From download to BLAT at assembly hubs
From genomewiki
Revision as of 20:01, 6 May 2015 by David da Silva Pires (First version of "Blat configuration" section.)
GBiB installation
- Create a folder on your machine to hold the installation files:
$> sudo mkdir /usr/local/src/gbib
- Log in to the UCSC Genome Browser online store:
- Genome Store
- Click "Add to cart" in the GBiB box.
- Click "My products" in the menu.
- Note the download address.
- Download GBiB to /usr/local/src/gbib:
$> cd /usr/local/src/gbib
$> sudo wget https://genome-store.ucsc.edu/media/products/gbib.zip
- Uncompress and delete gbib.zip:
$> sudo unzip gbib.zip
$> sudo rm gbib.zip
- Start VirtualBox:
$> sudo virtualbox &
- Add GBiB to VirtualBox:
- Machine ---> Add ---> /usr/local/src/gbib/browserbox.vbox ---> Start
- Wait until the first update finishes.
- Close the GBiB terminal window.
- Select "Send the shutdown signal".
- Confirm by clicking "OK".
GBiB Configuration
- Click "Settings".
- General ---> Advanced ---> Drag'n'Drop: Bidirectional.
- General ---> Description: Schistosoma mansoni genome assembly and track hubs.
- System ---> Motherboard ---> Base Memory: 4096 MB.
- System ---> Processor ---> Processor(s): 2.
- Display ---> Video ---> Video Memory: 32 MB.
- Shared Folders ---> + ---> Folder Path: /usr/local/src/gbib/hubs/ ---> Auto-mount ---> OK.
- Boot the GBiB virtual machine:
- Select "browserbox" in the menu on the left.
- Click "Start".
- Test whether everything is working at the following URLs:
- Log in using ssh for faster access:
- Open a terminal, such as "konsole".
- Password: browser
$> ssh browser@localhost -p 1235
- Install the tools that allow file manipulation:
$> gbibAddTools
- Turn off all automatic updates:
$> gbibAutoUpdateOff
- Do not allow users to mirror tracks:
$> gbibMirrorTracksOff
- Turn on the offline mode:
$> gbibOffline
- Reboot the virtual machine:
$> sudo shutdown -r now
Assembly hub configuration
- Log in again using ssh:
$> ssh browser@localhost -p 1235
- Create the directories that will store the assembly hub configuration files:
$> mkdir -p /folders/sf_hubs/geneNetwork/schMan2
- To force the configuration files to be reloaded every time the page is reloaded (instead of being cached for at least 300 seconds):
- Insert "udcTimeout=1&" right after http://genome.ucsc.edu/cgi-bin/hgTracks? in the URL.
- To disable this feature, click "clear" in the message that appears at the top of the page.
- Write the contents of the hub.txt file (shortLabel <= 17 chars, longLabel <= 80 chars):
$> cat > /usr/local/src/gbib/hubs/geneNetwork/hub.txt << EOI
hub geneNetwork
shortLabel Gene Network
longLabel Gene Network Hub for Schistosoma mansoni
genomesFile genomes.txt
email admin-gene@iq.usp.br
descriptionUrl geneNetwork.html
EOI
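The label limits above can be checked before the hub is loaded. This is a minimal sketch in bash and awk, assuming one setting per line in hub.txt; the function name check_hub_labels is hypothetical and is not a UCSC tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a UCSC tool): report shortLabel/longLabel values
# in a hub.txt that exceed 17 / 80 characters. Keys are matched
# case-insensitively. Returns non-zero if any label is too long.
check_hub_labels() {
  awk '
    BEGIN { bad = 0 }
    {
      key = tolower($1)
      if (key != "shortlabel" && key != "longlabel") next
      # The value is the rest of the line after the keyword.
      value = $0
      sub(/^[ \t]*[^ \t]+[ \t]+/, "", value)
      limit = (key == "shortlabel") ? 17 : 80
      if (length(value) > limit) {
        printf "%s is %d chars (max %d): %s\n", $1, length(value), limit, value
        bad = 1
      }
    }
    END { exit bad }
  ' "$1"
}
```

Usage:
$> check_hub_labels /usr/local/src/gbib/hubs/geneNetwork/hub.txt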
- Write the contents of genomes.txt:
$> cat > /usr/local/src/gbib/hubs/geneNetwork/genomes.txt << EOI
genome schMan2
trackDb schMan2/trackDb.txt
twoBitPath schMan2/schMan2.2bit
groups schMan2/groups.txt
description Dec. 2011 (Sanger 5.2)
organism Schistosoma mansoni
defaultPos Sm.Chr_1.unplaced.SC_0010:312,104-379,754
orderKey 2
htmlPath schMan2/description.html
scientificName Schistosoma mansoni
blat 127.0.0.1 42422
transBlat 127.0.0.1 42423
EOI
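Once all the files are in place, a quick way to catch a typo in genomes.txt is to check that every relative path it names actually exists. The sketch below assumes, as in the hub layout above, that these paths are relative to the directory holding genomes.txt; check_genome_paths is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that the files named by trackDb, twoBitPath,
# groups and htmlPath in genomes.txt exist, resolving each path relative
# to the directory that holds genomes.txt. Returns non-zero if any is missing.
check_genome_paths() {
  local genomes_file=$1 hub_dir status=0 key path rest
  hub_dir=$(dirname "$genomes_file")
  # "|| [ -n "$key" ]" also processes a last line without a trailing newline.
  while read -r key path rest || [ -n "$key" ]; do
    case $key in
      trackDb|twoBitPath|groups|htmlPath)
        if [ ! -e "$hub_dir/$path" ]; then
          echo "missing $key file: $hub_dir/$path"
          status=1
        fi
        ;;
    esac
  done < "$genomes_file"
  return "$status"
}
```

Usage:
$> check_genome_paths /usr/local/src/gbib/hubs/geneNetwork/genomes.txt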
- Verify that everything is OK with the hub:
$> hubPublicCheck hubPublic -addHub="/folders/sf_hubs/geneNetwork/hub.txt"
- If the command succeeds, it prints the MySQL statement that inserts the hub into the public hub table. For example:
mysql> insert into hubPublic (hubUrl,descriptionUrl,shortLabel,longLabel,registrationTime,dbCount,dbList) values ("/folders/sf_hubs/geneNetwork/hub.txt","/folders/sf_hubs/geneNetwork/geneNetwork.html", "Gene Network", "Gene Network Hub for Schistosoma mansoni", now(),2, "schMan2,");
Track hub configuration
- Create trackDb.txt (track names must contain no spaces or dots and must start with a letter; shortLabel <= 17 chars, longLabel <= 80 chars). The here-document delimiter is quoted ('EOI') so that the shell does not replace the $$ placeholders with its own process ID:
$> cat > /usr/local/src/gbib/hubs/geneNetwork/schMan2/trackDb.txt << 'EOI'
track SMPs
bigDataUrl schMan2.bb
shortLabel SMPs v5.2
longLabel Schistosoma mansoni predictions (SMPs), version 5.2
type bigBed 12
searchIndex name
visibility full
html schMan2-description
boxedCfg on
color 96,64,0
altColor 128,64,32
dataVersion Dec. 2011 Sanger 5.2
# directUrl http://verjo-server-01.iq.usp.br/genome/pires/geneNetwork/schMan1/geneView/%s
iframeUrl https://www.google.com.br/search?q=$$
iframeOptions height='400' width='640' scrolling='yes'
priority 100
url http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?form=4&db=n&term=$$
urlLabel NCBI Details:
urls pmid="http://www.ncbi.nlm.nih.gov/pubmed/$$" spId="http://www.uniprot.org/uniprot/$$"

track roche454-blat
bigDataUrl roche454-blat.bb
shortLabel Roche 454 Trinity
longLabel Schistosoma mansoni RNA-Seq Roche 454 Trinity contigs mapped by Blat
type bigBed 12
searchIndex name
visibility full
color 64,0,96
altColor 64,32,128
EOI
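The naming and label rules for trackDb.txt can be checked mechanically. A minimal sketch in bash and awk, assuming one setting per line; check_trackdb is a hypothetical name, and it only enforces the rules stated in this step (for example, the hyphen in roche454-blat passes).

```shell
#!/usr/bin/env bash
# Hypothetical helper: check a trackDb.txt against the rules of this step.
# Track names must start with a letter and contain no spaces or dots;
# shortLabel may have at most 17 characters and longLabel at most 80.
check_trackdb() {
  awk '
    BEGIN { bad = 0 }
    {
      line = $0
      sub(/^[ \t]+/, "", line)   # allow indented stanzas
      sub(/[ \t]+$/, "", line)   # ignore trailing whitespace
      key = line
      sub(/[ \t].*$/, "", key)   # first word of the line
      value = line
      if (!sub(/^[^ \t]+[ \t]+/, "", value)) value = ""
    }
    key == "track" && (value !~ /^[A-Za-z]/ || value ~ /[ \t.]/) {
      printf "line %d: bad track name \"%s\"\n", NR, value
      bad = 1
    }
    (key == "shortLabel" && length(value) > 17) || (key == "longLabel" && length(value) > 80) {
      printf "line %d: %s is %d chars\n", NR, key, length(value)
      bad = 1
    }
    END { exit bad }
  ' "$1"
}
```

Usage:
$> check_trackdb /usr/local/src/gbib/hubs/geneNetwork/schMan2/trackDb.txt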
- If the FASTA file has all nucleotides in lowercase, convert them to uppercase so that the genome is not treated as entirely masked. The change_case command from seq_crumbs can do this:
$> change_case --in_format fasta --outfile schMan2.fasta --processes 80 -a upper Schistosoma_mansoni_v5.2.fa
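If seq_crumbs is not available, the same conversion can be sketched with awk. This is a swapped-in technique, not the change_case command; it runs as a single process (no equivalent of --processes 80) and leaves the ">" header lines untouched so the sequence names keep their case. The function name uppercase_fasta is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: uppercase every sequence line of a FASTA file and
# copy the ">" header lines unchanged.
# Arguments: input FASTA, output FASTA.
uppercase_fasta() {
  awk '/^>/ { print; next } { print toupper($0) }' "$1" > "$2"
}
```

Usage:
$> uppercase_fasta Schistosoma_mansoni_v5.2.fa schMan2.fasta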
- If the chromosome names are very long, shorten them:
$> sed s/Schisto_mansoni/Sm/ schMan2.fasta > schMan2-shortChromNames.fasta
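Shortening names with a substitution can in principle make two sequences end up with the same name, which would make the later steps ambiguous. A small sketch that lists duplicated names, assuming the name is the first word of each ">" header line; check_unique_names is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print sequence names that occur more than once in a
# FASTA file and return non-zero if there are any.
check_unique_names() {
  local dups
  # The sequence name is the first word of each ">" header, without the ">".
  dups=$(grep '^>' "$1" | awk '{ print substr($1, 2) }' | sort | uniq -d)
  if [ -n "$dups" ]; then
    printf 'duplicate sequence names:\n%s\n' "$dups"
    return 1
  fi
}
```

Usage:
$> check_unique_names schMan2-shortChromNames.fasta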
- Create the .2bit file from this FASTA file:
$> faToTwoBit schMan2-shortChromNames.fasta schMan2.2bit
- Create a file with the sizes of all chromosomes of the genome, sorted from largest to smallest:
$> twoBitInfo schMan2.2bit stdout | sort -k2rn > schMan2-chromSizes-sorted.txt
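As a cross-check of the twoBitInfo output, the same two-column "name, size" list can be computed straight from the FASTA file with awk. This is a sketch, not a replacement for the step above; it counts every non-whitespace character of the sequence lines (N included), so the sizes should agree with those stored in the .2bit file. The function name fasta_chrom_sizes is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "name<TAB>size" for every sequence of a FASTA
# file, sorted from the largest to the smallest sequence.
fasta_chrom_sizes() {
  awk '
    /^>/ {
      if (name != "") printf "%s\t%d\n", name, len
      name = substr($1, 2); len = 0; next
    }
    { gsub(/[ \t\r]/, ""); len += length($0) }
    END { if (name != "") printf "%s\t%d\n", name, len }
  ' "$1" | sort -k2,2rn
}
```

Usage (the two files should contain the same lines, apart from the order of equal sizes):
$> fasta_chrom_sizes schMan2-shortChromNames.fasta > schMan2-chromSizes-fromFasta.txt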
- The same substitution has to be applied to the track's BED file:
$> sed s/Schisto_mansoni/Sm/ smps.bed > smps-shortChromNames.bed
- The track's BED file has to be sorted first by chromosome name and then by start coordinate:
$> sort -k1,1 -k2,2n smps-shortChromNames.bed > smps-shortChromNames-sorted.bed
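Before running bedToBigBed, it can save a round trip to confirm that the sorted BED file really is in that order and that every chromosome it uses is listed in the chromosome sizes file. A sketch, assuming it is run in the same locale as the sort command above (sort -c checks against the current collation order); check_bed is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify the BED sort order (chromosome, then numeric
# start) and report chromosomes missing from the chrom sizes file.
# Arguments: BED file, chrom sizes file. Returns non-zero on any problem.
check_bed() {
  local bed=$1 sizes=$2
  # sort -c only checks the order; it reports the first out-of-order line.
  if ! sort -c -k1,1 -k2,2n "$bed"; then
    return 1
  fi
  # First file: remember known chromosomes. Second file: flag unknown ones once.
  awk 'NR == FNR { known[$1] = 1; next }
       !($1 in known) && !seen[$1]++ { print "unknown chromosome: " $1; bad = 1 }
       END { exit bad }' "$sizes" "$bed"
}
```

Usage:
$> check_bed smps-shortChromNames-sorted.bed schMan2-chromSizes-sorted.txt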
- Convert from bed to bigBed:
$> bedToBigBed -type=bed12 -tab -extraIndex=name smps-shortChromNames-sorted.bed schMan2-chromSizes-sorted.txt smps.bb
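With -type=bed12 -tab, every line of the input is expected to have exactly 12 tab-separated columns. A small sketch that lists the lines that do not; check_bed12 is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report lines of a tab-separated BED file that do not
# have exactly 12 columns. Returns non-zero if any are found.
check_bed12() {
  awk -F '\t' '
    BEGIN { bad = 0 }
    NF != 12 { printf "line %d has %d columns\n", NR, NF; bad = 1 }
    END { exit bad }
  ' "$1"
}
```

Usage:
$> check_bed12 smps-shortChromNames-sorted.bed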
- Contents of groups.txt:
$> cat > /usr/local/src/gbib/hubs/geneNetwork/schMan2/groups.txt << EOI
name custom
label Custom
priority 1
defaultIsClosed 1

name mapping
label Mapping
priority 2
defaultIsClosed 1

name genes
label Genes
priority 3
defaultIsClosed 1

name mrna
label mRNA
priority 4
defaultIsClosed 1

name regulation
label Regulation
priority 5
defaultIsClosed 1

name comparative
label Comparative
priority 6
defaultIsClosed 1

name variation
label Variation
priority 7
defaultIsClosed 1

name experimental
label Experimental
priority 8
defaultIsClosed 0
EOI
Blat configuration
- From the folder that contains the .2bit file, start two gfServer instances, on the ports the assembly hub uses for DNA queries (blat) and for translated amino acid queries (transBlat):
$> gfServer start 127.0.0.1 42422 -stepSize=5 schMan2.2bit &
$> gfServer start 127.0.0.1 42423 -trans schMan2.2bit &
- If the FASTA file used to create the .2bit file was masked (i.e., some nucleotides were in lowercase letters), add the gfServer flag "-mask":
$> gfServer start 127.0.0.1 42423 -trans -mask schMan2.2bit &
- Edit the assembly hub's genomes.txt to include the blat and transBlat lines:
blat 127.0.0.1 42422
transBlat 127.0.0.1 42423
- Add these commands to cron so that the gfServers are started again after a reboot.
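One way to do the cron step is with @reboot entries. The sketch below only prints the entries; TWOBIT_DIR is an assumption (the schMan2 directory created inside the VM earlier), and gfServer's full path is looked up because cron runs with a minimal PATH. The function name gfserver_cron_lines is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print @reboot crontab entries that restart the two
# gfServers. TWOBIT_DIR is an assumed location; override it if your .2bit
# file lives elsewhere.
TWOBIT_DIR=${TWOBIT_DIR:-/folders/sf_hubs/geneNetwork/schMan2}
# cron has a minimal PATH, so record gfServer's full path if it can be found.
GFSERVER=${GFSERVER:-$(command -v gfServer || echo gfServer)}

gfserver_cron_lines() {
  printf '@reboot cd %s && %s start 127.0.0.1 42422 -stepSize=5 schMan2.2bit\n' "$TWOBIT_DIR" "$GFSERVER"
  printf '@reboot cd %s && %s start 127.0.0.1 42423 -trans schMan2.2bit\n' "$TWOBIT_DIR" "$GFSERVER"
}
```

To append the entries to the current user's crontab:
$> (crontab -l 2>/dev/null; gfserver_cron_lines) | crontab -
Add "-mask" to the second entry if it was used above. If the shared folder is not yet mounted when cron runs the @reboot jobs, the cd fails; a short delay before it may be needed.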
Custom track configuration
track type=bigBed
GBiB maintenance
- Update all software and data:
$> gbibOnline