GBiB: From download to BLAT at assembly hubs: Difference between revisions

m (Typo fix. Another estimation to update time.)
(→‎Track hub configuration: Changing from eboVir3 to tryCru32-CLBrenerEsmeraldoLike.)
 
(80 intermediate revisions by 2 users not shown)
Line 7: Line 7:
* It is a safe way to keep your private data inaccessible to unauthorized users while still collaborating with authorized personal.
* It is a safe way to keep your private data inaccessible to unauthorized users while still collaborating with authorized personal.


Nonetheless, even after choosing GBiB as your genome browser, there is a lot of different choices to do. This wiki page explains how to install, configure and maintain an assembly hub (with a track hub and BLAT) using GBiB on a laptop running Kubuntu 15.04 (Vivid). Most commands are the same for other GNU/Linux distributions, with the differences probably being only relative to package names and crontab settings. Other architectures should involve the installation of GBiB on a server using only text interface and with specific ports enabled on the firewall to restrict the use of the data for just your network.
Nonetheless, even after choosing GBiB as your genome browser, there is a lot of different choices to do. This wiki page explains how to install, configure and maintain an assembly hub (with a track hub and BLAT) using GBiB on a laptop running Kubuntu 19.10 (Eoan Ermine). Most commands are the same for other GNU/Linux distributions, with the differences probably being only relative to package names and crontab settings. Other architectures should involve the installation of GBiB on a server using only text interface and with specific ports enabled on the firewall to restrict the use of the data for just your network.
 
=== Downloading the raw data ===
 
* Create the directories that will store the assembly hub configuration files:
<font color=blue>user@host:$></font> mkdir -p ~/var/gbib/work/virusNetwork/eboVir3/genome ~/var/gbib/hubs/virusNetwork/eboVir3/genome
<font color=blue>user@host:$></font> cd ~/var/gbib/work/virusNetwork/eboVir3
<font color=blue>user@host:$></font> rsync -avzP rsync://hgdownload.cse.ucsc.edu/goldenPath/currentGenomes/Ebola_virus/bigZips/eboVir3.2bit .
<font color=blue>user@host:$></font> rsync -avzP rsync://hgdownload.cse.ucsc.edu/goldenPath/currentGenomes/Ebola_virus/bigZips/eboVir3.chrom.sizes .
<font color=blue>user@host:$></font> rsync -avzP rsync://hgdownload.cse.ucsc.edu/goldenPath/currentGenomes/Ebola_virus/bigZips/KM034562v1.fa.gz .
<font color=blue>user@host:$></font> mv eboVir3.2bit genome
 


== GBiB installation ==
== GBiB installation ==
Line 24: Line 13:
* Create a folder at your machine to place the installation files:
* Create a folder at your machine to place the installation files:
  <font color=red>user@host:$></font> sudo mkdir /usr/local/src/gbib
  <font color=red>user@host:$></font> sudo mkdir /usr/local/src/gbib
<font color=red>user@host:$></font> sudo chmod o+x /usr/local/src/gbib
* Download GBiB from UCSC Genome Browser virtual store:
* Download GBiB from UCSC Genome Browser virtual store:
** Go to the [http://genome-store.ucsc.edu Genome Store].
** Go to the [http://genome-store.ucsc.edu Genome Store].
Line 36: Line 26:
** Download GBiB to /usr/local/src/gbib, uncompress and delete it.
** Download GBiB to /usr/local/src/gbib, uncompress and delete it.
  <font color=blue>user@host:$></font> cd /usr/local/src/gbib
  <font color=blue>user@host:$></font> cd /usr/local/src/gbib
  <font color=red>user@host:$></font> sudo wget <download_link>
  <font color=red>user@host:$></font> sudo wget --no-check-certificate <download_link>
  <font color=red>user@host:$></font> sudo unzip gbib.zip
  <font color=red>user@host:$></font> sudo unzip gbib.zip
  <font color=red>user@host:$></font> sudo rm gbib.zip
  <font color=red>user@host:$></font> sudo rm gbib.zip
Line 47: Line 37:
* Add GBiB to VirtualBox and boot it for the first time:
* Add GBiB to VirtualBox and boot it for the first time:
** Machine ---> Add ---> /usr/local/src/gbib/browserbox.vbox ---> Start
** Machine ---> Add ---> /usr/local/src/gbib/browserbox.vbox ---> Start
** Wait while the first update is done (it can takes from 5 minutes to more than 1 hour to finish the update process, depending of your internet connection speed).
** Wait while the first update is done (it can take from 5 minutes to more than 1 hour to finish the update process, depending of your internet connection speed).
** Close GBiB terminal window.
** Close GBiB terminal window.
** Select "Send the shutdown signal".
** Select "Send the shutdown signal".
** Confirm by clicking "OK".
** Confirm by clicking "OK".


== GBiB configuration ==
== GBiB configuration ==
Line 61: Line 50:
** Display ---> Video ---> Video Memory: 32 MB.
** Display ---> Video ---> Video Memory: 32 MB.
** Shared Folders ---> + ---> Folder Path: ~/var/gbib/work ---> Auto-mount ---> OK.
** Shared Folders ---> + ---> Folder Path: ~/var/gbib/work ---> Auto-mount ---> OK.
** Shared Folders ---> + ---> Folder Path: ~/var/gbib/hubs ---> Read-only ---> Auto-mount ---> OK.
* Boot GBiB virtual machine:
* Boot GBiB virtual machine:
** Select "browserbox" on menu at left.
** Select "browserbox" on menu at left.
Line 87: Line 75:


* Log in again using ssh:
* Log in again using ssh:
  $> ssh browser@localhost -p 1235
  <font color=blue>user@host:$></font> ssh browser@localhost -p 1235
* Forcing configuration files to be loaded again every time that the page is reloaded (instead of after at least 300 seconds):
 
** Insert "udcTimeout=1&" right after [http://genome.ucsc.edu/cgi-bin/hgTracks? http://genome.ucsc.edu/cgi-bin/hgTracks?] at URL.
 
** To disable this feature, click at "clear" on the message that appears at the top of the page.
=== Downloading the raw data ===
 
* Create the directories that will store the assembly hub configuration files:
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/assemblyHub/tryCru32-CLBrenerEsmeraldoLike/genome/{final,input,output}
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/assemblyHub/tryCru32-CLBrenerEsmeraldoLike/genome/input
 
* Download input data file: genome FASTA.
<font color=green>browser@browserbox:$></font> wget 'https://tritrypdb.org/common/downloads/release-32/TcruziCLBrenerEsmeraldo-like/fasta/data/TriTrypDB-32_TcruziCLBrenerEsmeraldo-like_Genome.fasta'
 
* Make a symbolic link for input data file (just to keep a pattern).
<font color=green>browser@browserbox:$></font> ln -s TriTrypDB-32_TcruziCLBrenerEsmeraldo-like_Genome.fasta tryCru32-CLBrenerEsmeraldoLike.fasta




=== Preparing the raw data ===
=== Creating a basic hub.txt file ===


Let's extract the fasta sequence from the .2bit file with the twoBitToFa command:
<font color=green>browser@browserbox:$></font> twoBitToFa eboVir3.2bit stdout > eboVir3.fa
We can also build an AGP file from the fasta file, marking all N's as gaps, using the hgFakeAgp command.
<font color=green>browser@browserbox:$></font> hgFakeAgp -minContigGap=1 eboVir3.fa eboVir3.agp
Check if the new AGP file matches the fasta file:
<font color=green>browser@browserbox:$></font> sort -k1,1 -k2n,2n eboVir3.agp > eboVir3-sorted.agp
<font color=green>browser@browserbox:$></font> checkAgpAndFa eboVir3-sorted.agp eboVir3.2bit


* Fill the contents of hub.txt file:
* Fill the contents of hub.txt file:
  $> cat > /usr/local/src/gbib/hubs/virusNetwork/hub.txt << EOI
  <font color=green>browser@browserbox:$></font> cat > /folders/sf_work/assemblyHub/hub.txt << EOI
  hub virusNetwork
  hub assemblyHub
  shortLabel Virus Network
  shortLabel Assembly Hub
  longLabel Virus Network Hub for Schistosoma mansoni
  longLabel Assembly Hub for Trypanosoma cruzi
  genomesFile genomes.txt
  genomesFile genomes.txt
  email admin@virus.edu
  email admin@assemblyhub.edu
  descriptionUrl description.html
  descriptionUrl http://genome.assemblyhub.edu
  EOI
  EOI
* The following rules must be obeyed:
* The following rules must be obeyed:
** hub: name without spaces.
** hub: name without spaces.
** shortLabel: limited to 17 characters.
** shortLabel: limited to 17 characters.
** longLabel: limited to 80 characters.
** longLabel: limited to 80 characters.
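The three rules above can be checked mechanically before running hubCheck. The following is only a minimal sketch: the function name check_hub_labels and the temporary files are made up for illustration (they are not part of GBiB or the Kent tools), and only the three rules listed above are tested.

```shell
# check_hub_labels is a hypothetical helper, not a GBiB or Kent-tools command.
# It applies only the three rules listed above to the hub.txt file given as $1.
check_hub_labels() {
    awk '
        $1 == "hub" && NF != 2 { print "hub: expected a single name without spaces"; bad = 1 }
        $1 == "shortLabel" || $1 == "longLabel" {
            limit = ($1 == "shortLabel") ? 17 : 80
            label = $0
            sub(/^[ \t]*[A-Za-z]+[ \t]+/, "", label)   # drop the keyword, keep the label text
            if (length(label) > limit) { print $1 ": longer than " limit " characters"; bad = 1 }
        }
        END { if (!bad) print "labels OK"; exit bad + 0 }
    ' "$1"
}

# Try it on a copy of the hub.txt contents used on this page.
tmpdir=$(mktemp -d)
cat > "$tmpdir/hub.txt" << 'EOI'
hub assemblyHub
shortLabel Assembly Hub
longLabel Assembly Hub for Trypanosoma cruzi
genomesFile genomes.txt
email admin@assemblyhub.edu
EOI
hub_report=$(check_hub_labels "$tmpdir/hub.txt")
echo "$hub_report"   # prints "labels OK"
```

The function exits with a non-zero status when a rule is broken, so it can be chained in front of hubCheck with "&&".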
=== Creating a basic genomes.txt file ===
* Fill the contents of genomes.txt:
* Fill the contents of genomes.txt:
  $> cat > /usr/local/src/gbib/hubs/virusNetwork/genomes.txt << EOI
  <font color=green>browser@browserbox:$></font> cat > /folders/sf_work/assemblyHub/genomes.txt << EOI
  genome schMan2
  genome tryCru32-CLBrenerEsmeraldoLike
  trackDb schMan2/trackDb.txt
  organism Trypanosoma cruzi CL Brener Esmeraldo-like
  twoBitPath schMan2/schMan2.2bit
  scientificName Trypanosoma cruzi
  groups schMan2/groups.txt
  orderKey 1
  description Dec. 2011 (Sanger 5.2)
  description TriTrypDB Release 32 (20 Apr 2017)
organism Schistosoma mansoni
  defaultPos TcChr1-S:1-77,957
  defaultPos Sm.Chr_1.unplaced.SC_0010:312,104-379,754
  twoBitPath tryCru32-CLBrenerEsmeraldoLike/genome/final/tryCru32-CLBrenerEsmeraldoLike.2bit
  orderKey 2
  htmlPath tryCru32-CLBrenerEsmeraldoLike/htmlPage/description.html
  htmlPath schMan2/description.html
  groups tryCru32-CLBrenerEsmeraldoLike/groups.txt
  scientificName Schistosoma mansoni
trackDb tryCru32-CLBrenerEsmeraldoLike/trackDb.txt
  blat 127.0.0.1 42422
  blat 127.0.0.1 2302
  transBlat 127.0.0.1 42423
  transBlat 127.0.0.1 2303
  EOI
  EOI
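In the layout used on this page, the paths in genomes.txt (twoBitPath, trackDb, groups, htmlPath) are given relative to the directory that holds genomes.txt, so a file that has not been created yet is a likely cause of hubCheck errors. The sketch below lists such missing files. The helper name missing_hub_files and the toy genome "demo" are assumptions made for this example only.

```shell
# missing_hub_files is a hypothetical helper, not a GBiB or Kent-tools command.
# $1: path to genomes.txt. Paths inside it are assumed to be relative to its directory.
missing_hub_files() {
    hub_dir=$(dirname "$1")
    awk '$1 == "trackDb" || $1 == "twoBitPath" || $1 == "groups" || $1 == "htmlPath" { print $1, $2 }' "$1" |
    while read -r key path; do
        [ -e "$hub_dir/$path" ] || echo "missing $key: $path"
    done
}

# Toy hub: the genome name "demo" and its files are made up for this example.
tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/demo"
touch "$tmpdir/demo/demo.2bit"
cat > "$tmpdir/genomes.txt" << 'EOI'
genome demo
twoBitPath demo/demo.2bit
trackDb demo/trackDb.txt
EOI
missing_report=$(missing_hub_files "$tmpdir/genomes.txt")
echo "$missing_report"   # prints "missing trackDb: demo/trackDb.txt"
```

This does not replace hubCheck, which also validates the contents of each file; it only catches missing files early.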
* Create the HTML page description for the hub:
$> cat > ~/var/gbib/hubs/virusNetwork/description.html << EOI<nowiki>
<HEAD><TITLE>Virus Network Hub</TITLE>
<BODY>
<P>
Ebola virus genome assembly and track hub.
<UL>
<LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank">
NCBI genome/4887 (Ebola virus)</A></LI>
</UL>
</P>
<BODY></HTML></nowiki>
EOI
* Include an image of the organism.
* Check if everything is OK with the hub:
$> sudo ~browser/bin/hubCheck <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/hub.txt</nowiki>
You will see an error message stating that it was not possible to open the file <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/eboVir3/trackDb.txt</nowiki>. We have to configure at least one track at our track hub in order to have a working assembly hub.


=== Preparing the data ===
* If the FASTA file is written entirely in lowercase, convert all nucleotides to uppercase so that the genome is not treated as if it were fully masked. We can use the change_case command from seq_crumbs:
<font color=blue>user@host:$></font> change_case --in_format fasta --outfile ~/var/gbib/work/assemblyHub/tryCru32-CLBrenerEsmeraldoLike/genome/output/tryCru32-CLBrenerEsmeraldoLike-uppercase.fasta --processes 4 -a upper ~/var/gbib/work/assemblyHub/tryCru32-CLBrenerEsmeraldoLike/genome/input/tryCru32-CLBrenerEsmeraldoLike.fasta
* If the names of the chromosomes are very long, we need to make them shorter:
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/assemblyHub/tryCru32-CLBrenerEsmeraldoLike/genome/output
<font color=green>browser@browserbox:$></font> sed 's/^>Trypanosoma_cruzi/>Tc/' tryCru32-CLBrenerEsmeraldoLike-uppercase.fasta > tryCru32-CLBrenerEsmeraldoLike-uppercase-shortChromNames.fasta
* Make a symbolic link (just to shorten the name) and get the .2bit file from the FASTA file:
<font color=green>browser@browserbox:$></font> ln -s tryCru32-CLBrenerEsmeraldoLike-uppercase-shortChromNames.fasta tryCru32-CLBrenerEsmeraldoLike.fasta
<font color=green>browser@browserbox:$></font> faToTwoBit tryCru32-CLBrenerEsmeraldoLike.fasta tryCru32-CLBrenerEsmeraldoLike.2bit
* Create a file with the name and size of every chromosome of the genome of interest, sorted from the largest to the smallest:
<font color=green>browser@browserbox:$></font> twoBitInfo tryCru32-CLBrenerEsmeraldoLike.2bit stdout | sort -k2nr > tryCru32-CLBrenerEsmeraldoLike-chromSizes-sorted.txt
* Build an AGP file from the fasta file, marking all N's as gaps, using the hgFakeAgp command.
<font color=green>browser@browserbox:$></font> hgFakeAgp -minContigGap=1 tryCru32-CLBrenerEsmeraldoLike.fasta tryCru32-CLBrenerEsmeraldoLike.agp
* Check if the new AGP file matches the fasta file:
<font color=green>browser@browserbox:$></font> sort -k1,1 -k2n,2n tryCru32-CLBrenerEsmeraldoLike.agp > tryCru32-CLBrenerEsmeraldoLike-sorted.agp
<font color=green>browser@browserbox:$></font> checkAgpAndFa tryCru32-CLBrenerEsmeraldoLike-sorted.agp tryCru32-CLBrenerEsmeraldoLike.2bit
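If seq_crumbs is not available, the uppercase conversion and the renaming can be approximated with awk and sed alone. This is a sketch under assumptions, not the procedure used on this page: the toy FASTA and its sequence names are made up, and change_case may differ in details such as line wrapping.

```shell
tmpdir=$(mktemp -d)
# Toy lowercase FASTA; the sequence names are invented for this example.
printf '>Trypanosoma_cruziChr1-S\nacgtnnacgt\n>Trypanosoma_cruziChr2-S\nggccaatt\n' > "$tmpdir/in.fasta"

# Uppercase the sequence lines only; header lines (starting with ">") pass through unchanged.
awk '/^>/ { print; next } { print toupper($0) }' "$tmpdir/in.fasta" > "$tmpdir/upper.fasta"

# Shorten the chromosome names with the same sed expression used above.
sed 's/^>Trypanosoma_cruzi/>Tc/' "$tmpdir/upper.fasta" > "$tmpdir/short.fasta"
cat "$tmpdir/short.fasta"
```

Anchoring the sed pattern with "^>" restricts the substitution to header lines, so sequence data can never be altered by the renaming.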


== Track hub configuration ==
== Track hub configuration ==
We have to configure at least one track in our track hub in order to have a working assembly hub. The first track that we will configure is the assembly track, which shows a block for every range that contains nucleotides other than 'N'. In other words, the gaps appear as holes in the track.


* Create the contents of trackDb.txt (track without spaces or dots and with the first character as a letter, shortLabel <= 17 chars, longLabel <= 80 chars):
* Create the contents of trackDb.txt (track without spaces or dots and with the first character as a letter, shortLabel <= 17 chars, longLabel <= 80 chars):
  $> sudo cat > /usr/local/share/gbib/hubs/virusNetwork/schMan2/trackDb.txt << EOI
  <font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/assemblyHub/tryCru32-CLBrenerEsmeraldoLike/tracks/map/assembly/{input,output}
track SMPs
  <font color=green>browser@browserbox:$></font> cd /folders/sf_work/assemblyHub/tryCru32-CLBrenerEsmeraldoLike/tracks/map/assembly/
bigDataUrl schMan2.bb
  <font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI
shortLabel SMPs v5.2
longLabel Schistosoma mansoni predictions (SMPs), version 5.2
type bigBed 12
group map
searchIndex name
visibility full
html schMan2-description
boxedCfg on
colorByStrand 150,100,30 230,170,40
  color 150,100,30
altColor 230,170,40
dataVersion Dec. 2011 <em>Sanger 5.2</em>
# directUrl http://verjo-server-01.iq.usp.br/genome/pires/virusNetwork/schMan1/geneView/%s
  iframeUrl https://www.google.com.br/search?q=$$
iframeOptions height='400' width='640' scrolling='yes'
priority 100
url http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?form=4&db=n&term=$$
urlLabel NCBI Details:
urls pmid="http://www.ncbi.nlm.nih.gov/pubmed/$$" spId="http://www.uniprot.org/uniprot/$$"
track roche454-blat
bigDataUrl roche454-blat.bb
shortLabel Roche 454 Trinity
longLabel Schistosoma mansoni RNA-Seq Roche 454 Trinity contigs mapped by Blat
type bigBed 12
searchIndex name
visibility full
color 64,0,96
altColor 64,32,128
  track assembly
  track assembly
shortLabel Assembly
  longLabel Assembly
  longLabel Assembly
shortLabel Assembly
priority 10
visibility pack
colorByStrand 150,100,30 230,170,40
color 150,100,30
altColor 230,170,40
bigDataUrl eboVir3-assembly.bb
  type bigBed 6
  type bigBed 6
  html trackDescriptions/assembly
  bigDataUrl tracks/map/assembly/output/assembly.bb
  url http://www.ncbi.nlm.nih.gov/nuccore/$$
  urlLabel NCBI Nucleotide database
EOI
  group map
The name of each track ("track" field) must be unique at the entire file and should not contain the character '.' (dot).
 
* Construction of the assembly track directly from the AGP file:
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/tryCru32-CLBrenerEsmeraldoLike-sorted.agp input/
<font color=green>browser@browserbox:$></font> grep -v "^#" input/tryCru32-CLBrenerEsmeraldoLike-sorted.agp | awk '{if ($5 != "N") printf "%s\t%d\t%d\t%s\t0\t%s\n", $1, $2-1, $3, $6, $9}' | sort -k1,1 -k2,2n > output/assembly.bed
  <font color=green>browser@browserbox:$></font> ln -s ../../../../genome/tryCru32-CLBrenerEsmeraldoLike-chromSizes-sorted.txt input/
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=1 -type=bed6 -tab output/assembly.bed input/tryCru32-CLBrenerEsmeraldoLike-chromSizes-sorted.txt output/assembly.bb
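To see what the AGP-to-BED conversion produces, it can be run on a tiny hand-made AGP file. The sketch below is illustrative only: the object and component names are invented, and the AGP rows merely imitate the kind of output expected from hgFakeAgp. It subtracts 1 from the AGP start because AGP coordinates are 1-based while BED starts are 0-based.

```shell
tmpdir=$(mktemp -d)
# Toy AGP with two sequence blocks (type D) separated by one 5-base gap (type N).
{
    printf '# comment line, skipped by grep\n'
    printf 'chrDemo\t1\t10\t1\tD\tchrDemo_1\t1\t10\t+\n'
    printf 'chrDemo\t11\t15\t2\tN\t5\tcontig\tno\n'
    printf 'chrDemo\t16\t30\t3\tD\tchrDemo_2\t1\t15\t+\n'
} > "$tmpdir/demo.agp"

# Keep the non-gap rows and write them as BED6: chrom, start, end, name, score, strand.
grep -v "^#" "$tmpdir/demo.agp" |
    awk '{ if ($5 != "N") printf "%s\t%d\t%d\t%s\t0\t%s\n", $1, $2 - 1, $3, $6, $9 }' |
    sort -k1,1 -k2,2n > "$tmpdir/assembly.bed"
cat "$tmpdir/assembly.bed"
```

The result has two rows, covering bases 0-10 and 15-30 of chrDemo; the 5-base gap between them is the hole that the assembly track displays.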
 
* Edit the main trackDb.txt file to include the assembly track configuration.
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/assemblyHub/tryCru32-CLBrenerEsmeraldoLike/
<font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI
  #==========================================
  # MAPPING AND SEQUENCING.
   
   
# Assembly.
include tracks/map/assembly/trackDb.txt
EOI
* Double check the integrity of your hub with the command hubCheck:
<font color=green>browser@browserbox:$></font> hubCheck /folders/sf_work/assemblyHub/hub.txt
<font color=green>browser@browserbox:$></font> hubCheck <nowiki>http://127.0.0.1:1234/folders/sf_hubs/assemblyHub/hub.txt</nowiki>
== Blat configuration ==
* From the folder that contains the .2bit file, start two gfServer instances, specifying the ports that the assembly hub will use to query the DNA sequence and the translated (amino acid) sequence:
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/genome
<font color=green>browser@browserbox:$></font> mkdir ../log/
<font color=green>browser@browserbox:$></font> ~browser/bin/blat/gfServer start 127.0.0.1 42420 -stepSize=5 -log=../log/gfServer.log eboVir3.2bit &
<font color=green>browser@browserbox:$></font> ~browser/bin/blat/gfServer start 127.0.0.1 42421 -trans -log=../log/gfServer-trans.log eboVir3.2bit &
* Edit the genomes.txt file of the assembly hub to include the blat and transBlat lines (pay attention to the capital 'B' in "transBlat"):
blat 127.0.0.1 42420
transBlat 127.0.0.1 42421
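The ports given to gfServer and the ports written in genomes.txt must agree, and this page itself uses several different port pairs. One way to keep them in sync is to derive the gfServer command lines from genomes.txt. The sketch below only prints the commands and does not start any server; the helper name gfserver_commands and the sample file are assumptions made for this example.

```shell
# gfserver_commands is a hypothetical helper, not a GBiB or Kent-tools command.
# $1: genomes.txt   $2: the .2bit file served by both gfServer instances.
# It prints one gfServer command per blat/transBlat line and runs nothing.
gfserver_commands() {
    awk -v twobit="$2" '
        $1 == "blat"      { printf "gfServer start %s %s -stepSize=5 %s\n", $2, $3, twobit }
        $1 == "transBlat" { printf "gfServer start %s %s -trans %s\n", $2, $3, twobit }
    ' "$1"
}

tmpdir=$(mktemp -d)
cat > "$tmpdir/genomes.txt" << 'EOI'
genome eboVir3
twoBitPath eboVir3/genome/eboVir3.2bit
blat 127.0.0.1 42420
transBlat 127.0.0.1 42421
EOI
blat_plan=$(gfserver_commands "$tmpdir/genomes.txt" eboVir3.2bit)
echo "$blat_plan"
```

For the sample file this prints the two "gfServer start" lines with ports 42420 and 42421, which can then be compared with, or pasted into, the startup commands.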
== GBiB maintenance ==
* Update all software and data:
$> gbibOnline
$> gbibAutoUpdateOn
$> updateBrowser
$> gbibAddTools
$> gbibAutoUpdateOff
$> gbibOffline
== Additional configuration ==
=== Gap track ===
It is easy to build a gap track directly from the AGP file. The gap track highlights the genome loci that contain N's, so it is complementary to the assembly track.
<font color=green>browser@browserbox:$></font> mkdir -p /folders/sf_work/virusNetwork/eboVir3/tracks/map/gap/{input,output}
<font color=green>browser@browserbox:$></font> cd /folders/sf_work/virusNetwork/eboVir3/tracks/map/gap/
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-sorted.agp input/
<font color=green>browser@browserbox:$></font> grep -v "^#" input/eboVir3-sorted.agp | awk '{ if ($5 == "N") printf "%s\t%d\t%d\t%s\n", $1, $2-1, $3, $8}' | sort -k1,1 -k2,2n > output/gap.bed
<font color=green>browser@browserbox:$></font> ln -s ../../../../genome/eboVir3-chromSizes-sorted.txt input/
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=1 -type=bed4 -tab output/gap.bed input/eboVir3-chromSizes-sorted.txt output/gap.bb
<font color=green>browser@browserbox:$></font> cat > trackDb.txt << EOI
  track gap
  track gap
longLabel Gap
  shortLabel Gap
  shortLabel Gap
  priority 11
  longLabel Gap Locations
visibility dense
  type bigBed 4 .
color 0,0,0
  bigDataUrl tracks/map/gap/output/gap.bb
bigDataUrl eboVir3-gap.bb
  EOI
  type bigBed 4
  <font color=green>browser@browserbox:$></font> cat >> /folders/sf_work/virusNetwork/eboVir3/trackDb.txt << EOI
  group map
html trackDescriptions/gap
track gc5Base
shortLabel GC Percent
longLabel GC Percent in 5-Base Windows
group map
priority 23.5
visibility full
autoScale Off
maxHeightPixels 128:36:16
graphTypeDefault Bar
  gridDefault OFF
windowingFunction Mean
  color 0,0,0
altColor 128,128,128
viewLimits 30:70
type bigWig 0 100
bigDataUrl eboVir3-gc5Base.bw
html trackDescriptions/gc5Base
# For bigWig data, we can use the new trackDb setting, negateValues on, to allow display on the Crick strand.
   
   
# Gap Locations.
include tracks/map/gap/trackDb.txt
EOI
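Because the gap track is the complement of the assembly track, the block lengths and the gap lengths should add up to the chromosome size. The sketch below checks this on a toy AGP file; all names in it are invented for illustration, and the start coordinate is decremented on the assumption that BED starts are 0-based while AGP is 1-based.

```shell
tmpdir=$(mktemp -d)
# Toy AGP: two sequence blocks (type D) around one 5-base gap (type N).
{
    printf 'chrDemo\t1\t10\t1\tD\tchrDemo_1\t1\t10\t+\n'
    printf 'chrDemo\t11\t15\t2\tN\t5\tcontig\tno\n'
    printf 'chrDemo\t16\t30\t3\tD\tchrDemo_2\t1\t15\t+\n'
} > "$tmpdir/demo.agp"

# Gap rows become BED4 (chrom, start, end, linkage); the other rows become BED3 blocks.
grep -v "^#" "$tmpdir/demo.agp" |
    awk '{ if ($5 == "N") printf "%s\t%d\t%d\t%s\n", $1, $2 - 1, $3, $8 }' |
    sort -k1,1 -k2,2n > "$tmpdir/gap.bed"
grep -v "^#" "$tmpdir/demo.agp" |
    awk '{ if ($5 != "N") printf "%s\t%d\t%d\n", $1, $2 - 1, $3 }' > "$tmpdir/blocks.bed"

# Total bases covered by blocks plus gaps, and the largest end coordinate in the AGP.
covered=$(awk '{ total += $3 - $2 } END { print total + 0 }' "$tmpdir/blocks.bed" "$tmpdir/gap.bed")
chrom_size=$(awk '!/^#/ && $3 > max { max = $3 } END { print max + 0 }' "$tmpdir/demo.agp")
echo "covered=$covered chrom_size=$chrom_size"
```

Both values are 30 for this toy file. The size comparison is only meaningful for a single-chromosome AGP like this one; a real genome would need the sums grouped by chromosome.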
=== Basic HTML page ===
Let's compose a basic page for our organism of interest:
<font color=green>browser@browserbox:$></font> cat > /folders/sf_work/virusNetwork/eboVir3/description.html << EOI<nowiki>
<p>
<i>Ebola</i> virus genome assembly and track hub.
<ul>
<li><a HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank">
NCBI genome/4887 (Ebola virus)</a></li>
</ul>
</p>
<p>
<b>UCSC Genome Browser assembly ID:</b> eboVir3<br>
</p></nowiki>
  EOI
  EOI
The name of each track ("track" field) must be unique at the entire file.
 
* Check again if everything is OK with the hub:
For a more complete example, see the code of the following page: http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/araTha1/description.html.
$> sudo ~browser/bin/hubCheck <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/hub.txt</nowiki>
 
Now you will see an error message stating that it was not possible to open the file <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/eboVir3/smpsWithoutUtrs.bb</nowiki>. We have to copy (link?) this file to the correct place.
= Other tips =
$> cp smpsWithoutUtrs.bb ~/var/gbib/work/virusNetwork/eboVir3
 
$> cd ~/var/gbib/hubs/virusNetwork/eboVir3
$> ln -s ../../../sf_work/virusNetwork/eboVir3/smpsWithoutUtrs.bb
At last the hubCheck command will run without any error being identified. But, if you point your browser to http://127.0.0.1:1234 and go to: Genomes ---> group = Virus Network, you will see the following error:
At last the hubCheck command will run without any error being identified. But, if you point your browser to http://127.0.0.1:1234 and go to: Genomes ---> group = Virus Network, you will see the following error:


"Couldn't open <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/schMan2/description.html</nowiki>"
"Couldn't open <nowiki>http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/schMan2/description.html</nowiki>"


Let's compose a basic page to our organism of interest:
* Forcing configuration files to be loaded again every time that the page is reloaded (instead of after at least 300 seconds):
  $> cat > ~/var/gbib/hubs/virusNetwork/eboVir3/description.html << EOI<nowiki>
** Insert "udcTimeout=1&" right after [http://genome.ucsc.edu/cgi-bin/hgTracks? http://genome.ucsc.edu/cgi-bin/hgTracks?] at URL.
** To disable this feature, click at "clear" on the message that appears at the top of the page.
* Create the HTML page description for the hub:
  $> cat > ~/var/gbib/hubs/virusNetwork/description.html << EOI<nowiki>
<HEAD><TITLE>Virus Network Hub</TITLE>
<BODY>
<P>
<P>
Ebola virus genome assembly and track hub.
Ebola virus genome assembly and track hub.
Line 256: Line 271:
</UL>
</UL>
</P>
</P>
<P>
<BODY></HTML></nowiki>
<B>UCSC Genome Browser assembly ID:</B> araTha1<BR>
Use as an example: http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/araTha1/description.html
</P></nowiki>
  EOI
  EOI
* In the case that the fasta file is written with all nucleotides in lowercase, convert all the uppercase letters such that the genome do not be considered as if it was all masked. We can use the change_case command, from seq_crumbs:
* Include an image of the organism.
$> change_case --in_format fasta --outfile schMan2.fasta --processes 80 -a upper Schistosoma_mansoni_v5.2.fa
* If the names of the chromosomes are very long, we need to make them shorter:
$> sed s/Schisto_mansoni/Sm/ schMan2.fasta > schMan2-shortChromNames.fasta
* Get the .2bit file from this fasta:
$> faToTwoBit schMan2-shortChromNames.fasta schMan2.2bit
* Get and sort from the largest to the shortest a file with the size of all chromosomes of the genome of interest:
$> twoBitInfo schMan2.2bit stdout | sort -k2rn > schMan2-chromSizes-sorted.txt
* The same substitution have to be done at the bed file of the track:
* The same substitution have to be done at the bed file of the track:
  $> sed s/Schisto_mansoni/Sm/ smps.bed > smps-shortChromNames.bed
  $> sed s/Schisto_mansoni/Sm/ smps.bed > smps-shortChromNames.bed
Line 275: Line 281:
* Convert from bed to bigBed:
* Convert from bed to bigBed:
  $> bedToBigBed -verbose=1 -type=bed12 -tab -extraIndex=name smps-shortChromNames-sorted.bed schMan2-chromSizes-sorted.txt smps.bb
  $> bedToBigBed -verbose=1 -type=bed12 -tab -extraIndex=name smps-shortChromNames-sorted.bed schMan2-chromSizes-sorted.txt smps.bb
* Construction of the assembly and gap tracks directly from the AGP file:
<font color=green>browser@browserbox:$></font> grep -v "^#" eboVir3.agp | awk '$5 != "N"' | awk '{printf "%s\t%d\t%d\t%s\t0\t%s\n", $1, $2, $3, $6, $9}' | sort -k1,1 -k2,2n > eboVir3-assembly.bed
<font color=green>browser@browserbox:$></font> grep -v "^#" eboVir3.agp | awk '$5 == "N"' | awk '{printf "%s\t%d\t%d\t%s\n", $1, $2, $3, $8}' | sort -k1,1 -k2,2n > eboVir3-gap.bed
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=0 eboVir3-assembly.bed eboVir3-chromSizes.txt eboVir3-assembly.bb
<font color=green>browser@browserbox:$></font> bedToBigBed -verbose=0 eboVir3-gap.bed eboVir3-chromSizes.txt eboVir3-gap.bb
* Construction of the GC content track:
* Construction of the GC content track:
  <font color=green>browser@browserbox:$></font> hgGcPercent -wigOut -doGaps -file=stdout -win=5 -verbose=0 eboVir3 \
  <font color=green>browser@browserbox:$></font> hgGcPercent -wigOut -doGaps -file=stdout -win=5 -verbose=0 eboVir3 \
Line 328: Line 329:
  EOI
  EOI
* Let's compose an HTML page to our track:
* Let's compose an HTML page to our track:
  $> cat > ~/var/gbib/hubs/virusNetwork/eboVir3/track-description.html << EOI<nowiki>
  $> cat > ~/var/gbib/work/virusNetwork/eboVir3/assembly.html << EOI<nowiki>
<H2>Description</H2>
<H2>Description</H2>
<P>
<P>
Line 339: Line 340:
used to generate and analyze the data.
used to generate and analyze the data.


<H2>Verification</H2>
<H2>Verification (Validation)</H2>
<P>
<P>
Replace this text with a description of the methods
Replace this text with a description of the methods
used to verify the data.
used to verify (validate) the data.


<H2>Credits</H2>
<H2>Credits</H2>
Line 355: Line 356:
references and/or websites that provide background
references and/or websites that provide background
or supporting information about the data.</nowiki>
or supporting information about the data.</nowiki>
Other interesting information: background information, display conventions, and acknowledgments.
   
   
  EOI
  EOI


 
visibility full
== Blat configuration ==
html schMan2-description
 
boxedCfg on
* From the folder that contains the .2bit file, start two gfServer's, specifying the assembly hub ports that will be used to access the DNA sequence and the aminoacids sequence:
colorByStrand 150,100,30 230,170,40
  $> gfServer start 127.0.0.1 42422 -stepSize=5 -log=/var/log/gfServer.eboVir3.log eboVir3.2bit &
color 150,100,30
  $> gfServer start 127.0.0.1 42423 -trans -log=/var/log/gfServer.eboVir3-trans.log eboVir3.2bit &
altColor 230,170,40
dataVersion Dec. 2011 <em>Sanger 5.2</em>
# directUrl http://verjo-server-01.iq.usp.br/genome/pires/virusNetwork/schMan1/geneView/%s
  iframeUrl https://www.google.com.br/search?q=$$
iframeOptions height='400' width='640' scrolling='yes'
priority 100
url http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?form=4&db=n&term=$$
  urlLabel NCBI Details:
urls pmid="http://www.ncbi.nlm.nih.gov/pubmed/$$" spId="http://www.uniprot.org/uniprot/$$"
track roche454-blat
bigDataUrl roche454-blat.bb
shortLabel Roche 454 Trinity
longLabel Schistosoma mansoni RNA-Seq Roche 454 Trinity contigs mapped by Blat
type bigBed 12
searchIndex name
visibility full
color 64,0,96
altColor 64,32,128
track assembly
longLabel Assembly
shortLabel Assembly
priority 10
visibility pack
colorByStrand 150,100,30 230,170,40
color 150,100,30
altColor 230,170,40
bigDataUrl eboVir3-assembly.bb
type bigBed 6
html trackDescriptions/assembly
url http://www.ncbi.nlm.nih.gov/nuccore/$$
urlLabel NCBI Nucleotide database
group map
track gap
longLabel Gap
shortLabel Gap
priority 11
visibility dense
color 0,0,0
bigDataUrl eboVir3-gap.bb
type bigBed 4
group map
html trackDescriptions/gap
track gc5Base
shortLabel GC Percent
longLabel GC Percent in 5-Base Windows
group map
priority 23.5
visibility full
autoScale Off
maxHeightPixels 128:36:16
graphTypeDefault Bar
gridDefault OFF
windowingFunction Mean
color 0,0,0
altColor 128,128,128
viewLimits 30:70
type bigWig 0 100
bigDataUrl eboVir3-gc5Base.bw
html trackDescriptions/gc5Base
# For bigWig data, we can use the new trackDb setting, negateValues on, to allow display on the Crick strand.
EOI
* If the fasta file that was used to create the .2bit file was masked (i.e., it had nucleotides written in lowercase letters), we can use the gfServer flag "-mask":
* If the fasta file that was used to create the .2bit file was masked (i.e., it had nucleotides written in lowercase letters), we can use the gfServer flag "-mask":
  $> gfServer start 127.0.0.1 42423 -trans -mask schMan2.2bit &
  $> gfServer start 127.0.0.1 42423 -trans -mask schMan2.2bit &
* Edit the file genomes.txt of the assembly hub in order to include the lines relatives to blat and transBlat:
blat 127.0.0.1 42422
transBlat 127.0.0.1 42423
* Add these commands to cron, writing them just before the "exit" command on the last line:
* Add these commands to cron, writing them just before the "exit" command on the last line:
  $> sudo su -
  $> sudo su -
Line 377: Line 443:
  @vim $> ~browser/bin/blat/gfServer start localhost 42422 -stepSize=5 -log=/var/log/gfserver.eboVir3.log eboVir3.2bit &
  @vim $> ~browser/bin/blat/gfServer start localhost 42422 -stepSize=5 -log=/var/log/gfserver.eboVir3.log eboVir3.2bit &
  @vim $> ~browser/bin/blat/gfServer start localhost 42423 -trans -log=/var/log/gfserver.eboVir3-trans.log eboVir3.2bit &
  @vim $> ~browser/bin/blat/gfServer start localhost 42423 -trans -log=/var/log/gfserver.eboVir3-trans.log eboVir3.2bit &
A track hub can also be added directly through the browser URL: if you append hubUrl=[URL] to your hgTracks URL, the hub is loaded directly into the browser (e.g. http://genome.ucsc.edu/cgi-bin/hgTracks?db=hub_47033_schMan1&hubUrl=http://www.vision.ime.usp.br/~davidsp/hub/hub.txt).
Additional configuration to hub.txt:
descriptionUrl description


Additional configuration to genomes.txt:
groups eboVir3/groups.txt
description Ebola virus version 3
orderKey 1
htmlPath eboVir3/description
scientificName Ebola


== Custom track configuration ==
=== Custom track configuration ===


  browser position chr22:20,100,000-20,100,900
  browser position chr22:20,100,000-20,100,900
Line 390: Line 466:
** group: values include: custom, mapping, genes, mrna, regulation, comparative, variation, and x.
** group: values include: custom, mapping, genes, mrna, regulation, comparative, variation, and x.


=== Setting locale ===
Since Kent's tools (like bedToBigBed) expect input sorted in a case-sensitive order, we have to set the locale environment variables to the value "C".


== GBiB maintenance ==
Put the following lines at the bottom of ~browser/.bashrc in your GBiB:
 
# Define custom locale settings.
export LANG="C"
export LANGUAGE="C"
export LC_MESSAGES="C"
export LC_CTYPE="C"
export LC_NUMERIC="C"
export LC_TIME="C"
export LC_COLLATE="C"
export LC_MONETARY="C"
export LC_PAPER="C"
export LC_NAME="C"
export LC_ADDRESS="C"
export LC_TELEPHONE="C"
export LC_MEASUREMENT="C"
export LC_IDENTIFICATION="C"
export LC_ALL="C"


* Make an update of all softwares and data:
After that, load .bashrc again by doing:
$> gbibOnline
$> gbibAutoUpdateOn
$> updateBrowser
$> gbibAutoUpdateOff
$> gbibOffline


$> . ~browser/.bashrc
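The effect of the locale on ordering can be seen with sort alone. In the "C" locale, strings are compared byte by byte, so uppercase letters sort before lowercase ones. The sketch below uses a made-up BED file and sets LC_ALL only for the single command, as an alternative to changing .bashrc.

```shell
tmpdir=$(mktemp -d)
# Toy BED3 file with mixed-case chromosome names, deliberately out of order.
printf 'chr2\t5\t9\nChr1\t0\t4\nchr10\t0\t4\nchr2\t1\t3\n' > "$tmpdir/unsorted.bed"

# Sort by chromosome name (byte order) and then numerically by start position.
LC_ALL=C sort -k1,1 -k2,2n "$tmpdir/unsorted.bed" > "$tmpdir/sorted.bed"
cut -f1,2 "$tmpdir/sorted.bed"

# "sort -c" only checks the order; it exits with an error if the file is not sorted.
LC_ALL=C sort -c -k1,1 -k2,2n "$tmpdir/sorted.bed" && echo "sorted.bed is sorted"
```

With LC_ALL=C the chromosome order is Chr1, chr10, chr2, and the two chr2 rows are ordered by start (1 before 5). Other locales may place the names differently, which is why the variables above are set.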


== References ==
== References ==

Latest revision as of 19:38, 25 June 2021

Introduction

Genome Browser in a Box (GBiB) has some obvious advantages over the other options available for working with genomic data:

  • It is a genome browser with many features and tools that other available genome browsers lack.
  • It is much easier to install, configure and maintain than a full mirror of the UCSC Genome Browser website.
  • It is a safe way to keep your private data inaccessible to unauthorized users while still collaborating with authorized personnel.

Nonetheless, even after choosing GBiB as your genome browser, there are still many choices to make. This wiki page explains how to install, configure and maintain an assembly hub (with a track hub and BLAT) using GBiB on a laptop running Kubuntu 19.10 (Eoan Ermine). Most commands are the same on other GNU/Linux distributions; the differences are probably limited to package names and crontab settings. Other setups would involve installing GBiB on a server with a text-only interface and opening only specific ports on the firewall, so that the data can be used only from within your network.

GBiB installation

  • Create a folder on your machine to hold the installation files:
user@host:$> sudo mkdir /usr/local/src/gbib
user@host:$> sudo chmod o+x /usr/local/src/gbib
  • Download GBiB from the UCSC Genome Browser virtual store:
    • Go to the Genome Store.
    • Click on "Login / Register".
    • Check whether you agree with the terms and conditions shown in the GBiB box.
    • Check if your hardware and software meet the basic requirements.
    • Click on "Add to cart".
    • Click on "Cart (1)" in the menu.
    • Click on "Proceed to checkout".
    • Click on "My products" in the menu.
    • Copy the download address (let's call it <download_link>).
    • Download GBiB to /usr/local/src/gbib, uncompress it and delete the zip archive:
user@host:$> cd /usr/local/src/gbib
user@host:$> sudo wget --no-check-certificate <download_link>
user@host:$> sudo unzip gbib.zip
user@host:$> sudo rm gbib.zip
  • Give user sufficient access to the three uncompressed files and to the folder:
user@host:$> sudo chmod o+rw /usr/local/src/gbib/*
user@host:$> sudo chmod o+w /usr/local/src/gbib
  • Install VirtualBox and start it in background:
user@host:$> sudo apt-get install virtualbox
user@host:$> virtualbox &
  • Add GBiB to VirtualBox and boot it for the first time:
    • Machine ---> Add ---> /usr/local/src/gbib/browserbox.vbox ---> Start
    • Wait while the first update is done (it can take from 5 minutes to more than 1 hour, depending on your internet connection speed).
    • Close GBiB terminal window.
    • Select "Send the shutdown signal".
    • Confirm by clicking "OK".

GBiB configuration

  • Click on "Settings".
    • General ---> Description: Ebola virus genome assembly and track hubs.
    • System ---> Motherboard ---> Base Memory: 4096 MB.
    • System ---> Processor ---> Processor(s): 2.
    • Display ---> Video ---> Video Memory: 32 MB.
    • Shared Folders ---> + ---> Folder Path: ~/var/gbib/work ---> Auto-mount ---> OK.
  • Boot GBiB virtual machine:
    • Select "browserbox" on menu at left.
    • Click on "Start".
  • Test if everything is working at the following URLs:
  • Log in using ssh, for faster access.
    • Open a terminal, like "konsole".
    • Password: browser
user@host:$> ssh browser@localhost -p 1235
  • Install the tools that allow file manipulation:
browser@browserbox:$> gbibAddTools
  • Turn off every kind of automatic update:
browser@browserbox:$> gbibAutoUpdateOff
  • Do not allow users to mirror tracks:
browser@browserbox:$> gbibMirrorTracksOff
  • Turn on the offline mode:
browser@browserbox:$> gbibOffline
  • Reboot the virtual machine
browser@browserbox:$> sudo shutdown -r now


Assembly hub configuration

  • Log in again using ssh:
user@host:$> ssh browser@localhost -p 1235


Downloading the raw data

  • Create the directories that will store the assembly hub configuration files:
browser@browserbox:$> mkdir -p /folders/sf_work/assemblyHub/tryCru32-CLBrenerEsmeraldoLike/genome/{final,input,output}
browser@browserbox:$> cd /folders/sf_work/assemblyHub/tryCru32-CLBrenerEsmeraldoLike/genome/input
  • Download input data file: genome FASTA.
browser@browserbox:$> wget 'https://tritrypdb.org/common/downloads/release-32/TcruziCLBrenerEsmeraldo-like/fasta/data/TriTrypDB-32_TcruziCLBrenerEsmeraldo-like_Genome.fasta'
  • Create a symbolic link to the input data file (just to keep a consistent naming pattern):
browser@browserbox:$> ln -s TriTrypDB-32_TcruziCLBrenerEsmeraldo-like_Genome.fasta tryCru32-CLBrenerEsmeraldoLike.fasta


Creating a basic hub.txt file

  • Fill the contents of hub.txt file:
browser@browserbox:$> cat > /folders/sf_work/assemblyHub/hub.txt << EOI
hub assemblyHub
shortLabel Assembly Hub
longLabel Assembly Hub for Trypanosoma cruzi
genomesFile genomes.txt
email admin@assemblyhub.edu
descriptionUrl http://genome.assemblyhub.edu
EOI
  • The following rules must be obeyed:
    • hub: name without spaces.
    • shortLabel: limited to 17 characters.
    • longLabel: limited to 80 characters.
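
The three rules above are easy to check automatically before running hubCheck. The following is only a minimal sketch: check_hub_txt is a hypothetical helper name (it is not part of GBiB or of Kent's tools), and it only tests the rules listed here, nothing else about the file.

```shell
# check_hub_txt is a hypothetical helper, not a GBiB or Kent command.
# It reports violations of the hub, shortLabel and longLabel rules
# and returns a non-zero status if any rule is broken.
check_hub_txt() {
    awk '
        # Everything after the first space is the value of the setting.
        { value = substr($0, length($1) + 2) }
        $1 == "hub"        && NF != 2            { print "hub: name must not contain spaces"; bad = 1 }
        $1 == "shortLabel" && length(value) > 17 { print "shortLabel: longer than 17 characters"; bad = 1 }
        $1 == "longLabel"  && length(value) > 80 { print "longLabel: longer than 80 characters"; bad = 1 }
        END { exit bad }
    ' "$1"
}
```

It could be called as check_hub_txt /folders/sf_work/assemblyHub/hub.txt; no output and a zero exit status mean that the three rules are respected.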


Creating a basic genomes.txt file

  • Fill the contents of genomes.txt:
browser@browserbox:$> cat > /folders/sf_work/assemblyHub/genomes.txt << EOI
genome tryCru32-CLBrenerEsmeraldoLike
organism Trypanosoma cruzi CL Brener Esmeraldo-like
scientificName Trypanosoma cruzi
orderKey 1
description TriTrypDB Release 32 (20 Apr 2017)
defaultPos TcChr1-S:1-77,957
twoBitPath tryCru32-CLBrenerEsmeraldoLike/genome/final/tryCru32-CLBrenerEsmeraldoLike.2bit
htmlPath tryCru32-CLBrenerEsmeraldoLike/htmlPage/description.html
groups tryCru32-CLBrenerEsmeraldoLike/groups.txt
trackDb tryCru32-CLBrenerEsmeraldoLike/trackDb.txt
blat 127.0.0.1 2302
transBlat 127.0.0.1 2303
EOI

Preparing the data

  • If the FASTA file has all nucleotides in lowercase, convert them to uppercase so that the genome is not treated as if it were entirely masked. We can use the change_case command, from seq_crumbs:
user@host:$> change_case --in_format fasta --outfile ~/var/gbib/work/assemblyHub/tryCru32-CLBrenerEsmeraldoLike/genome/output/tryCru32-CLBrenerEsmeraldoLike-uppercase.fasta --processes 4 -a upper ~/var/gbib/work/assemblyHub/tryCru32-CLBrenerEsmeraldoLike/genome/input/tryCru32-CLBrenerEsmeraldoLike.fasta
  • If the names of the chromosomes are very long, we need to make them shorter:
browser@browserbox:$> cd /folders/sf_work/assemblyHub/tryCru32-CLBrenerEsmeraldoLike/genome/output
browser@browserbox:$> sed 's/^>Trypanosoma_cruzi/>Tc/' tryCru32-CLBrenerEsmeraldoLike-uppercase.fasta > tryCru32-CLBrenerEsmeraldoLike-uppercase-shortChromNames.fasta
  • Make a symbolic link (just to shorten the name) and get the .2bit file from the FASTA file:
browser@browserbox:$> ln -s tryCru32-CLBrenerEsmeraldoLike-uppercase-shortChromNames.fasta tryCru32-CLBrenerEsmeraldoLike.fasta
browser@browserbox:$> faToTwoBit tryCru32-CLBrenerEsmeraldoLike.fasta tryCru32-CLBrenerEsmeraldoLike.2bit
  • Create a file with the name and size of every chromosome of the genome of interest, sorted from the largest to the smallest:
browser@browserbox:$> twoBitInfo tryCru32-CLBrenerEsmeraldoLike.2bit stdout | sort -k2nr > tryCru32-CLBrenerEsmeraldoLike-chromSizes-sorted.txt
  • Build an AGP file from the fasta file, marking all N's as gaps, using the hgFakeAgp command.
browser@browserbox:$> hgFakeAgp -minContigGap=1 tryCru32-CLBrenerEsmeraldoLike.fasta tryCru32-CLBrenerEsmeraldoLike.agp
  • Check if the new AGP file matches the fasta file:
browser@browserbox:$> sort -k1,1 -k2n,2n tryCru32-CLBrenerEsmeraldoLike.agp > tryCru32-CLBrenerEsmeraldoLike-sorted.agp
browser@browserbox:$> checkAgpAndFa tryCru32-CLBrenerEsmeraldoLike-sorted.agp tryCru32-CLBrenerEsmeraldoLike.2bit
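
If seq_crumbs is not installed, the case conversion and the renaming of the chromosomes can probably be done with plain awk and sed. This is only a sketch under that assumption: prepare_fasta is a hypothetical name, and the sed expression is the same one used in the step above.

```shell
# prepare_fasta is a hypothetical helper, not a GBiB or Kent command.
# Header lines (starting with ">") keep their case; sequence lines are
# converted to uppercase. The sed step then shortens the chromosome names.
prepare_fasta() {
    awk '/^>/ { print; next } { print toupper($0) }' "$@" |
        sed 's/^>Trypanosoma_cruzi/>Tc/'
}
```

It reads the file given as argument (or standard input) and writes to standard output, for example: prepare_fasta input.fasta > output.fasta, where both file names are placeholders.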

Track hub configuration

We have to configure at least one track in our track hub in order to have a working assembly hub. The first track that we will configure is the assembly track, which shows a block for every range that contains nucleotides other than 'N'. In other words, the gaps are the holes in the track.

  • Create the contents of trackDb.txt (track without spaces or dots and with the first character as a letter, shortLabel <= 17 chars, longLabel <= 80 chars):
browser@browserbox:$> mkdir -p /folders/sf_work/assemblyHub/tryCru32-CLBrenerEsmeraldoLike/tracks/map/assembly/{input,output}
browser@browserbox:$> cd /folders/sf_work/assemblyHub/tryCru32-CLBrenerEsmeraldoLike/tracks/map/assembly/
browser@browserbox:$> cat > trackDb.txt << EOI
track assembly
shortLabel Assembly
longLabel Assembly
type bigBed 6
bigDataUrl tracks/map/assembly/output/assembly.bb

EOI

The name of each track ("track" field) must be unique across the entire file and must not contain the character '.' (dot).
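
The two naming rules can be verified with a few lines of awk. This is a minimal sketch; check_track_names is a hypothetical helper, and it only looks at the "track" lines of the files it is given (included files have to be passed explicitly).

```shell
# check_track_names is a hypothetical helper, not a GBiB or Kent command.
# It flags track names that contain a dot or that appear more than once
# across all the trackDb files given as arguments.
check_track_names() {
    awk '
        $1 == "track" {
            if ($2 ~ /\./)  { print "dot in track name: " $2; bad = 1 }
            if (seen[$2]++) { print "duplicate track name: " $2; bad = 1 }
        }
        END { exit bad }
    ' "$@"
}
```

For example, check_track_names trackDb.txt tracks/map/*/trackDb.txt would check the main file together with the files it includes.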

  • Construction of the assembly track directly from the AGP file:
browser@browserbox:$> ln -s ../../../../genome/output/tryCru32-CLBrenerEsmeraldoLike-sorted.agp input/
browser@browserbox:$> grep -v "^#" input/tryCru32-CLBrenerEsmeraldoLike-sorted.agp | awk '{if ($5 != "N") printf "%s\t%d\t%d\t%s\t0\t%s\n", $1, $2 - 1, $3, $6, $9}' | sort -k1,1 -k2,2n > output/assembly.bed
browser@browserbox:$> ln -s ../../../../genome/output/tryCru32-CLBrenerEsmeraldoLike-chromSizes-sorted.txt input/
browser@browserbox:$> bedToBigBed -verbose=1 -type=bed6 -tab output/assembly.bed input/tryCru32-CLBrenerEsmeraldoLike-chromSizes-sorted.txt output/assembly.bb
  • Edit the main trackDb.txt file to include the assembly track configuration.
browser@browserbox:$> cd /folders/sf_work/assemblyHub/tryCru32-CLBrenerEsmeraldoLike/
browser@browserbox:$> cat > trackDb.txt << EOI
#==========================================
# MAPPING AND SEQUENCING.

# Assembly.
include tracks/map/assembly/trackDb.txt
EOI
  • Double check the integrity of your hub with the command hubCheck:
browser@browserbox:$> hubCheck /folders/sf_work/assemblyHub/hub.txt
browser@browserbox:$> hubCheck http://127.0.0.1:1234/folders/sf_hubs/assemblyHub/hub.txt
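
To see what the AGP to BED conversion of the assembly track does, it can be tried on a tiny hand-made AGP file. The sketch below is an illustration only: agp_to_assembly_bed is a hypothetical name, and the start coordinate is shifted by one because AGP coordinates are 1-based while BED start coordinates are 0-based.

```shell
# agp_to_assembly_bed is a hypothetical helper, not a GBiB or Kent command.
# It keeps every AGP line that is not a gap (column 5 different from "N")
# and writes chrom, start (0-based), end, component name, score 0, strand.
agp_to_assembly_bed() {
    grep -v '^#' "$1" |
        awk 'BEGIN { OFS = "\t" } $5 != "N" { print $1, $2 - 1, $3, $6, 0, $9 }' |
        sort -k1,1 -k2,2n
}
```

With an AGP file holding a 100-base component, a 10-base gap and a 90-base component, the result has two BED lines (0 to 100 and 110 to 200), and the gap is the hole between them.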

Blat configuration

  • From the folder that contains the .2bit file, start two gfServer instances, specifying the assembly hub ports that will be used to query the DNA sequence and the translated (amino acid) sequence:
browser@browserbox:$> cd /folders/sf_work/virusNetwork/eboVir3/genome
browser@browserbox:$> mkdir ../log/
browser@browserbox:$> ~browser/bin/blat/gfServer start 127.0.0.1 42420 -stepSize=5 -log=../log/gfServer.log eboVir3.2bit &
browser@browserbox:$> ~browser/bin/blat/gfServer start 127.0.0.1 42421 -trans -log=../log/gfServer-trans.log eboVir3.2bit &
  • Edit the genomes.txt file of the assembly hub to include the lines related to blat and transBlat (pay attention to the capital 'B' in "transBlat"):
blat 127.0.0.1 42420
transBlat 127.0.0.1 42421
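
The ports given to gfServer and the ports written in genomes.txt have to be the same, so it can help to read them back from genomes.txt instead of typing them twice. A minimal sketch, assuming one "genome" stanza per assembly; blat_ports is a hypothetical helper name.

```shell
# blat_ports is a hypothetical helper, not a GBiB or Kent command.
# Arguments: the genomes.txt file and the genome name.
# It prints the blat and transBlat lines (setting, host, port) of that genome.
blat_ports() {
    awk -v genome="$2" '
        $1 == "genome" { current = $2 }
        current == genome && ($1 == "blat" || $1 == "transBlat") { print $1, $2, $3 }
    ' "$1"
}
```

For example, blat_ports /folders/sf_work/assemblyHub/genomes.txt tryCru32-CLBrenerEsmeraldoLike lists the host and port that each gfServer instance is expected to listen on.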

GBiB maintenance

  • Update all software and data:
$> gbibOnline
$> gbibAutoUpdateOn
$> updateBrowser
$> gbibAddTools
$> gbibAutoUpdateOff
$> gbibOffline

Additional configuration

Gap track

It is easy to build a gap track directly from the AGP file. The gap track highlights the genome loci that contain N's, which makes it the complement of the assembly track.

browser@browserbox:$> mkdir -p /folders/sf_work/virusNetwork/eboVir3/tracks/map/gap/{input,output}
browser@browserbox:$> cd /folders/sf_work/virusNetwork/eboVir3/tracks/map/gap/
browser@browserbox:$> ln -s ../../../../genome/eboVir3-sorted.agp input/
browser@browserbox:$> grep -v "^#" input/eboVir3-sorted.agp | awk '{ if ($5 == "N") printf "%s\t%d\t%d\t%s\n", $1, $2 - 1, $3, $8}' | sort -k1,1 -k2,2n > output/gap.bed
browser@browserbox:$> ln -s ../../../../genome/eboVir3-chromSizes-sorted.txt input/
browser@browserbox:$> bedToBigBed -verbose=1 -type=bed4 -tab output/gap.bed input/eboVir3-chromSizes-sorted.txt output/gap.bb
browser@browserbox:$> cat > trackDb.txt << EOI
track gap
shortLabel Gap
longLabel Gap Locations
type bigBed 4 .
bigDataUrl tracks/map/gap/output/gap.bb
EOI
browser@browserbox:$> cat >> /folders/sf_work/virusNetwork/eboVir3/trackDb.txt << EOI

# Gap Locations.
include tracks/map/gap/trackDb.txt
EOI


Basic HTML page

Let's compose a basic page for our organism of interest:

browser@browserbox:$> cat > /folders/sf_work/virusNetwork/eboVir3/description.html << EOI
<p>
<i>Ebola</i> virus genome assembly and track hub.
<ul>
<li><a HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank">
NCBI genome/4887 (Ebola virus)</a></li>
</ul>
</p>
<p>
<b>UCSC Genome Browser assembly ID:</b> eboVir3<br>
</p>
EOI

For a more complete example, see the code of the following page: http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/araTha1/description.html.

Other tips

At this point the hubCheck command will run without reporting any error. However, if you point your browser to http://127.0.0.1:1234 and go to: Genomes ---> group = Virus Network, you will see the following error:

"Couldn't open http://127.0.0.1:1234/folders/sf_hubs/virusNetwork/schMan2/description.html"

  • Forcing configuration files to be loaded again every time that the page is reloaded (instead of after at least 300 seconds):
  • Create the HTML page description for the hub:
$> cat > ~/var/gbib/hubs/virusNetwork/description.html << EOI
<HTML><HEAD><TITLE>Virus Network Hub</TITLE></HEAD>
<BODY>
<P>
Ebola virus genome assembly and track hub.
<UL>
<LI><A HREF="http://www.ncbi.nlm.nih.gov/genome/4887" TARGET="_blank">
NCBI genome/4887 (Ebola virus)</A></LI>
</UL>
</P>
</BODY></HTML>

EOI
  • Include an image of the organism.
  • The same substitution has to be done in the BED file of the track:
$> sed s/Schisto_mansoni/Sm/ smps.bed > smps-shortChromNames.bed
  • The BED file of the track has to be sorted first by chromosome name and then by start coordinate:
$> sort -k1,1 -k2,2n smps-shortChromNames.bed > smps-shortChromNames-sorted.bed
  • Convert from bed to bigBed:
$> bedToBigBed -verbose=1 -type=bed12 -tab -extraIndex=name smps-shortChromNames-sorted.bed schMan2-chromSizes-sorted.txt smps.bb
  • Construction of the GC content track:
browser@browserbox:$> hgGcPercent -wigOut -doGaps -file=stdout -win=5 -verbose=0 eboVir3 \
                      eboVir3.2bit | gzip -c > eboVir3-gc5Base-wigVarStep.gz
browser@browserbox:$> wigToBigWig eboVir3-gc5Base-wigVarStep.gz eboVir3-chromSizes.txt eboVir3-gc5Base.bw
  • Contents of groups.txt:
$> cat > /usr/local/src/gbib/hubs/virusNetwork/schMan2/groups.txt << EOI
name user
label Custom
priority 1
defaultIsClosed 1

name map
label Mapping
priority 2
defaultIsClosed 0

name genes
label Genes
priority 3
defaultIsClosed 0

name mrna
label mRNA
priority 4
defaultIsClosed 1

name regulation
label Regulation
priority 5
defaultIsClosed 1

name comparative
label Comparative
priority 6
defaultIsClosed 1

name varRep
label Variation
priority 7
defaultIsClosed 0

name x
label Experimental
priority 8
defaultIsClosed 1

EOI
  • Let's compose an HTML page for our track:
$> cat > ~/var/gbib/work/virusNetwork/eboVir3/assembly.html << EOI
<H2>Description</H2>
<P>
Replace this text with a summary describing the
concepts or analysis represented by your data.

<H2>Methods</H2>
<P>
Replace this text with a description of the methods
used to generate and analyze the data.

<H2>Verification (Validation)</H2>
<P>
Replace this text with a description of the methods
used to verify (validate) the data.

<H2>Credits</H2>
<P>
Replace this text with a list of the individuals 
and/or organizations who contributed to the collection
and analysis of the data.

<H2>References</H2>
<P>
Replace this text with a list of relevant literature
references and/or websites that provide background
or supporting information about the data.

Other interesting information: background information, display conventions, and acknowledgments.

EOI
visibility full
html schMan2-description
boxedCfg on
colorByStrand 150,100,30 230,170,40
color 150,100,30
altColor 230,170,40
dataVersion Dec. 2011 Sanger 5.2
# directUrl http://verjo-server-01.iq.usp.br/genome/pires/virusNetwork/schMan1/geneView/%s
iframeUrl https://www.google.com.br/search?q=$$
iframeOptions height='400' width='640' scrolling='yes'
priority 100
url http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?form=4&db=n&term=$$
urlLabel NCBI Details:
urls pmid="http://www.ncbi.nlm.nih.gov/pubmed/$$" spId="http://www.uniprot.org/uniprot/$$"

track roche454-blat
bigDataUrl roche454-blat.bb
shortLabel Roche 454 Trinity
longLabel Schistosoma mansoni RNA-Seq Roche 454 Trinity contigs mapped by Blat
type bigBed 12
searchIndex name
visibility full
color 64,0,96
altColor 64,32,128

track assembly
longLabel Assembly
shortLabel Assembly
priority 10
visibility pack
colorByStrand 150,100,30 230,170,40
color 150,100,30
altColor 230,170,40
bigDataUrl eboVir3-assembly.bb
type bigBed 6
html trackDescriptions/assembly
url http://www.ncbi.nlm.nih.gov/nuccore/$$
urlLabel NCBI Nucleotide database
group map

track gap
longLabel Gap
shortLabel Gap
priority 11
visibility dense
color 0,0,0
bigDataUrl eboVir3-gap.bb
type bigBed 4
group map
html trackDescriptions/gap

track gc5Base
shortLabel GC Percent
longLabel GC Percent in 5-Base Windows
group map
priority 23.5
visibility full
autoScale Off
maxHeightPixels 128:36:16
graphTypeDefault Bar
gridDefault OFF
windowingFunction Mean
color 0,0,0
altColor 128,128,128
viewLimits 30:70
type bigWig 0 100
bigDataUrl eboVir3-gc5Base.bw
html trackDescriptions/gc5Base

# For bigWig data, we can use the new trackDb setting, negateValues on, to allow display on the Crick strand.

EOI
  • If the FASTA file that was used to create the .2bit file was masked (i.e., it had nucleotides in lowercase letters), we can use the gfServer flag "-mask":
$> gfServer start 127.0.0.1 42423 -trans -mask schMan2.2bit &
  • Add these commands to /etc/rc.local, just before the "exit" command on the last line, so that they run at boot:
$> sudo su -
$> vim /etc/rc.local
@vim $>
@vim $> # Blat and transBlat daemons running against Ebola virus genome at ports 42422 and 42423, respectively.
@vim $> cd /folders/sf_hubs/virusNetwork/eboVir3
@vim $> ~browser/bin/blat/gfServer start localhost 42422 -stepSize=5 -log=/var/log/gfserver.eboVir3.log eboVir3.2bit &
@vim $> ~browser/bin/blat/gfServer start localhost 42423 -trans -log=/var/log/gfserver.eboVir3-trans.log eboVir3.2bit &

A track hub can also be added by putting the hub's URL directly in the browser URL. If you add hubUrl=[URL] to your hgTracks URL, the hub is loaded directly into the browser (e.g. http://genome.ucsc.edu/cgi-bin/hgTracks?db=hub_47033_schMan1&hubUrl=http://www.vision.ime.usp.br/~davidsp/hub/hub.txt).

Additional configuration to hub.txt:

descriptionUrl description

Additional configuration to genomes.txt:

groups eboVir3/groups.txt
description Ebola virus version 3
orderKey 1
htmlPath eboVir3/description
scientificName Ebola

Custom track configuration

browser position chr22:20,100,000-20,100,900
browser hide all
track name="Track label" description="Chromosome coordinates list" type=bigBed visibility=full color=200,50,50 itemRgb=On colorByStrand="0,0,50 0,50,0" useScore=1 altColor=100,200,200 group=x priority=1 db=eboVir3 url="http://verjo-server-01.iq.usp.br/genome/pires/smps.html#$$" htmlUrl="http://verjo-server-01.iq.usp.br/genome/pires/track-description.html" bigDataUrl=http://verjo-server-01.iq.usp.br/genome/pires/Schistosoma_mansoni_v5.2.gff.bed.bb
  • The following rules must be obeyed:
    • name: can consist of up to 15 characters, and must be enclosed in quotes if the text contains spaces.
    • description: can consist of up to 60 characters, and must be enclosed in quotes if the text contains spaces.
    • visibility: values include: 0 - hide, 1 - dense, 2 - full, 3 - pack, and 4 - squish.
    • group: values include: custom, mapping, genes, mrna, regulation, comparative, variation, and x.

Setting locale

Since Kent's tools (like bedToBigBed) expect input sorted in a case-sensitive order, we have to set the locale-related environment variables to "C".

Put the following lines at the bottom of ~browser/.bashrc in your GBiB:

# Define custom locale settings.
export LANG="C" 
export LANGUAGE="C" 
export LC_MESSAGES="C" 
export LC_CTYPE="C" 
export LC_NUMERIC="C" 
export LC_TIME="C" 
export LC_COLLATE="C" 
export LC_MONETARY="C" 
export LC_PAPER="C" 
export LC_NAME="C" 
export LC_ADDRESS="C" 
export LC_TELEPHONE="C" 
export LC_MEASUREMENT="C" 
export LC_IDENTIFICATION="C" 
export LC_ALL="C"

After that, load .bashrc again by doing:

$> . ~browser/.bashrc
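
The effect of the "C" locale on sorting can be seen with a small test. The three names below are made up for the illustration; in the "C" locale sort compares raw bytes, so uppercase letters come before lowercase ones.

```shell
# Three hypothetical chromosome names that differ in case.
names='chr10
Chr2
chr1'

# LC_ALL=C forces byte-wise comparison for this single sort call.
c_sorted="$(printf '%s\n' "$names" | LC_ALL=C sort)"
printf '%s\n' "$c_sorted"
```

This prints Chr2, chr1 and chr10, in that order. Under many other locales the comparison ignores case at first, which can give an order that tools like bedToBigBed reject.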

References

See also: