Making a hub for a cell browser
This page will go over how to use the makeCbHub script to build a track hub from bigWig, bigBed, and other big* files provided by a submitter.
File organization
The script assumes a certain directory structure when it creates the trackDb stanzas.
In your dataset directory, create a ‘hub’ directory where all of the hub-related files will live. In that hub directory, you will then create a directory for each composite/parent track:
    cb_dataset_dir/
    |--> hub/
         |--> track_set_A/
              |--> track_A1.bw
              |--> track_A2.bw
              |--> etc…
         |--> track_set_B/
              |--> track_B1.bw
              |--> etc…
How the individual tracks are divided into composite/parent tracks will vary from dataset to dataset. For example, in the collection mouse-brain-cutandtag, individual tracks were divided into a composite track for each dataset in the collection (e.g. h3k27ac, h3k27me3, h3k27me3-cell-lines, h3k36me3, h3k4me3, olig2, rad21), as this was what the authors requested. In neuro-degen-atac, individual tracks were grouped according to their corresponding metadata field (e.g. broad-celltypes, clusters, neuronal-celltypes, neuronal-clusters). If you’re not sure how to group the tracks, ask Max and/or the contributors.
Finally, it’s best to symlink to the track files in the orig directory rather than copy them, which avoids duplicating large amounts of data. See human-enhancer-atlas/hub and fetal-chromatin-atlas/hub as examples, where the bigWigs alone were 212 GB and 138 GB, respectively. (/hive has a ton of storage, but it's good not to waste space.)
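The layout and symlinking described above can be scripted. The following is a minimal sketch, not part of makeCbHub: the function name link_tracks and the idea of grouping files by a glob pattern per composite are assumptions made for illustration, and many datasets will need a hand-written grouping instead.

```python
from pathlib import Path


def link_tracks(orig_dir, hub_dir, groups):
    """Symlink track files from orig_dir into hub_dir/<composite>/.

    groups maps a composite directory name to a glob pattern. This grouping
    rule is hypothetical; adapt it to however the dataset should be divided.
    Returns the list of symlinks that were created.
    """
    orig_dir = Path(orig_dir).resolve()  # absolute targets keep links valid
    hub_dir = Path(hub_dir)
    created = []
    for composite, pattern in groups.items():
        target_dir = hub_dir / composite
        target_dir.mkdir(parents=True, exist_ok=True)
        for src in sorted(orig_dir.glob(pattern)):
            link = target_dir / src.name
            # Skip anything already present so the function can be re-run.
            if link.is_symlink() or link.exists():
                continue
            link.symlink_to(src)
            created.append(link)
    return created
```

For instance, link_tracks("orig", "hub", {"olig2": "olig2_*.bw"}) would create hub/olig2/ and fill it with links to the matching files in orig/, assuming the file names carry such a prefix.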
Running the script
For makeCbHub, at the very least, all you need is a directory of big* files.
For example, use the commands below to generate the trackDb stanzas for a single composite track in the mouse-brain-cutandtag dataset:
    cd /hive/data/inside/cells/datasets/mouse-brain-cutandtag/hub
    makeCbHub olig2

Output:

    track olig2
    compositeTrack on
    shortLabel olig2
    longLabel olig2
    visibility dense
    autoScale group
    type bigWig

        track olig2_cluster_non_oligo
        parent olig2 on
        shortLabel cluster_non_oligo
        longLabel cluster_non_oligo
        type bigWig 0.000000 2358.294189
        autoScale group
        bigDataUrl olig2/cluster_non_oligo.bw
        visibility dense

        track olig2_cluster_oligo
        parent olig2 on
        shortLabel cluster_oligo
        longLabel cluster_oligo
        type bigWig 0.000000 280.645325
        autoScale group
        bigDataUrl olig2/cluster_oligo.bw
        visibility dense
As you can see, it works, though the result wouldn't be particularly pretty to look at in the Genome Browser. The labels are not very human-friendly, and both tracks will be drawn in the same default color, black.
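Before running the script on a large directory, it can help to preview the names the tracks will get. The sketch below only mirrors the naming pattern visible in the output above (track name = directory name + "_" + file name without extension, bigDataUrl = directory/file). It is not makeCbHub's own code, the helper name preview_track_names is hypothetical, and for brevity it looks at .bw files only.

```python
from pathlib import Path


def preview_track_names(composite_dir):
    """Return (track name, bigDataUrl) pairs for the .bw files in a directory.

    Follows the pattern seen in the example output, e.g.
    olig2/cluster_oligo.bw -> ("olig2_cluster_oligo", "olig2/cluster_oligo.bw").
    """
    composite_dir = Path(composite_dir)
    parent = composite_dir.name
    return [
        (f"{parent}_{path.stem}", f"{parent}/{path.name}")
        for path in sorted(composite_dir.glob("*.bw"))
    ]
```

Long or cryptic names in this preview are a hint that a shortLabels file is worth preparing.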
Options to customize output
The six optional arguments for makeCbHub allow you greater control over what’s put into these trackDb stanzas, including shortLabels, longLabels, and colors.
Composite track labels
Normally, the directory names under the required argument fileDir are used as the labels for the composite/parent tracks in the trackDb. The option -d/--datasetList allows you to specify the casing used for those labels.
    makeCbHub -d "Rad21 Olig2" bw/

    track rad21
    compositeTrack on
    shortLabel Rad21
    longLabel Rad21
    visibility dense
    autoScale group
    type bigWig
    …

    track olig2
    compositeTrack on
    shortLabel Olig2
    longLabel Olig2
    visibility dense
    autoScale group
    type bigWig
    …
This command assumes that bw/ contains two directories, rad21 and olig2. The track names stay lowercase, but Rad21 and Olig2 are used as the shortLabel/longLabel for those composites in the trackDb.
Individual track labels
The -s/--shortLabel and -l/--longLabel options do the same for the individual tracks within each composite. By default the script uses the file names as the labels, which can be pretty messy:
    track bw_P21208_1004_OPC_Ctr_RND1_peaks
    parent bw on
    shortLabel P21208_1004_OPC_Ctr_RND1_peaks
    longLabel P21208_1004_OPC_Ctr_RND1_peaks
    type bigWig 0.000000 160.000000
    autoScale group
    bigDataUrl bw/P21208_1004_OPC_Ctr_RND1_peaks.bw
    visibility dense
With the short and long label options, you control both labels:
    track bw_P21208_1004_OPC_Ctr_RND1_peaks
    parent bw on
    shortLabel OPC_Ctr
    longLabel OPC_Ctr - Control oligodendrocyte precursor cells
    type bigWig 0.000000 160.000000
    autoScale group
    bigDataUrl bw/P21208_1004_OPC_Ctr_RND1_peaks.bw
    visibility dense
The shortLabels file contains two columns: (1) file name, (2) short label:

    P21208_1005_MOL12_EAE_RND2_peaks.bw MOL12_EAE
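When file names follow a regular pattern, a first draft of the shortLabels file can be generated and then edited by hand. The sketch below is an illustration only: the regular expression is inferred from the two example file names on this page, the helper names are hypothetical, and the tab separator is an assumption, since this page does not state which delimiter makeCbHub expects.

```python
import re
from pathlib import Path

# Pattern inferred from names like P21208_1005_MOL12_EAE_RND2_peaks.bw;
# other datasets will need their own pattern.
NAME_RE = re.compile(r"^P\d+_\d+_(?P<label>.+)_RND\d+_peaks$")


def draft_short_label(file_name):
    """Derive a short label from a file name, falling back to the bare stem."""
    stem = Path(file_name).stem
    match = NAME_RE.match(stem)
    return match.group("label") if match else stem


def write_short_labels(track_dir, out_path, sep="\t"):
    """Write 'file name<sep>short label' rows for every .bw file in track_dir.

    The separator is an assumption; check what makeCbHub expects.
    """
    rows = [
        (path.name, draft_short_label(path.name))
        for path in sorted(Path(track_dir).glob("*.bw"))
    ]
    with open(out_path, "w") as handle:
        for name, label in rows:
            handle.write(f"{name}{sep}{label}\n")
    return rows
```

Review the draft before using it, since two files can collapse to the same label (for example, replicates that differ only in the RND number).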
The longLabels file has the same layout as the acronyms file that can be used in the cell browser: column 1 is the short label and column 2 is the desired long label:

    COP Committed Oligodendrocyte precursor cells
Colors
Finally, the -c/--color option allows you to color each of the tracks. It takes the same kind of file as the color file that can be used with the cell browser: column 1 is the short label and column 2 is the color, usually as a hex code, though an RGB tuple is also acceptable.
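If the contributors did not supply colors, a starting color file can be drafted by cycling through a palette. This is a sketch under assumptions: the palette values, the leading "#" on the hex codes, the tab separator, and the helper names are all illustrative choices, not something makeCbHub prescribes.

```python
from itertools import cycle

# Arbitrary example palette; replace with colors the authors prefer.
PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd", "#8c564b"]


def assign_colors(short_labels, palette=PALETTE):
    """Map each distinct short label to a palette color, in sorted label order.

    Colors repeat once there are more labels than palette entries.
    """
    unique = sorted(set(short_labels))
    return dict(zip(unique, cycle(palette)))


def write_colors(mapping, out_path, sep="\t"):
    """Write 'short label<sep>color' rows; the separator is an assumption."""
    with open(out_path, "w") as handle:
        for label, color in mapping.items():
            handle.write(f"{label}{sep}{color}\n")
```

Keying the colors on short labels means the same cell type gets the same color in every composite where it appears.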
Combine all of these settings together to get: