Adding New Tracks to a browser installation


Adding a custom track

See Also

For a general discussion of how to load any type of track into your local browser mirror, see src/product/README.trackDb in the source tree, and the discussion of how trackDb entries work in src/hg/makeDb/trackDb/README.

Gotchas

Your .bed file must not contain any browser or track lines if you want the loader to work.

You need to hop all over the source tree like a frog on a griddle to get things done...

The docs are perhaps a little misleading. I seemed to need a .hg.conf file in my home directory; setting environment options didn't seem to help at all for command-line access while building.
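For the first gotcha, here is a minimal sketch of cleaning a BED file before loading it. It assumes the offending lines start with the word browser or track, as they do in UCSC custom track files. The function name strip_bed_headers and the .clean.bed file name are made up for illustration.

```shell
#!/bin/sh
# strip_bed_headers IN OUT: copy IN to OUT, dropping "browser" and "track" lines.
# (Hypothetical helper name; only grep is required.)
strip_bed_headers() {
    # -v inverts the match; the pattern matches lines whose first word is
    # exactly "browser" or "track".
    grep -v -E '^(browser|track)([[:space:]]|$)' "$1" > "$2"
}

# Usage:
#   strip_bed_headers my_track_bed_file.bed my_track_bed_file.clean.bed
#   hgLoadBed hg18 mjp_TrackName my_track_bed_file.clean.bed
```

Writing to a new file rather than editing in place keeps the original around in case the pattern removes something it should not.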


Instructions for most mirrors

The method UCSC provides for adding tracks is problematic for mirrors and not very straightforward for users, and it quickly becomes a problem when people load a large number of tracks, as we (the JSM group) have found out in the past. So you will not be able to do it that way.

Instead, we have made a simpler method to load, group, and manage tracks.

It is controlled by a script in your path and by files in the 'ucsc' directory in your home directory ($HOME/ucsc/). From your shell prompt, just type:

ucscMakeTracks.pl <enter>

to see the usage of the command. The short version is:


1. Load your track from the command line using, for example, hgLoadBed (run the command with no parameters to see the usage message). For example:

hgLoadBed hg18  mjp_TrackName my_track_bed_file.bed

This will load the BED file my_track_bed_file.bed into the mysql database hg18, under the table name mjp_TrackName.

As a convention we always start the track name with our initials. Some users have hundreds of custom tracks, and this is how we figure out what's what.


2. Make an entry for your track in the $HOME/ucsc/my_tracks.ra file. You can see examples of .ra files in various directories under the UCSC directory tree under /common/ucsc_mirror_src/kent/src/hg/makeDb/trackDb/, for example,

/common/ucsc_mirror_src/kent/src/hg/makeDb/trackDb/human/trackDb.ra
/common/ucsc_mirror_src/kent/src/hg/makeDb/trackDb/trackDb.ra

As an example, I might put:

track mjp_TrackName
shortLabel my miRNAs
longLabel my miRNAs in cancer
group m.pheasant
priority 1.003
visibility hide
color 153,50,0
type bed 6

This links the mysql table to what is displayed on the browser. Points to note:

group = what 'track group' should this appear under in the html page below the browser image, where the display controls are. It is the big blue horizontal line separating groups of tracks. Usually you would put your unix user id here to put all your tracks in a group of your own (m.pheasant in my case). You can create new groups with different names by creating their .ra definition in a file in the $HOME/ucsc/track_groups/ directory (this is a link to a common shared writable directory on your VM). You make one file per group, and the file name incorporates the group priority (order), group name, and group description which will go on the html page. Eg. the file /home/m.pheasant/ucsc/track_groups/1.020+riken+RIKEN Data.ra is a group 'riken' (this is what you enter in the group field in the .ra file) which will appear under the name 'RIKEN Data' on the HTML page.

track = name of the table

shortLabel = what is displayed on the left of the track

longLabel = what is displayed above the track

priority = the order the tracks should be listed on the html page.

visibility = should this track be displayed by default or not (dense/hide/pack/+ some other options)
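The group file naming scheme described above (priority, group name and description joined with '+') can be pulled apart with plain shell parameter expansion. This is only a sketch of the naming convention; how ucscMakeTracks.pl itself reads these names is not shown here, and the function name parse_group_file is hypothetical.

```shell
#!/bin/sh
# parse_group_file PATH: print priority, group name and description from a
# file name of the form "<priority>+<name>+<description>.ra".
parse_group_file() {
    base=$(basename "$1" .ra)          # e.g. "1.020+riken+RIKEN Data"
    priority=${base%%+*}               # text before the first '+'
    rest=${base#*+}                    # e.g. "riken+RIKEN Data"
    name=${rest%%+*}                   # text before the next '+'
    label=${rest#*+}                   # everything after it
    printf 'priority=%s\ngroup=%s\nlabel=%s\n' "$priority" "$name" "$label"
}

parse_group_file "/home/m.pheasant/ucsc/track_groups/1.020+riken+RIKEN Data.ra"
# prints:
#   priority=1.020
#   group=riken
#   label=RIKEN Data
```

The value printed as group is what goes in the group field of your track's .ra entry.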


3. Then run the ucscMakeTracks.pl script with the '-b' option, which will update the appropriate tables and make your track appear. This track definition will be applied to all genomes where there are tables matching the name in the .ra file.

ucscMakeTracks.pl -b


4. Go to the browser and you will now see your track group & track. Any genome databases which have a table listed in your .ra file will show the group and track. The group will not show up when you browse a genome database which has no table loaded for that group.


5. If you want to see other tracks for other users, or other group tracks, you can put the group names in the $HOME/ucsc/groups.conf file. If this file is empty (default) then you will see the standard UCSC group tracks plus the group of your user id. If you put entries in this file, you will need to also put your own user id.

eg, in $HOME/ucsc/groups.conf, I would put the entries:

m.pheasant r.taft riken

This lets me see my tracks, the riken tracks, and Ryan's tracks, as well as the standard UCSC ones.
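Since a non-empty groups.conf has to include your own user id, a quick check like the following can save some head scratching. It is a sketch that assumes entries are separated by whitespace (on one line or several) and that your group is named after your unix user id. The function name groups_conf_has is made up.

```shell
#!/bin/sh
# groups_conf_has FILE ID: succeed if ID appears in FILE as a whole
# whitespace-separated entry (one entry per line or several per line).
groups_conf_has() {
    awk -v id="$2" '
        { for (i = 1; i <= NF; i++) if ($i == id) found = 1 }
        END { exit(found ? 0 : 1) }
    ' "$1"
}

conf="$HOME/ucsc/groups.conf"
me=$(id -un)
# An empty or missing file is fine: that is the default behaviour.
if [ -s "$conf" ] && ! groups_conf_has "$conf" "$me"; then
    echo "warning: $conf has entries but not '$me'" >&2
fi
```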


6. Obviously, you will only be able to see custom tracks loaded on your own VM. You will not see any of our IMB tracks I listed above.




Instructions for the older mirrors

From the readme on addTrack.txt

I. Introduction.

This describes how to add a track to the UCSC genome browser. The two major steps in adding a track are creating a table containing the track information, and putting a description of the track in trackDb. The browser has one mysql database for each version of each genome that it displays. Both the track table and the track description live in this database. The current human genome database is hg13, while the current mouse database is mm2.


II. MySQL Preliminaries.

Before you get started it is good to look at these databases a little, and make sure that you have update access to them. You could do this directly with the 'mysql' command, but let's do it instead with the 'hgsql' command, which will keep you from having to type your user name and password all the time.

Assuming you've got the browser source already installed in ~/kent/src do the following to create hgsql

       cd ~/kent/src/lib
       make

This make almost always goes smoothly on Linux. You may need to remove the '-ggdb' flag in the makefile on other systems, and possibly set up a MACHTYPE environment variable, and then mkdir $MACHTYPE on some systems. Next

       cd ../hg/lib
       make

The main problem that can happen with this make is if the mysql libraries and include files are not found. See kent/src/README for details. The next step is

       cd ../hgsql
       make
       rehash

The hgsql program is just a thin wrapper around mysql. It looks for the password and username in the file ~/.hg.conf. Here's the necessary parts of .hg.conf:

# db.host is the name of the MySQL host to connect to
db.host=localhost

# db.user is the username to use when connecting to the host
db.user=mysqlUserName

# this is the password to use with the above hostname
db.password=mysqlPassword

This .hg.conf is similar to the cgi-bin/hg.conf file that the browser uses, but it need not contain everything that file does. Also it's advisable to have a read-only user/password in cgi-bin/hg.conf while you'll want a read-write user/password in ~/.hg.conf. Setting this up can involve doing some 'grants' in mysql. See the documentation at www.mysql.com for how to set up various users.
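Because ~/.hg.conf holds a read-write password, it makes sense to create it readable by its owner only; as far as I know, some versions of the kent tools also complain if the file is more open than that, but treat that as an assumption. A minimal sketch follows, with a hypothetical helper name (write_hg_conf) and placeholder credentials:

```shell
#!/bin/sh
# write_hg_conf FILE USER PASSWORD [HOST]: write a minimal hg.conf-style file
# that only the owner can read. (Hypothetical helper, not part of the kent tree.)
write_hg_conf() {
    if [ -e "$1" ]; then
        echo "refusing to overwrite $1" >&2
        return 1
    fi
    (
        umask 077                      # a newly created file gets mode 600
        cat > "$1" <<EOF
# db.host is the name of the MySQL host to connect to
db.host=${4:-localhost}
# db.user is the username to use when connecting to the host
db.user=$2
# this is the password to use with the above hostname
db.password=$3
EOF
    )
}

# Usage (placeholders, substitute your own read-write MySQL account):
#   write_hg_conf "$HOME/.hg.conf" mysqlUserName mysqlPassword
```

The umask is set inside a subshell so that it does not change the permissions of anything else you create in the same session.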

Assuming your mysql and .hg.conf are set up, and that you already have a mirror site going then the command

       hgsql database

(where database is something like hg13 or mm2) should bring you to the mysql prompt. Do

       mysql> show tables;

and you should see a large list of tables. When you've finished adding a track, the track(s) for your tables will be among them. Also try doing:

       mysql> describe trackDb;

This will list the fields of the trackDb table, which has a row for each track. Then do

       mysql> select tableName,shortLabel,type from trackDb;

This will show you some of the key fields from this table. You won't be updating this table directly, but it can be handy to look at it sometimes for debugging purposes. Some other useful mysql commands are

       mysql> select count(*) from sex;

This will count up all the items in the sex table. You thought there were only two? This table reflects the diversity of the sex fields in genbank. Try

       mysql> select * from sex;

to see the full diversity. Well, enough of that non-normalized nightmare. To get out do:

       mysql> quit


III. Loading the main track table.

The UCSC Genome Browser Database is usually loaded from a text file of some sort. The most popular types of text files are .psl files for blat alignments, .bed files for a wide variety of data, and GTF files for gene predictions. See http://genome.ucsc.edu/goldenPath/help/customTrack.html for further information on these formats. For now we'll assume you have a file in one of these formats that you want to add to the browser.

A) Creating the loader programs

       cd ~/kent/src/hg/makeDb
       cd hgLoadPsl
       make
       cd ../hgLoadBed
       make
       cd ../ldHgGene
       make
       cd ../hgPepPred
       make

  The makeDb directory also contains loaders for a number of more
  specialized tables including hgLoadOut for RepeatMasker data.  There
  is also a .doc file describing in detail how we created each database
  in the files named things like makeHg13.doc and makeMm2.doc.

B) Loading a bed file

  Loading a bed file is the most straightforward.  First decide on
  the name you want to call the table.  Then do
      hgLoadBed database tableName file.bed
  type hgLoadBed with no arguments for further information. 
  Database will be something like hg13 or mm2.
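Before running the loader it can help to run a quick sanity check over the BED file. The sketch below applies general BED rules (at least three fields, integer start and end, start not after end); it is not a transcription of hgLoadBed's own checks, and the name check_bed is made up.

```shell
#!/bin/sh
# check_bed FILE: print a message for each suspicious line and return
# non-zero if any were found. Fields are split on any whitespace.
check_bed() {
    awk '
        NF < 3 {
            printf "line %d: fewer than 3 fields\n", NR; bad = 1; next
        }
        $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ {
            printf "line %d: non-numeric start or end\n", NR; bad = 1; next
        }
        $2 + 0 > $3 + 0 {
            printf "line %d: start is after end\n", NR; bad = 1
        }
        END { exit(bad ? 1 : 0) }
    ' "$1"
}

# Usage:
#   check_bed file.bed && hgLoadBed database tableName file.bed
```

Leftover browser or track lines are reported too, since they have fewer than three fields or a non-numeric second field.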

C) Loading a psl file

  Loading a psl file is also easy.  Make sure that the psl file
  is sorted by chromosome (tName) and start position (tStart).  Use
  kent/src/hg/pslSort or just plain Unix sort for this if necessary.
  If the number of alignments is somewhat modest (say less than 
  500,000) then do
  	hgLoadPsl database -table=tableName file.psl -tNameIx
  This will load everything into one big table.  For huge numbers of
  alignments the browser will be faster if you first split up the
  data into one file for each chromosome.  Name these files 
  chr1_tableName.psl  chr2_tableName.psl and so forth.  Then do
       hgLoadPsl database chr*_tableName.psl
  This will end up making a separate table for each chromosome.
  Unfortunately it is still a bit complicated to make the details
  pages for a psl format track to include the alignments themselves.
  Please contact us at UCSC if this is a priority for you and we
  will try to make it easier.
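The "plain Unix sort" route and the per-chromosome split can be sketched as follows. In the PSL format tName is column 14 and tStart is column 16. The sketch assumes a tab-separated PSL file with no header lines (for instance one written by blat with -noHead); the function names are hypothetical.

```shell
#!/bin/sh
# sort_psl IN OUT: sort a headerless PSL file by target name (column 14,
# tName) and then numerically by target start (column 16, tStart).
sort_psl() {
    sort -t "$(printf '\t')" -k14,14 -k16,16n "$1" > "$2"
}

# split_psl_by_chrom IN TABLE: write one file per target sequence, named
# <tName>_<TABLE>.psl, in the current directory. Existing files with those
# names are overwritten.
split_psl_by_chrom() {
    awk -F '\t' -v table="$2" '{
        out = $14 "_" table ".psl"
        print > out     # awk truncates on first open, then appends
    }' "$1"
}

# Usage:
#   sort_psl file.psl sorted.psl
#   split_psl_by_chrom sorted.psl tableName
#   hgLoadPsl database chr*_tableName.psl
```

Splitting the already sorted file keeps each per-chromosome file sorted by start position.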

D) Loading a GTF (or GFF) file

  Generally GTF is a much more tightly defined standard than GFF, so
  GTF files are more likely to work without tweaking.  However most
  reasonable GFFs will work as well.  To load do
       ldHgGene database tableName file(s).gtf
  This will make a gene-prediction type table. You often will want
  to create an associated predicted peptide table as well. To do this
  do
       hgPepPred database generic tableNamePep file(s).fa
  The first word after the '>' in the fasta files should use the same
  symbol as the 'group' in GFF files or 'transcript_id' in GTF files.
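To check that the peptide fasta and the GTF agree on names, something like the following sketch can list transcripts that have no matching fasta record. It assumes GTF attributes of the form transcript_id "name"; and standard '>' fasta headers. The function names are made up.

```shell
#!/bin/sh
# gtf_transcript_ids FILE: print the unique transcript_id values in a GTF file.
gtf_transcript_ids() {
    sed -n 's/.*transcript_id "\([^"]*\)".*/\1/p' "$1" | sort -u
}

# fasta_ids FILE: print the first word after '>' of each FASTA header.
fasta_ids() {
    sed -n 's/^>\([^[:space:]]*\).*/\1/p' "$1" | sort -u
}

# missing_peptides GTF FASTA: list transcripts that have no peptide record.
missing_peptides() {
    gtf_tmp="${TMPDIR:-/tmp}/gtf_ids.$$"
    fa_tmp="${TMPDIR:-/tmp}/fa_ids.$$"
    gtf_transcript_ids "$1" > "$gtf_tmp"
    fasta_ids "$2" > "$fa_tmp"
    comm -23 "$gtf_tmp" "$fa_tmp"      # lines only in the GTF list
    rm -f "$gtf_tmp" "$fa_tmp"
}

# Usage:
#   missing_peptides file.gtf file.fa
```

An empty result means every transcript has a peptide. Names that do show up are not necessarily errors, since a GTF may contain transcripts with no predicted protein.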

IV. Updating trackDb

Your data will not display in the browser until you load it into trackDb. To do this first

       cd ~/kent/src/hg/makeDb/trackDb

and look at the file trackDb.ra, and read the README file. Then decide whether your new track should be global, organism specific, or assembly specific, and edit the corresponding trackDb.ra file. Generally it's good to find an existing track as similar as possible to the track you want to add, copy and paste it, and modify the copy. Then put any explanatory text you want on the track in trackName.html in the appropriate directory. After this do a

       make alpha

to update the trackDb table, or a

       make

to update trackDb_user (where user is your Unix username). If multiple engineers are working on the project you can set up cgi-bin-user directories with hg.conf files that will tell the browser to use trackDb_user instead of trackDb to avoid conflicts with other engineers' code.

After the 'make alpha' the browser should show your track. Congratulations if you've made it this far. See also http://www.soe.ucsc.edu/~sugnet/doc/trackHowto/browserTalk.pdf for Charles Sugnet's description of how to add a track including some code customization.

-Jim Kent Feb 14, 2003.

A BED track description for a custom SNP track

The .bed file has lines like this:

chr22   49512530        49512531        rs8137951       0       +       49512530        49512531        255,0,0
chr22   49518559        49518560        rs756638        0       +       49518559        49518560        255,0,0
chr22   49522492        49522493        rs3810648       0       +       49522492        49522493        255,0,0
chr22   49524956        49524957        rs2285395       0       +       49524956        49524957        255,0,0
chrX    100017093       100017094       rs5921682       0       +       100017093       100017094       255,0,0
chrX    100019802       100019803       rs5967204       0       -       100019802       100019803       255,0,0

Load the .bed file into a custom track called Ill550v3. Note the .bed file must have NO browser or track lines, otherwise hgLoadBed will barf:

/var/www/cgi-bin/loader/hgLoadBed -noBin  -tab hg18 Ill550v3 /home/rerla/public_html/illHH550v3.BED

Adjust your trackDb.ra file

I'm working (eg) on meme in

/home/rerla/hg18/kent/src/hg/makeDb/trackDb/human/hg18

I add these settings - they seem to work the way I want, allowing the name field from the .bed file to appear in full view:


track Ill550v3
priority 20.1   
shortLabel Illum550kv3           
longLabel Illumina 550k v3 snps
visibility pack
type bed 9 .          
color 255,0,0   
group varRep                 
thickDrawItem on
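The number in the type line (bed 9 here) should correspond to the number of fields in the loaded BED file. Below is a rough sketch of checking that before recompiling. It assumes a tab-separated BED file (as loaded with -tab) and a .ra file in which every stanza begins with a track line; both function names are hypothetical.

```shell
#!/bin/sh
# ra_bed_fields RA TRACK: print N from the "type bed N" line of TRACK's stanza.
ra_bed_fields() {
    awk -v track="$2" '
        $1 == "track" { in_stanza = ($2 == track) }
        in_stanza && $1 == "type" && $2 == "bed" { print $3; exit }
    ' "$1"
}

# bed_min_fields BED: print the smallest number of tab-separated fields
# found on any line of the file.
bed_min_fields() {
    awk -F '\t' 'NR == 1 || NF < min { min = NF } END { print min }' "$1"
}

# Usage:
#   want=$(ra_bed_fields trackDb.ra Ill550v3)
#   have=$(bed_min_fields /home/rerla/public_html/illHH550v3.BED)
#   [ "$have" -ge "$want" ] || echo "BED has $have fields, .ra declares bed $want" >&2
```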

Recompile the trackDb

make alpha DBS=hg18