HgTracks multi-region changes
This page provides an overview of the changes made to hgTracks and the position cart variable by Galt when implementing multi-region (a.k.a. "exon-mostly") display mode. It began as part of code review #16514 by Angie. Instead of adding these notes to the Redmine ticket, I am putting them in the wiki so that Galt can correct any of my misunderstandings, and hopefully this will serve as a rough guide for any developers who will need to interact with this code in the future.
The big merge of code changes for the initial implementation of multi-region display is 8c908f948b; see also http://genecats.cse.ucsc.edu/git-reports-history/v327/review/user/galt/index.html
hgTracks has always operated on the assumption that the user is viewing a single region of some reference assembly sequence. There are global variables for that region and global variables for layout parameters of a single region image. These global variables are referenced throughout all tracks' drawing code. Track loaders fetch data from a single region. Galt implemented multi-region display in a way that barely touched the bulk of hgTracks code -- quite an accomplishment! -- partly by introducing new structures and loops, and partly by carefully overwriting the pre-existing global variables. Developers working on loading and drawing code in the future can mostly ignore those changes and continue using the old globals in the same way as before. However, there may be cases in which track handlers will have to make use of data from all regions (for example, wiggle auto-scale). Developers who add a new global variable will have to decide whether it needs to be updated every time the current region changes. But hopefully they'll not add any more globals. :)
Multi-region display UI
There is a new button underneath the browser image, labeled 'multi-region'. Clicking it produces a pop-up with options for multi-region display. In the initial version there is a radio button set as shown below, followed by a checkbox that controls whether the regions are visually separated by thin red lines (the default) or by highlighting alternating regions.
(*) Show single-chromosome view (default)
( ) Show exons or ( ) genes using GENCODE v22. Use padding of: [6] bases.
( ) Custom regions from BED URL: [ ]
( ) Show one alternate haplotype, placed on its chromosome, using ID: [ chr1_KI270762v1_alt ]
[ ] Highlight alternating regions in multi-region view
The 'Show exons or genes' line appears only when a suitable gene table can be identified. The 'Show one alternate haplotype' option appears only when the database has an altLocations SQL table.
Within the four main options, there are additional parameters:
- a fifth mode option to omit regions between genes instead of between exons; default unselected
- the padding option for exon or gene mode; default 6
- the URL for a BED file (with optional multi-region-specific settings) provided by the user; default empty
- an alternate haplotype ID; default is "chr6_cox_hap2" or if not applicable, the first item in the altLocations table
There are also two new keyboard bindings: 'e v' changes the mode to 'Show exons' and 'd v' changes the mode to single-region.
The code includes several additional modes not shown above, mostly "demo" modes for testing during development. Since those are not reachable by users, I won't describe them.
Changes to the position cart variable (and hence other CGIs)
Before multi-region, the position cart variable contained either seq:start-end or a search term that must be resolved into seq:start-end. seq was always a chromosome/scaffold name that could be found in the current db's chromInfo table, and had start and end coordinates relative to that chromosome/scaffold. Several other CGIs (such as the Table Browser, Gene Sorter, DI, VAI) assumed this formatting of position. Now, when viewing custom regions from the user or an alt haplotype, position's value begins with virt and its start and end coordinates are for a "virtual chromosome" constructed by joining all regions.
The virt:start-end form does not (yet) make sense to any CGI other than hgTracks, so code has been added to the beginning of each CGI that uses the position variable to check whether its value begins with virt:. If it does, the CGI swaps in the value of a new cart variable, nonVirtPosition.
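The check described above can be sketched roughly as follows. This is only an illustration of the idea, not the code that was added to the CGIs: the helper name effectivePosition is hypothetical, and the cart is represented here by two plain strings instead of struct cart.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the kent source): return the position string
 * that a non-hgTracks CGI should work with.  If position begins with "virt:",
 * fall back to nonVirtPosition when one is available. */
static const char *effectivePosition(const char *position, const char *nonVirtPosition)
{
    assert(position != NULL);
    if (strncmp(position, "virt:", 5) == 0 && nonVirtPosition != NULL)
        return nonVirtPosition;
    return position;
}
```

With this sketch, a position of "virt:100-200" and a nonVirtPosition of "chr1:1000-2000" yields "chr1:1000-2000", while an ordinary position such as "chr2:5-10" is returned unchanged.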
Code changes
Needless to say, changing a paradigm in hgTracks involves adding and changing a lot of code. This is a high-level overview, not a comprehensive list.
New cart variables
virtMode
Boolean: true if we are not in single-region mode.
virtModeType
One of {"default", "exonMostly", "geneMostly", "customUrl", "singleAltHaplo" and some others not accessible by UI}. This stores the user's choice of mode in the multi-region pop-up.
multiRegionsBedUrl
The user's URL for custom regions (user sets in pop-up)
singleAltHaploId
The name of the alternate haplotype sequence used in alt-haplo mode (user sets in pop-up)
emAltHighlight
If true, alternating regions are differentiated by highlighting instead of red lines. (checkbox in pop-up)
emPadding
The number of bases to add on each side of each exon/gene when making the region list in exonMostly/geneMostly mode (user sets in pop-up)
emGeneTable
The table used to make regions in exonMostly/geneMostly mode. Currently this is set by the code, not the user.
lastVirtModeType
Stored in cart with the same value as virtModeType, for detecting changes to virtModeType.
lastVirtModeExtraState
CGI-encoded sequence of several mode-specific parameters for detecting changes:
- singleAltHaploId (see above)
- multiRegionsBedUrl: the name and file modification date of the user's URL for custom regions
- singleTransId: ignoring here because the singleTrans mode is not UI-accessible
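As a rough illustration of how a saved state string can be used to detect parameter changes, here is a minimal sketch. The function names, the field layout of the string, and the rule that a missing previous state counts as a change are all assumptions made for this example. Unlike the real variable, the values here are not CGI-encoded.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: pack the mode-specific parameters into one string.
 * The real lastVirtModeExtraState is CGI-encoded; this sketch skips encoding. */
static void makeExtraState(char *buf, size_t bufSize, const char *singleAltHaploId,
                           const char *multiRegionsBedUrl, long bedModTime)
{
    assert(buf != NULL && bufSize > 0);
    snprintf(buf, bufSize, "singleAltHaploId=%s&multiRegionsBedUrl=%s&bedModTime=%ld",
             singleAltHaploId, multiRegionsBedUrl, bedModTime);
}

/* Hypothetical: report a change when there is no previous state (assumption)
 * or when the previous and new state strings differ. */
static int extraStateChanged(const char *lastState, const char *newState)
{
    if (lastState == NULL)
        return 1;
    return strcmp(lastState, newState) != 0;
}
```

Comparing one packed string is simpler than comparing each parameter separately, and new mode-specific parameters can be added to the string without changing the comparison.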
virtModeShortDescr
Typically one word like "exons" or "genes"; this appears in the ideogram. It is inferred from the mode, except in customUrl mode, where the user can specify a description.
nonVirtPosition
When in multi-region mode, the start and end of all regions on the first region's chromosome. For use by other CGIs when position begins with "virt".
oldPosition
For detecting changes in position that somehow are not caught by pre-existing cart var lastPosition.
position.db
Not just a position -- this stores pretty much all of the above, CGI-encoded, in case we change db and then return to this db.
Additions to struct track
struct track is fundamental to hgTracks: it encompasses track data, track metadata and track methods for loading, drawing, labeling, mapping etc. hgTracks has always had a single global trackList, iterating over it at loading time and at every stage of building up the main image and mapbox (actually makeActiveImage builds flatTracks from trackList and uses that instead of trackList, but still, it has always used a one-dimensional list).
Since each struct track in the list includes loading and drawing functions that rely on global variables for genomic coordinates and pixel coordinates, the least invasive change to the code was to make a separate struct track per region for each track -- so in effect we have a two-dimensional array of struct track, implemented as lists using separate pointers. track->next still points to the next track in trackList -- but trackList is different for every region. struct track now has two new members, nextWindow and prevWindow, which connect the per-region instances of a track.
In addition, Kate's GTEx track draws a fixed-pixel-width bar graph above the scaled-width transcript, so track code needs a way to draw on the image in a way that may extend into other regions' windows. The packing code also needs to be aware of the possibly larger pixel width taken up. So two new methods were added, nonPropPixelWidth for use by the packing code and nonPropDrawItemAt for the drawing code.
struct track *nextWindow, *prevWindow
These point to the same track's instance of struct track for the region to the right (if any) and the region to the left (if any). They are populated in doTrackForm. nextWindow is used in simpleTracks.c, hgTracks.c and gtexTracks.c (for variable height calculation) to iterate over regions for a single track. As of 1/8/16, prevWindow seems to be used only in a disabled section of doTrackForm marked "// TEMP HACK GALT REMOVE" (disabled by loadHack = FALSE).
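The per-region linking can be pictured with a stripped-down stand-in for struct track. The sketch below is not hgTracks code: struct miniTrack and maxHeightAcrossWindows are hypothetical names, and only the link fields and a height are modeled. It follows nextWindow from the first window's instance of a track to find the tallest instance, in the spirit of the variable height calculation mentioned above.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for struct track; only the fields needed here. */
struct miniTrack
{
    struct miniTrack *next;        /* next track within the same window */
    struct miniTrack *nextWindow;  /* same track, window to the right (or NULL) */
    struct miniTrack *prevWindow;  /* same track, window to the left (or NULL) */
    int height;                    /* height computed for this window, in pixels */
};

/* Walk one track's instances across windows and return the largest height. */
static int maxHeightAcrossWindows(struct miniTrack *track)
{
    int maxHeight = 0;
    struct miniTrack *tw;
    for (tw = track; tw != NULL; tw = tw->nextWindow)
        if (tw->height > maxHeight)
            maxHeight = tw->height;
    return maxHeight;
}
```

Following next instead of nextWindow would walk the other dimension: all tracks within a single window.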
int (*nonPropPixelWidth)(struct track *tg, void *item)
Currently populated only by the GTEx track, this returns the width in pixels of the non-scaled part (e.g. GTEx bar graph) that is drawn for each item. If non-NULL, packCountRowsOverflow calls it when computing ranges to pass into the spaceSaver.
void (*nonPropDrawItemAt)(struct track *tg, void *item, struct hvGfx *hvg, ...)
Currently populated only by the GTEx track, this draws the non-scaled part of an item. If non-NULL, it is called with a clipping rectangle set to the whole image instead of to the current region/window by genericDrawOverflowItem and genericDrawItem.
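To make the optional-method pattern concrete, here is a small sketch of how packing code might widen an item's pixel range when nonPropPixelWidth is set. It is an assumption about the general shape of the logic, not a copy of packCountRowsOverflow; struct sketchTrack, packingPixelEnd and fixedGraphWidth are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for struct track with just the optional method. */
struct sketchTrack
{
    int (*nonPropPixelWidth)(struct sketchTrack *tg, void *item);  /* may be NULL */
};

/* Return the right pixel edge to reserve for an item when packing.  If the
 * track has a non-proportional part wider than the scaled item, reserve that. */
static int packingPixelEnd(struct sketchTrack *tg, void *item, int pixStart, int pixEnd)
{
    assert(pixEnd >= pixStart);
    if (tg->nonPropPixelWidth != NULL)
    {
        int npWidth = tg->nonPropPixelWidth(tg, item);
        if (npWidth > pixEnd - pixStart)
            pixEnd = pixStart + npWidth;
    }
    return pixEnd;
}

/* Example callback: pretend every item carries a 100-pixel-wide bar graph. */
static int fixedGraphWidth(struct sketchTrack *tg, void *item)
{
    (void)tg;
    (void)item;
    return 100;
}
```

Tracks that leave the pointer NULL take the old path, which is why adding the method did not require changes to other track types.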
New data structures in hgTracks.h
struct virtRegion
This maps a particular region to its place in the reference assembly. Used only in hgTracks.c. See #Initialization_of_regions_in_tracksDisplay.
struct virtRegion
/* virtual chromosome structure */
{
struct virtRegion *next;
char *chrom;
int start;
int end;
char strand[2]; /* + or - for strand */
};
struct virtChromRegionPos
In practice, this is one member of an array; the members of the array contain successive offsets within the virtual chromosome for each region. Using an array instead of a list enables binary search for regions within the current (possibly zoomed-in) position on the virtual chromosome. Used only in hgTracks.c; see functions makeVirtChrom, virtChromBinarySearch, makeWindowListFromVirtChrom, virtChromSearchForPosition.
struct virtChromRegionPos
/* virtual chromosome region position*/
{
long virtPos;
struct virtRegion *virtRegion;
};
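The offset array and the binary search can be illustrated with the two structs above. The functions below are a sketch written for this page, not the bodies of makeVirtChrom or virtChromBinarySearch; the names fillVirtChrom and findRegionIndex are hypothetical, and the caller is assumed to supply an array with room for every region.

```c
#include <assert.h>
#include <stddef.h>

struct virtRegion
{
    struct virtRegion *next;
    char *chrom;
    int start;
    int end;
    char strand[2];
};

struct virtChromRegionPos
{
    long virtPos;
    struct virtRegion *virtRegion;
};

/* Record each region's starting offset on the virtual chromosome.
 * Returns the total number of virtual bases covered. */
static long fillVirtChrom(struct virtRegion *regionList,
                          struct virtChromRegionPos *array, int count)
{
    long pos = 0;
    int i = 0;
    struct virtRegion *r;
    for (r = regionList; r != NULL && i < count; r = r->next, ++i)
    {
        array[i].virtPos = pos;
        array[i].virtRegion = r;
        pos += r->end - r->start;
    }
    return pos;
}

/* Binary search for the region containing virtual position target.
 * Returns its array index, or -1 if target is outside the virtual chromosome. */
static int findRegionIndex(struct virtChromRegionPos *array, int count, long target)
{
    int lo = 0, hi = count - 1, best = -1;
    while (lo <= hi)
    {
        int mid = lo + (hi - lo) / 2;
        if (array[mid].virtPos <= target)
        {
            best = mid;       /* candidate: last region starting at or before target */
            lo = mid + 1;
        }
        else
            hi = mid - 1;
    }
    if (best >= 0)
    {
        struct virtRegion *r = array[best].virtRegion;
        if (target >= array[best].virtPos + (r->end - r->start))
            return -1;        /* past the end of the last region */
    }
    return best;
}
```

For three regions of lengths 100, 50 and 30, the stored offsets are 0, 100 and 150, so virtual position 100 falls in the second region.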
struct positionMatch
This is a (listy) range on the virtual chrom. Lists of these are used to translate genomic coords (chrom, start, end) into ranges on the virtual chromosome. Used only in hgTracks.c; see functions virtChromSearchForPosition and findNearestVirtMatch.
struct positionMatch
/* virtual chrom position that matches or overlaps search query chrom,start,end */
{
struct positionMatch *next;
long virtStart;
long virtEnd;
};
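The translation from genomic coordinates to virtual-chromosome ranges can be sketched as below. This is a simplified linear scan, not the logic of virtChromSearchForPosition: the function name findMatches is hypothetical, strand is ignored (all regions are treated as forward-strand), and the caller is responsible for freeing the returned list.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct virtRegion
{
    struct virtRegion *next;
    char *chrom;
    int start;
    int end;
    char strand[2];
};

struct virtChromRegionPos
{
    long virtPos;
    struct virtRegion *virtRegion;
};

struct positionMatch
{
    struct positionMatch *next;
    long virtStart;
    long virtEnd;
};

/* Return a list of virtual-chromosome ranges that overlap chrom:start-end,
 * in virtual-chromosome order, or NULL if nothing overlaps. */
static struct positionMatch *findMatches(struct virtChromRegionPos *array, int count,
                                         const char *chrom, int start, int end)
{
    struct positionMatch *head = NULL, **tail = &head, *m;
    int i;
    assert(chrom != NULL);
    for (i = 0; i < count; ++i)
    {
        struct virtRegion *r = array[i].virtRegion;
        int s, e;
        if (strcmp(r->chrom, chrom) != 0)
            continue;
        s = (start > r->start) ? start : r->start;   /* clip query to region */
        e = (end < r->end) ? end : r->end;
        if (s >= e)
            continue;                                /* no overlap */
        m = calloc(1, sizeof(*m));
        if (m == NULL)
            break;
        m->virtStart = array[i].virtPos + (s - r->start);
        m->virtEnd = array[i].virtPos + (e - r->start);
        *tail = m;                                   /* append, preserving order */
        tail = &m->next;
    }
    return head;
}
```

A single genomic query can produce several matches because it may span more than one region, which is why the result is a list and not a single range.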
struct window
A window is a (part of a) region that the user is currently viewing. For example, if the virtual region list encompasses all exons of the currently viewed gene, but then we zoom in to view just a couple of exons, then windows are instantiated for those two exon regions. struct window contains the (viewed part of the) region's genomic coords, virtual chromosome coords, and pixel x coords (left offset and width). It has a flag used when highlighting alternating regions instead of drawing red lines between them. Its trackList contains the struct track instances that were created to load and draw this region's data (see also #Additions_to_struct_track). It is used by multiple track drawing routines (e.g. cds.c, cytoBandTrack.c, rmskJoinedTrack.c, simpleTracks.c). See hgTracks.c functions makeWindowListFromVirtChrom, setGlobalsFromWindow as well as various loops on windows in many drawing and position-calculating functions.
struct window // window in multiwindow image
{
struct window *next; // Next on list.
// These two were experimental and will be removed soon:
char *organism; /* Name of organism */
char *database; /* Name of database */
char *chromName;
int winStart; // in bases
int winEnd;
int insideX; // in pixels
int insideWidth;
long virtStart; // in bases on virt chrom
long virtEnd;
boolean regionOdd; // window comes from odd region? or even? for window separator coloring
struct track *trackList; // track list for window
};
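The relationship between regions and windows when zoomed in can be shown with a small sketch. It is not makeWindowListFromVirtChrom: the struct and function names are hypothetical, an output array replaces the linked list, pixel fields are omitted, and setting regionOdd from the region's index is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified region: genomic range plus its offset on the virtual chromosome. */
struct miniRegion
{
    const char *chrom;
    int start, end;
    long virtPos;
};

/* Simplified window: the viewed part of one region. */
struct miniWindow
{
    const char *chromName;
    int winStart, winEnd;     /* genomic coords of the viewed part */
    long virtStart, virtEnd;  /* same part, on the virtual chromosome */
    int regionOdd;            /* assumption: parity of the region's index */
};

/* Clip each region to the viewed virtual range and emit a window for every
 * region that is at least partly visible.  Returns the number of windows. */
static int makeWindows(const struct miniRegion *regions, int regionCount,
                       long virtWinStart, long virtWinEnd,
                       struct miniWindow *out, int maxOut)
{
    int i, n = 0;
    assert(virtWinStart <= virtWinEnd);
    for (i = 0; i < regionCount && n < maxOut; ++i)
    {
        long rStart = regions[i].virtPos;
        long rEnd = rStart + (regions[i].end - regions[i].start);
        long s = (virtWinStart > rStart) ? virtWinStart : rStart;
        long e = (virtWinEnd < rEnd) ? virtWinEnd : rEnd;
        if (s >= e)
            continue;         /* region lies entirely outside the view */
        out[n].chromName = regions[i].chrom;
        out[n].winStart = regions[i].start + (int)(s - rStart);
        out[n].winEnd = regions[i].start + (int)(e - rStart);
        out[n].virtStart = s;
        out[n].virtEnd = e;
        out[n].regionOdd = i % 2;
        ++n;
    }
    return n;
}
```

With regions of lengths 100, 50 and 30 at virtual offsets 0, 100 and 150, viewing virtual range 80-160 gives three windows (the last 20 bases of the first region, all of the second, and the first 10 bases of the third), while viewing 110-120 gives a single window inside the second region.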
struct convertRange
This is used by some extremely complicated code in simpleTracks.c's linkedFeaturesNextPrevExonFind and linkedFeaturesNextPrevItem. The gist of it is to find regular genomic coordinates for some exon(s) on the virtual chrom if possible. If you've never before encountered a for loop that contains a goto jumping backwards into the middle of a previous 60+ line while loop inside an if clause, which may then iterate some more... well, here's your chance. Also, bedTrack.c's simpleBedNextPrevEdge constructs one of these and calls linkedFeaturesNextPrevExonFind on it to get virt coords.
struct convertRange
{
struct convertRange *next;
char *chrom;
int start;
int end;
long vStart;
long vEnd;
boolean found;
boolean skipIt;
};
New global variables
Many new global variables were added, following the convention of declaring in hgTracks.h and initializing in simpleTracks.c. However, several of these are used only within hgTracks.c.
boolean virtMode
Cart variable, see above.
char *virtModeType
Cart variable, see above. Default: "default" (i.e. not multi-region)
char *lastVirtModeType
Cart variable, see above.
char *virtModeShortDescr
Cart variable, see above.
char *lastVirtModeExtraState
Cart variable, see above.
char *multiRegionsBedUrl
Cart variable, see above. Default: ""
boolean emAltHighlight
Cart variable, see above.
int emPadding
Cart variable, see above. Default: 6
char *emGeneTable
Cart variable, see above.
struct cart *lastDbPosCart
Not a cart! struct cart is used to CGI-decode the values that were encoded in lastVirtModeExtraState into a hash. Only the hash is used.
char *virtModeExtraState
Not a cart variable, but it becomes the next value of lastVirtModeExtraState, i.e. it encodes mode-specific parameters such as singleAltHaploId and multiRegionsBedUrl (plus mod date)
struct virtRegion *virtRegionList
List of regions that are joined to make the virtual chromosome
struct virtChromRegionPos *virtChrom
Array of successive regions and offsets into the virtual chromosome
int virtRegionCount
Number of regions in virtual chromosome and size of the array virtChrom
long virtSeqBaseCount
Number of bases in virtual chromosome
long virtWinStart, virtWinEnd
Start and end of the portion of the virtual chromosome that is currently displayed.
long virtWinBaseCount
The length of the currently displayed portion of the virtual chromosome. Set to virtWinEnd - virtWinStart every time virtWinEnd and virtWinStart are changed.
long defaultVirtWinStart, defaultVirtWinEnd
Used only when transitioning from default mode into singleAltHaplo mode: the start/end within the virtual chromosome of the alt haplo plus flanks of equal length on the main chromosome (trimmed to start/end of main chromosome). In singleAltHaplo mode, the virtual chromosome is constructed by concatenating all assembly sequences (!) and then replacing the main chromosome sequence with the part of the main chromosome preceding the alt haplo, then the alt haplo, and then the rest of the chromosome after the alt haplo.
char *virtChromName
When in multi-region mode, virt; otherwise just the good old chromName.
boolean virtChromChanged
True only when changing from multi-region mode into another, or when a parameter change is detected using lastVirtModeExtraState; not used by C code, but is passed forward to JS code.
struct track *emGeneTrack
The struct track for the gene table used in exonMostly/geneMostly mode
struct rgbColor vertWindowSeparatorColor
Constant: { 255, 220, 220} (light red for vertical lines between regions)
char *singleAltHaploId
Default: "chr6_cox_hap2"
struct window *windows
A list of the currently viewed (i.e. within the position range) portions of regions. The virtual chromosome contains all regions applicable to the current mode, but if the user zooms in, we display only the parts of regions that they are viewing. See also #struct_window. This list is created when tracksDisplay calls makeWindowListFromVirtChrom. Note: a couple of functions have local variables with the same name (disguisePositionVirtSingleChrom, nonVirtPositionFromHighlightPos), not to be confused with this global list. (There are also several uses of makeWindowListFromVirtChrom for translating ranges other than the current position.)
struct window *currentWindow
The element of the list windows that we're working on right now. This is frequently used to test if we're working on the first window (if (currentWindow == windows)), for things that should be done only once, e.g. track labels, or by code that expects to be starting at the first window. rmskJoinedTrack.c and cds.c use it to keep their own global variables in sync with the hgTracks control code.
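The pattern of overwriting the old globals once per window, and using currentWindow == windows to do some work only once, can be sketched as follows. Everything here is a simplified stand-in written for this page: struct sketchWindow, sketchSetGlobalsFromWindow, drawOneWindow and drawAllWindows are hypothetical, and resetting the globals to the first window after the loop is an assumption.

```c
#include <assert.h>
#include <stddef.h>

struct sketchWindow
{
    struct sketchWindow *next;
    const char *chromName;
    int winStart, winEnd;
    int insideX, insideWidth;
};

/* Stand-ins for the hgTracks globals that legacy drawing code reads. */
static struct sketchWindow *windows = NULL;
static struct sketchWindow *currentWindow = NULL;
static const char *chromName = NULL;
static int winStart = 0, winEnd = 0, insideX = 0, insideWidth = 0;

static int labelsDrawn = 0;   /* counts once-per-image work */
static int windowsDrawn = 0;  /* counts per-window work */

/* Overwrite the single-region globals with one window's values. */
static void sketchSetGlobalsFromWindow(struct sketchWindow *w)
{
    currentWindow = w;
    chromName = w->chromName;
    winStart = w->winStart;
    winEnd = w->winEnd;
    insideX = w->insideX;
    insideWidth = w->insideWidth;
}

/* Stand-in for legacy drawing code that knows only about the globals. */
static void drawOneWindow(void)
{
    assert(currentWindow != NULL);
    if (currentWindow == windows)
        ++labelsDrawn;        /* e.g. track labels: first window only */
    ++windowsDrawn;
}

/* Draw every window by swapping the globals before each call. */
static void drawAllWindows(struct sketchWindow *windowList)
{
    struct sketchWindow *w;
    windows = windowList;
    for (w = windows; w != NULL; w = w->next)
    {
        sketchSetGlobalsFromWindow(w);
        drawOneWindow();
    }
    if (windows != NULL)
        sketchSetGlobalsFromWindow(windows);  /* assumption: leave globals at first window */
}
```

The drawing function never receives a window argument, which mirrors how existing track code could stay unchanged while the control code loops over windows.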
bool trackLoadingInProgress
Flag to delay spaceSaver (ss) layout until all windows are ready.
int fullInsideX, fullInsideWidth
Full-image insideX and insideWidth, for the few tracks that need to know the offset/width of the whole image, not just the offset/width of the current region's slice.
char *singleTransId
Not described here because singleTrans mode is not UI-accessible
int demo2NumWindows, demo2WindowSize, demo2StepSize
Not described here because demo2 mode is not UI-accessible