trackDbIndexBb - A utility to support hideEmptySubtracks on Composite Tracks

When a composite track contains many subtracks, it can be useful to limit the display to only those with data in the current viewing window. The trackDb setting hideEmptySubtracks enables this behavior. The setting produces a checkbox on the track configuration page that lets the user enable or disable the feature. If it is set to 'on', the feature is on by default (the checkbox is checked). To take full advantage of this setting it is helpful, though not always required, to index the underlying bigBed files using the trackDbIndexBb utility. This utility creates multibed/index files containing the coordinates where the tracks intersect, expediting data lookup. There are two instances in which these files are helpful or required:

To build the index files, first download the trackDbIndexBb utility. For more information on downloading our command line utilities, see these instructions.

There are three other programs needed to run trackDbIndexBb. Two of them, bedToBigBed and bigBedToBed, can be found in the same downloads directory. The final dependency, bedtools, can be found on the bedtools site.
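Before running the utility, it can help to confirm that the three dependencies are reachable. The following is a minimal sketch, assuming the programs are expected on the PATH; the helper name missing_tools is hypothetical and is not part of the Kent utilities.

```python
import shutil

# Programs the text lists as required by trackDbIndexBb.
REQUIRED_TOOLS = ["bedToBigBed", "bigBedToBed", "bedtools"]


def missing_tools(tools=REQUIRED_TOOLS, path=None):
    """Return the tools that cannot be found on the search path.

    `path` defaults to the PATH environment variable; pass a directory
    string to look somewhere else instead.
    """
    return [tool for tool in tools if shutil.which(tool, path=path) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All dependencies found.")
```

If anything is reported missing, either add its directory to the PATH or keep the programs together in one directory.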

Parameters

Kent utilities can be run with no parameters to display a usage message. Additionally, trackDbIndexBb can be passed the -h flag to display a more verbose help message.

./trackDbIndexBb
./trackDbIndexBb -h

Below is a short description of the parameters:

Example 1 - Building Index Files

In this first example, we will build index files for a composite track containing 12 bigBed files. Index files hardly improve performance on a track with so few files, but the steps would be the same on a larger track.

First, we can take a look at the header stanza for the composite. The complete trackDb.ra file is available here.

track problematic
shortLabel Problematic Regions
longLabel Problematic Regions for NGS or Sanger sequencing or very variable regions
compositeTrack on
hideEmptySubtracks off
group map
visibility hide
type bigBed 3 +

We can see that the hideEmptySubtracks setting is already present and set to off, so the checkbox is available but unchecked by default. The index files we are building are not required here; they only improve the performance of the feature. The stanza also gives us the composite track name, problematic, which we will pass as the trackName parameter.

The other two required parameters are the paths to the trackDb.ra file and the chrom.sizes file. If we assume that both are in the current directory and that the required dependencies are present in the path, we can run trackDbIndexBb as follows:

./trackDbIndexBb problematic smallExampleTrackDb.ra hg19.chrom.sizes

This will generate two files in the current directory:

problematic.multiBed.bb
problematic.multiBedSources.tab

We can then enable the use of these index files for hideEmptySubtracks by adding the two URL settings below to the composite stanza of our trackDb.ra file, after the existing hideEmptySubtracks line, adjusting the paths if needed:

hideEmptySubtracks off
hideEmptySubtracksMultiBedUrl problematic.multiBed.bb
hideEmptySubtracksSourcesUrl problematic.multiBedSources.tab
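As a quick sanity check after editing, the composite stanza can be scanned for the three settings shown above. This is a minimal sketch under simplifying assumptions: stanzas are separated by blank lines, each line is a setting name followed by its value, and features such as include statements or line continuations are ignored. The function names are hypothetical and not part of any UCSC tool.

```python
# Settings expected in the composite stanza once the index files are wired in.
REQUIRED = (
    "hideEmptySubtracks",
    "hideEmptySubtracksMultiBedUrl",
    "hideEmptySubtracksSourcesUrl",
)


def parse_stanzas(text):
    """Split trackDb text into blank-line-separated stanzas of {setting: value}."""
    stanzas, current = [], {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # A blank line closes the stanza being collected.
            if current:
                stanzas.append(current)
                current = {}
        elif not line.startswith("#"):
            parts = line.split(None, 1)
            current[parts[0]] = parts[1] if len(parts) > 1 else ""
    if current:
        stanzas.append(current)
    return stanzas


def missing_settings(text, track_name):
    """Return the REQUIRED settings absent from the stanza named `track_name`."""
    for stanza in parse_stanzas(text):
        if stanza.get("track") == track_name:
            return [name for name in REQUIRED if name not in stanza]
    raise KeyError(f"no stanza for track {track_name!r}")


example = """\
track problematic
compositeTrack on
hideEmptySubtracks off
hideEmptySubtracksMultiBedUrl problematic.multiBed.bb
hideEmptySubtracksSourcesUrl problematic.multiBedSources.tab
type bigBed 3 +
"""

print(missing_settings(example, "problematic"))
```

An empty list means all three settings are present in the stanza; it does not verify that the files at those paths exist.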

Example 2 - Creating Track Associations

In this longer example, we are looking to build index files with track associations between DNase-seq peak and signal tracks. There are 2 bigBed peak tracks and 4 bigWig signal tracks. The complete trackDb for the example can be found here.

Looking at the top level stanza, we see that this composite track has two views, one for peaks and one for signals. The data are associated with a few different subGroups:

track uniformDnase
subGroup4 lab Lab Duke=Duke UW=UW UWDuke=UW-Duke
subGroup3 view View Peaks=Peaks Signal=Signal
subGroup2 cellType Cell_Line GM12878=GM12878 H1-hESC=H1-hESC

To help us decide how best to make these associations, let us look at the relevant parts of the peak and signal stanzas we would like to associate:

                track wgEncodeUWDukeDnaseGM12878FdrPeaks
                type bigBed 6 +
                parent uniformDnasePeaks on
                bigDataUrl wgEncodeUWDukeDnaseGM12878.fdr01peaks.hg19.bb
                subGroups view=Peaks tier=t1 cellType=GM12878 lab=UWDuke
                metadata cell=GM12878

                track wgEncodeDukeDnaseGM12878FdrSignal
                type bigWig
                parent uniformDnaseSignal on
                bigDataUrl wgEncodeOpenChromDnaseGm12878Aln_5Reps.norm5.rawsignal.bw
                subGroups view=Signal tier=t1 cellType=GM12878 lab=Duke
                metadata cell=GM12878 lab=Duke

                track wgEncodeUWDnaseGM12878FdrSignal
                type bigWig
                parent uniformDnaseSignal on
                bigDataUrl wgEncodeUwDnaseGm12878Aln_2Reps.norm5.rawsignal.bw
                subGroups view=Signal tier=t1 cellType=GM12878 lab=UW
                metadata cell=GM12878 lab=UW

The first track is the bigBed peaks track (peaks view) and the second and third are bigWig signal tracks (signal view). trackDbIndexBb accepts two optional parameters for building track associations. The first, -m --metaDataVar, designates which trackDb setting will be used to build the association. In this example, the peaks are determined using a combination of the signal tracks, so we would like to display both of the signal tracks whenever the peak track has data.

At this point, it is important to explain how trackDbIndexBb makes track associations. It looks at the stanza line designated by -m --metaDataVar, then looks for identical matching lines in other stanzas. Since at least one variable within that line will usually differ, such as the designation between peak and signal, -s --subGroupRemove can be used to strip out one of the variables in the line.

The subGroups setting could be used. However, we see that two variables differ between the peak and signal stanzas, view and lab, and we would have to strip both of them to obtain matching lines and build an association. Alternatively, we could use the metadata setting. It associates the tracks by cell, with only the lab variable differing. This is the better choice, as only a single variable, lab, has to be stripped to make the peak and signal lines match for related tracks, as opposed to two, lab and view.

Now that we know which setting we would like to use to build associations, we need to use the second optional parameter, -s --subGroupRemove, to tell trackDbIndexBb which variable to strip out when making the association. In this case, we would like to keep the cell variable but strip lab, so lab is the value passed. In this way, associations will be made between any tracks whose metadata lines match once the lab variable has been stripped out.
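The matching rule described above can be illustrated with a short sketch. This is not the utility's source code; it only mimics the described behavior (strip one variable, then group identical lines) using the values from the three example stanzas. The function name association_key is hypothetical.

```python
from collections import defaultdict


def association_key(setting_value, remove_var):
    """Drop the `remove_var=...` pair from a setting line and return the rest."""
    pairs = setting_value.split()
    kept = [pair for pair in pairs if pair.split("=", 1)[0] != remove_var]
    return " ".join(kept)


# Values of the `metadata` setting from the three stanzas shown earlier.
metadata = {
    "wgEncodeUWDukeDnaseGM12878FdrPeaks": "cell=GM12878",
    "wgEncodeDukeDnaseGM12878FdrSignal": "cell=GM12878 lab=Duke",
    "wgEncodeUWDnaseGM12878FdrSignal": "cell=GM12878 lab=UW",
}

# Tracks whose lines are identical after stripping `lab` fall into one group.
groups = defaultdict(list)
for track, value in metadata.items():
    groups[association_key(value, "lab")].append(track)

for key, tracks in groups.items():
    print(key, "->", tracks)

# With the subGroups lines, stripping `lab` alone is not enough,
# because the `view` variable still differs between peaks and signal.
peaks = association_key("view=Peaks tier=t1 cellType=GM12878 lab=UWDuke", "lab")
signal = association_key("view=Signal tier=t1 cellType=GM12878 lab=Duke", "lab")
print(peaks == signal)
```

After stripping lab, all three metadata lines reduce to the same text, so the peak track and both signal tracks end up in one group, while the subGroups lines still differ in view.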

Now that we have chosen our parameters, we will run the utility, assuming our chrom.sizes file, our trackDb.ra file, and all the supporting programs (bedToBigBed, bigBedToBed, bedtools) are present in the current directory. We will also choose the current directory for the output:

./trackDbIndexBb uniformDnase exampleTrackDb.ra chrom.sizes -o . -p . -m metadata -s lab

Note that in this case, we could have omitted the -o and -p values as the current directory is the default for both.

In this small example, the utility runs in a few seconds, but larger inputs containing hundreds of tracks can take hours. Upon completion, two files will be generated:

uniformDnase.multiBed.bb
uniformDnase.multiBedSources.tab

The .bb file will be a big multibed containing the coordinates where the tracks intersect, expediting data lookup, and the .tab file will serve as an index for the multibed while also containing the track associations. The .tab file can be quickly examined to ensure proper generation as it should contain a numerical first column, followed by the bigBed track, then any number of desired track associations, e.g.

1	wgEncodeUWDukeDnaseGM12878FdrPeaks	wgEncodeDukeDnaseGM12878FdrSignal	wgEncodeUWDnaseGM12878FdrSignal
2	wgEncodeUWDukeDnaseH1hESCFdrPeaks	wgEncodeDukeDnaseH1hESCFdrSignal	wgEncodeUWDnaseH1hESCFdrSignal
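The check described above can also be scripted. The following is a minimal sketch, assuming the tab-separated layout shown in the example (numeric index, bigBed track, then zero or more associated tracks); the function name read_sources is hypothetical.

```python
import csv
import io


def read_sources(handle):
    """Parse a multiBedSources.tab file into {index: (bigBed track, [associated tracks])}."""
    sources = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        if len(row) < 2:
            raise ValueError(f"expected at least two columns, got: {row!r}")
        if not row[0].isdigit():
            raise ValueError(f"first column is not numeric: {row[0]!r}")
        sources[int(row[0])] = (row[1], row[2:])
    return sources


# The two example rows from the text, inlined so the sketch runs on its own.
example = (
    "1\twgEncodeUWDukeDnaseGM12878FdrPeaks\t"
    "wgEncodeDukeDnaseGM12878FdrSignal\twgEncodeUWDnaseGM12878FdrSignal\n"
    "2\twgEncodeUWDukeDnaseH1hESCFdrPeaks\t"
    "wgEncodeDukeDnaseH1hESCFdrSignal\twgEncodeUWDnaseH1hESCFdrSignal\n"
)

sources = read_sources(io.StringIO(example))
for index, (track, linked) in sources.items():
    print(index, track, "->", ", ".join(linked) or "(no associations)")
```

To check a real file, pass an open file handle for uniformDnase.multiBedSources.tab in place of the io.StringIO object.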

Finally, hideEmptySubtracks can be enabled and pointed to the newly generated files on the top composite stanza:

hideEmptySubtracks on
hideEmptySubtracksMultiBedUrl uniformDnase.multiBed.bb
hideEmptySubtracksSourcesUrl uniformDnase.multiBedSources.tab

More information on how to use track hubs can be found in the Track Hub help page as well as the Track Database Definition Document.