bigMaf Track Format
 

The bigMaf format stores multiple alignments in a form compatible with MAF files, compressed and indexed in the same way as bigBed files.

bigMaf files are created using the program bedToBigBed with a special AutoSQL file that defines the fields of the bigMaf. The resulting bigBed files are in an indexed binary format. The main advantage of bigBed files is that only the portions of the file needed to display a particular region are transferred to UCSC, so for large data sets bigBed is considerably faster than regular BED files. The bigBed file remains on your web-accessible server (http, https, or ftp), not on the UCSC server. Only the portion needed for the chromosomal position you are currently viewing is locally cached as a "sparse file".

Big MAF

The following AutoSQL definition is used for bigMaf multiple alignment files. This is the bigMaf.as file passed to bedToBigBed with the -as= option. Click this bed3+1 file for an example of bigMaf input.
table bedMaf
"Bed3 with MAF block"
    (
    string chrom;      "Reference sequence chromosome or scaffold"
    uint   chromStart; "Start position in chromosome"
    uint   chromEnd;   "End position in chromosome"
    lstring mafBlock;   "MAF block"
    )
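
To make the four-field layout concrete, here is a minimal Python sketch that parses one line of bigMaf input into the fields named in the AutoSQL definition. The names BedMafRow and parse_bed_maf_line are hypothetical helpers, not part of any UCSC tool, and the requirement that chromEnd not precede chromStart is an assumption of this sketch rather than something the definition states. The MAF block is treated as an opaque string.

```python
from typing import NamedTuple


class BedMafRow(NamedTuple):
    """One row of bigMaf input, mirroring the bedMaf AutoSQL table."""
    chrom: str        # reference sequence chromosome or scaffold
    chrom_start: int  # start position in chromosome
    chrom_end: int    # end position in chromosome
    maf_block: str    # MAF block, kept as an opaque string here


def parse_bed_maf_line(line: str) -> BedMafRow:
    """Split one tab-separated bed3+1 line and check its basic shape."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 4:
        raise ValueError(f"expected 4 tab-separated fields, got {len(fields)}")
    chrom, start, end, maf_block = fields
    chrom_start, chrom_end = int(start), int(end)
    # Both coordinates are declared uint, so negatives are rejected.
    # Rejecting chromEnd < chromStart is an assumption of this sketch.
    if chrom_start < 0 or chrom_end < chrom_start:
        raise ValueError(f"bad coordinates: {chrom_start}-{chrom_end}")
    return BedMafRow(chrom, chrom_start, chrom_end, maf_block)
```

Running this over every line of an input file before calling bedToBigBed is one way to catch a missing or extra column early.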

Note that the bedToBigBed utility uses a substantial amount of memory: on the order of 1.25 times the size of the uncompressed BED input file.
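
As a rough planning aid, the memory figure can be turned into a quick estimate. This Python sketch assumes the figure means about 1.25 times the input file size; the function names are hypothetical and the result is only an order-of-magnitude guide.

```python
import os


def estimated_ram_bytes(input_size_bytes: int, factor: float = 1.25) -> int:
    """Approximate RAM needed for an uncompressed BED input of this size.

    The default factor reflects the rough 1.25x figure quoted above and
    is an estimate, not a guarantee.
    """
    return int(input_size_bytes * factor)


def estimated_ram_for_file(bed_path: str, factor: float = 1.25) -> int:
    """Same estimate, taking the size from a file on disk."""
    return estimated_ram_bytes(os.path.getsize(bed_path), factor)
```

For example, estimated_ram_bytes(4_000_000_000) suggests budgeting about 5 GB for a 4 GB input file.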

The following commands build the bigMaf input file, plus the summary and frames files that a bigMaf track can reference, from chr22_KI270731v1_random.maf:

awk -f mafToBigMaf.awk chr22_KI270731v1_random.maf | sed '/^$/d' | sed 's/hg38\.//' > bigMaf.txt
hgLoadMafSummary -minSeqSize=1 -test hg38 bigMafSummary chr22_KI270731v1_random.maf
cut -f 2- bigMafSummary.tab | sort -k1,1 -k2,2n > bigMafSummary.bed
bedToBigBed -type=bed3+4 -as=mafSummary.as -tab bigMafSummary.bed chrom.sizes bigMafSummary.bb
genePredSingleCover chr22_KI270731v1_random.gp single.gp
genePredToMafFrames hg38 chr22_KI270731v1_random.maf bigMafFrames.txt hg38 single.gp

Example position: chr22_KI270731v1_random:11,822-12,023

To create a bigMaf track, follow these steps:

  1. Create a bigMaf input file in bed3+1 format, whose first three fields are those of a normal BED file, as described here. (You can also read about MAF here.)
    • Your bigMaf file must have the extra field described in the AutoSQL file above: mafBlock.
    • Your bigMaf file must be sorted by chrom then chromStart. You can use the UNIX sort command to do this: sort -k1,1 -k2,2n unsorted.bed > input.bed
  2. Download the bedToBigBed program from the directory of binary utilities.
  3. Use the fetchChromSizes script from the same directory to create a chrom.sizes file for the UCSC database you are working with (e.g. hg38). Alternatively, you can download the chrom.sizes file for any assembly hosted at UCSC from our downloads page (click on "Full data set" for any assembly). For example, for the hg38 database, the hg38.chrom.sizes are located at http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes.
  4. Create the bigBed file from your sorted bigMaf input file using the bedToBigBed utility like so: bedToBigBed -type=bed3+1 -as=bigMaf.as -tab bigMaf.txt chrom.sizes bigMaf.bb
  5. Move the newly created bigBed file (bigMaf.bb) to an http, https, or ftp location.
  6. Construct a custom track using a single track line. Note that any of the track attributes listed here are applicable to tracks of type bigBed. The most basic version of the "track" line will look something like this:
    track type=bigMaf name="My Big MAF" description="A Multiple Alignment" bigDataUrl=http://myorg.edu/mylab/myBigMaf.bb
  7. Paste this custom track line into the text box in the custom track management page.
The bedToBigBed program can also be run with several additional options. A full list of the available options can be seen by running bedToBigBed with no arguments to display the usage message.
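
Step 1 requires the input to be sorted by chrom and then by chromStart. The Python sketch below checks that ordering before the file is passed to bedToBigBed. The function name first_unsorted_line is hypothetical. It compares chromosome names by code point, which corresponds to running sort under the C locale; that this matches what bedToBigBed expects is an assumption of the sketch.

```python
from typing import Iterable, Optional


def first_unsorted_line(lines: Iterable[str]) -> Optional[int]:
    """Return the 1-based number of the first out-of-order line, or None.

    Lines are expected to be tab-separated with chrom in the first field
    and chromStart in the second. Blank lines are skipped.
    """
    previous = None
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        chrom, start = line.split("\t", 2)[:2]
        key = (chrom, int(start))
        # Tuples compare chrom first (by code point), then chromStart.
        if previous is not None and key < previous:
            return number
        previous = key
    return None
```

A typical use is first_unsorted_line(open("bigMaf.txt")); a result of None means the file is ordered as the sort command in step 1 would leave it under the C locale.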

Example One

In this example, you will use an existing bigMaf file to create a bigMaf custom track. A bigMaf file that contains data on the hg38 assembly has been placed on our http server. The commands below build files named bigMaf.bb and bigMafFrames.bb, the names used in the example URLs:

awk -f mafToBigMaf.awk chr22_KI270731v1_random.maf | sed '/^$/d' | sed 's/hg38\.//' > bigMaf.txt
bedToBigBed -type=bed3+1 -as=bigMaf.as -tab bigMaf.txt chrom.sizes bigMaf.bb
bedToBigBed -type=bed4+7 -as=mafFrames.as -tab bigMafFrames.txt chrom.sizes bigMafFrames.bb

The following track line loads the bigMaf file together with its frames and summary files:

track type=bigMaf name="bigMaf Example One" bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigMaf.bb frames=http://genome.ucsc.edu/goldenPath/help/examples/bigMafFrames.bb summary=http://genome.ucsc.edu/goldenPath/help/examples/bigMafSummary.bb

You can also create a custom track from the bigMaf file alone by constructing a "track" line that references only this file, like so:

track type=bigMaf name="bigMaf Example One" description="A bigMaf file" bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigMaf.bb

Paste the above "track" line into the custom track management page for the human assembly hg38 (Dec. 2013), then press the submit button.

Custom tracks can also be loaded via one URL line. The link below loads the same bigMaf track, but includes the track parameters in the URL:

http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&hgct_customText=track%20type=bigMaf%20name=Example%20bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigMaf.bb
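
A URL of this shape can be assembled programmatically. The Python sketch below rebuilds the link above from its parts using the standard library's urllib.parse.quote. The function name custom_track_url is hypothetical, and the sketch only handles track names without spaces or quotes.

```python
from urllib.parse import quote


def custom_track_url(
    big_data_url: str,
    name: str = "Example",
    db: str = "hg38",
    browser: str = "http://genome.ucsc.edu/cgi-bin/hgTracks",
) -> str:
    """Build an hgTracks URL that loads a bigMaf custom track."""
    track_line = f"track type=bigMaf name={name} bigDataUrl={big_data_url}"
    # Encode spaces as %20 but leave '=', ':' and '/' readable,
    # matching the form of the example link.
    encoded = quote(track_line, safe="=:/")
    return f"{browser}?db={db}&hgct_customText={encoded}"
```

Calling custom_track_url("http://genome.ucsc.edu/goldenPath/help/examples/bigMaf.bb") reproduces the example link.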

With this example bigMaf loaded, click into a gene from the track. Note that the details page has a "Links to sequence:" section that includes "Translated Protein", "Predicted mRNA", and "Genomic Sequence" links. Click the "Go to ... track controls" link. There, change the "Color track by codons:" option from "OFF" to "genomic codons", make sure "Display mode:" is "full", then click "Submit". Then zoom to a region where amino acids display, such as chr9:133,255,650-133,255,700, and see how bigMaf allows the display of codons. Return to the track controls page and check the box next to "Show codon numbering". Return to the browser to see amino acid numbering.

You can also add a parameter in the custom track line, baseColorDefault=genomicCodons, to set the display of codons:

browser position chr10:67,884,600-67,884,900
track type=bigMaf baseColorDefault=genomicCodons name="bigMaf Example Two" description="A bigMaf file" visibility=pack bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigMaf.bb

Paste the above into the hg38 custom track page to see an example of bigMaf amino acid display around the beginning of the gene SIRT1 on chromosome 10.

Example Two

In this example, you will create your own bigMaf file from an existing bigMaf input file.

  • Save this bed3+1 bigMaf.txt example input file to your machine (satisfies above step 1).
  • Download the bedToBigBed utility (step 2).
  • Save this hg38.chrom.sizes text file to your machine. It contains the chrom.sizes for the human (hg38) assembly (step 3).
  • Save this bigMaf.as text file to your machine.
  • Run the utility to create the bigBed output file (step 4):
    bedToBigBed -type=bed3+1 -tab -as=bigMaf.as bigMaf.txt hg38.chrom.sizes bigMaf.bb
  • Place the bigBed file you just created (bigMaf.bb) on a web-accessible server (step 5).
  • Construct a "track" line that points to your bigMaf file (see step 6).
  • Create the custom track on the human assembly hg38 (Dec. 2013), and view it in the genome browser (see step 7).
See the description in Example One above for how to view genomic codons, including codon numbering.
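
The conversion step in Example Two can be wrapped in a small script. The Python sketch below assembles the bedToBigBed command line with the -type, -tab, and -as options used in this document, taking bed3+1 as the type because the AutoSQL definition has four fields. The function names are hypothetical, and the wrapper assumes bedToBigBed is installed and on your PATH.

```python
import shutil
import subprocess
from typing import List


def bed_to_big_bed_command(
    bed_path: str,
    chrom_sizes: str,
    out_path: str,
    as_file: str = "bigMaf.as",
) -> List[str]:
    """Return the argument list for converting bigMaf input to bigBed."""
    return [
        "bedToBigBed",
        "-type=bed3+1",
        "-tab",
        f"-as={as_file}",
        bed_path,
        chrom_sizes,
        out_path,
    ]


def run_bed_to_big_bed(bed_path: str, chrom_sizes: str, out_path: str) -> None:
    """Run the conversion, failing early if the binary is missing."""
    if shutil.which("bedToBigBed") is None:
        raise FileNotFoundError("bedToBigBed not found on PATH")
    command = bed_to_big_bed_command(bed_path, chrom_sizes, out_path)
    # check=True raises CalledProcessError if bedToBigBed exits non-zero.
    subprocess.run(command, check=True)
```

With the Example Two files in the current directory, run_bed_to_big_bed("bigMaf.txt", "hg38.chrom.sizes", "bigMaf.bb") performs step 4.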

Sharing Your Data with Others

If you would like to share your bigMaf data track with a colleague, learn how to create a URL by looking at Example 11 on this page.

Extracting Data from bigBed Format

Since bigMaf files are an extension of bigBed files, which are indexed binary files, it can be difficult to extract data from them. We have developed the following programs for this purpose, all of which are available from the directory of binary utilities.

  • bigBedToBed — this program converts a bigBed file to ASCII BED format.
  • bigBedSummary — this program extracts summary information from a bigBed file.
  • bigBedInfo — this program prints out information about a bigBed file.
As with all UCSC Genome Browser programs, simply type the program name at the command line with no parameters to see the usage statement.

Troubleshooting

If you encounter an error when you run the bedToBigBed program, it may be because your input bigMaf file has data off the end of a chromosome. In this case, run the bedClip program (available here) on the input before running bedToBigBed. It will remove the row(s) in your input BED file that are off the end of a chromosome.
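
To see which rows are affected before running bedClip, a check along the following lines may help. This Python sketch is not bedClip and does not claim to reproduce its exact behavior; it simply keeps rows whose chromEnd fits within the size listed in chrom.sizes. The function names are hypothetical, and dropping rows on chromosomes missing from chrom.sizes is a choice made for this sketch.

```python
from typing import Dict, Iterable, List


def load_chrom_sizes(lines: Iterable[str]) -> Dict[str, int]:
    """Read 'chrom<whitespace>size' lines into a dictionary."""
    sizes = {}
    for line in lines:
        if line.strip():
            chrom, size = line.split()[:2]
            sizes[chrom] = int(size)
    return sizes


def rows_within_chroms(bed_lines: Iterable[str], sizes: Dict[str, int]) -> List[str]:
    """Keep tab-separated BED rows that end at or before the chromosome end."""
    kept = []
    for line in bed_lines:
        if not line.strip():
            continue
        chrom, _start, end = line.split("\t", 3)[:3]
        # Rows on chromosomes absent from chrom.sizes are dropped
        # (an assumption of this sketch).
        if chrom in sizes and int(end) <= sizes[chrom]:
            kept.append(line)
    return kept
```

Comparing the number of rows kept with the number of input rows shows how many rows extend past a chromosome end.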