The bigMaf format stores multiple alignments in a format compatible with
MAF files, which are then compressed and indexed as
bigBeds. bigMaf files are created using the program bedToBigBed with a special AutoSQL file
that defines the fields of the bigMaf. The resulting bigMaf files are in an indexed binary
format. The main advantage of the bigMaf files is that only portions of the files needed to display
a particular region are transferred to UCSC. So for large data sets, bigMaf is considerably faster
than regular MAF files. The bigMaf file remains on your web-accessible server (http, https, or ftp),
not on the UCSC server. Only the portion that is needed for the chromosomal position you are
currently viewing is locally cached as a "sparse file".
The following AutoSQL definition is used for bigMaf multiple alignment files. This is the
bigMaf.as file that is passed to bedToBigBed with the -as option.
table bedMaf
"Bed3 with MAF block"
(
string chrom; "Reference sequence chromosome or scaffold"
uint chromStart; "Start position in chromosome"
uint chromEnd; "End position in chromosome"
lstring mafBlock; "MAF block"
)
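As a quick illustration of this layout, the sketch below checks that a tab-separated input file has the four fields named in the definition before it is handed to bedToBigBed. The function name check_bigmaf_rows and the sample row are hypothetical, and the check assumes the MAF block itself contains no tab characters; it is a sanity test, not part of the UCSC tools.

```shell
# Hypothetical helper: verify that every row has the four bedMaf fields
# (chrom, chromStart, chromEnd, mafBlock) separated by tabs.
check_bigmaf_rows() {
    awk -F '\t' '
        BEGIN { bad = 0 }
        NF != 4 {
            printf "line %d: expected 4 fields, found %d\n", NR, NF
            bad = 1; next
        }
        $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ {
            printf "line %d: chromStart and chromEnd must be unsigned integers\n", NR
            bad = 1; next
        }
        $2 + 0 > $3 + 0 {
            printf "line %d: chromStart must not exceed chromEnd\n", NR
            bad = 1
        }
        END { exit bad }
    ' "$1"
}

# Illustrative row; the fourth field is a placeholder, not a real MAF block.
sample=$(mktemp)
printf 'chr22_KI270731v1_random\t100\t200\tMAF_BLOCK_PLACEHOLDER\n' > "$sample"
check_bigmaf_rows "$sample" && echo "rows look valid"
```

The function exits with a non-zero status and prints one message per offending line, so it can be used as a guard in front of the conversion command.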
Note that the bedToBigBed utility uses a substantial amount of memory: somewhere on the
order of 1.25 times more RAM than the uncompressed BED input file.
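To get a feel for that figure, the sketch below turns an input file's size into a rough memory estimate. It reads the note as "about 1.25 times the size of the uncompressed input", which is an interpretation, and the function name estimate_bedtobigbed_ram is hypothetical.

```shell
# Rough estimate only: 1.25 x the size of the uncompressed input, in bytes.
estimate_bedtobigbed_ram() {
    size=$(wc -c < "$1")
    echo $(( size * 125 / 100 ))
}

# Demonstration on a 1000-byte scratch file.
demo=$(mktemp)
head -c 1000 /dev/zero > "$demo"
estimate_bedtobigbed_ram "$demo"   # prints 1250
```

For a real run, pass the sorted text file (for example bigMaf.txt) and compare the result with the memory available on the machine.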
To create a bigMaf track, follow these steps:
Step 1. Start with a MAF file to convert. The commands below use chr22_KI270731v1_random.maf, from the human hg38 assembly, as the example.
Step 2. Download the AutoSQL files needed by bedToBigBed: bigMaf.as, plus mafFrames.as and mafSummary.as if you plan to create the frames and summary files.
Step 3. Download the bedToBigBed and mafToBigMaf programs from the directory of binary utilities.
Step 4. To create the frames and summary files, also download the hgLoadMafSummary, genePredSingleCover, and genePredToMafFrames programs from the same directory.
Step 5. Use the fetchChromSizes script from the same directory to create a chrom.sizes file for
the UCSC database you are working with (e.g., hg38). Alternatively, you can download the
chrom.sizes file for any assembly hosted at UCSC from our downloads page (click on "Full
data set" for any assembly). For example, for the hg38 database, the hg38.chrom.sizes file is
located at http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes.
Step 6. Convert the MAF file to a sorted bed3+1 text file with mafToBigMaf, then create the bigMaf file with bedToBigBed:
mafToBigMaf hg38 chr22_KI270731v1_random.maf stdout | sort -k1,1 -k2,2n > bigMaf.txt
bedToBigBed -type=bed3+1 -as=bigMaf.as -tab bigMaf.txt hg38.chrom.sizes bigMaf.bb
Step 7. To include frames and summary information, create the accompanying bigBed files:
genePredSingleCover chr22_KI270731v1_random.gp single.gp
genePredToMafFrames hg38 chr22_KI270731v1_random.maf bigMafFrames.txt hg38 single.gp
bedToBigBed -type=bed3+8 -as=mafFrames.as -tab bigMafFrames.txt hg38.chrom.sizes bigMafFrames.bb
hgLoadMafSummary -minSeqSize=1 -test hg38 bigMafSummary chr22_KI270731v1_random.maf
cut -f 2- bigMafSummary.tab | sort -k1,1 -k2,2n > bigMafSummary.bed
bedToBigBed -type=bed3+4 -as=mafSummary.as -tab bigMafSummary.bed hg38.chrom.sizes bigMafSummary.bb
Step 8. Move the newly created bigMaf file (bigMaf.bb) to an http, https, or ftp location.
Step 9. If you generated the bigMafSummary.bb and/or bigMafFrames.bb files, they will also need to be in a web-accessible location, likely in the same location as bigMaf.bb.
Step 10. Construct a custom track using a single track line that points to the bigMaf file, for example:
track type=bigMaf name="My Big MAF" description="A Multiple Alignment" bigDataUrl=http://myorg.edu/mylab/bigMaf.bb
The bedToBigBed program can also be run with several additional options. Run bedToBigBed
with no arguments to view a full list of available options.
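The conversion commands above sort the text file by chromosome and then by start position before calling bedToBigBed. A minimal sketch of a pre-flight check for that ordering follows; the function name is_bigmaf_sorted is hypothetical, and forcing the C locale is an assumption made so that chromosome names are compared byte by byte.

```shell
# Hypothetical helper: succeed only if the file is already ordered by
# column 1 (chrom) and then numerically by column 2 (chromStart).
is_bigmaf_sorted() {
    LC_ALL=C sort -c -k1,1 -k2,2n "$1" 2>/dev/null
}

if [ -f bigMaf.txt ] && is_bigmaf_sorted bigMaf.txt; then
    echo "bigMaf.txt is sorted; ready for bedToBigBed"
else
    echo "bigMaf.txt is missing or unsorted; sort it with: sort -k1,1 -k2,2n"
fi
```

Checking first is cheaper than re-sorting a large file, and it catches inputs that were sorted alphabetically on the start column instead of numerically.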
In this example, you will use an existing bigMaf file to create a bigMaf custom track. A bigMaf file that contains data on the hg38 assembly has been placed on our http server. You can create a custom track using this bigMaf file by constructing a "track" line that references this file like so:
track type=bigMaf name="bigMaf Example One" description="A bigMaf file" bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigMaf.bb frames=http://genome.ucsc.edu/goldenPath/help/examples/bigMafFrames.bb
Paste the above "track" line into the custom track management page for the human assembly
hg38 (Dec. 2013), then press the submit button. Additional track line options exist that are
specific to the MAF format. For instance, adding
speciesOrder="panTro4 rheMac3 mm10 rn5 canFam3 monDom5"
to the above example sets the order in which the sequences are displayed.
Custom tracks can also be loaded via a single URL. The link below loads the same bigMaf track, with the track line passed as URL parameters:
http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&position=chr22_KI270731v1_random&hgct_customText=track%20type=bigMaf%20name=Example%20bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigMaf.bb%20visibility=pack
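The URL above is the track line with its spaces written as %20 and appended to the hgct_customText parameter. The sketch below shows one way to build such a link; the function name bigmaf_track_url is hypothetical, and it encodes spaces only, so track lines containing characters such as &, # or quotes would need fuller URL encoding.

```shell
# Hypothetical helper: build a Genome Browser URL from a database name,
# a position, and a track line, percent-encoding spaces only.
bigmaf_track_url() {
    db=$1
    position=$2
    track_line=$3
    encoded=$(printf '%s' "$track_line" | sed 's/ /%20/g')
    printf 'http://genome.ucsc.edu/cgi-bin/hgTracks?db=%s&position=%s&hgct_customText=%s\n' \
        "$db" "$position" "$encoded"
}

# Reproduces the example link shown above.
bigmaf_track_url hg38 chr22_KI270731v1_random \
    'track type=bigMaf name=Example bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigMaf.bb visibility=pack'
```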
With this example bigMaf loaded, click into an alignment from the track. Note that the details page has information about the individual alignments, similar to the details page of a standard MAF track.
In this example, you will create your own bigMaf file from an existing bigMaf input file.
Step 1. Save the bed3+1 bigMaf.txt example input file to your machine (satisfies the first part of Step 6 above).
Step 2. Save the bigMaf.as text file to your machine (Step 2 above).
Step 3. Download the bedToBigBed utility (Step 3 above).
Step 4. Save the hg38.chrom.sizes text file to your machine. It contains the chrom.sizes for the human (hg38) assembly (Step 5 above).
Step 5. Run the bedToBigBed utility to create the binary indexed MAF file (completes Step 6 above):
bedToBigBed -type=bed3+1 -tab -as=bigMaf.as bigMaf.txt hg38.chrom.sizes bigMaf.bb
Step 6. Place the newly created bigMaf file (bigMaf.bb) on a web-accessible server (Step 8 above).
If you would like to share your bigMaf data track with a colleague, learn how to create a URL by looking at Example 11 on this page.
Since the bigMaf files are an extension of bigBed files, which are indexed binary files, they can be difficult to extract data from. We have developed the following programs, all of which are available from the directory of binary utilities.
bigBedToBed: converts a bigBed file to ASCII BED format.
bigBedSummary: extracts summary information from a bigBed file.
bigBedInfo: prints out information about a bigBed file.
As with all UCSC Genome Browser programs, simply type the program name at the command line with no parameters to see the usage statement.
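As an illustration of the first program, the sketch below wraps bigBedToBed to pull one region out of a bigMaf file as text. The wrapper name extract_bigmaf_region and the example coordinates are hypothetical; the -chrom, -start and -end options are taken from the bigBedToBed usage statement, which is worth checking for your installed version.

```shell
# Hypothetical wrapper around bigBedToBed; the function name and the
# example coordinates are illustrative only.
extract_bigmaf_region() {
    # usage: extract_bigmaf_region in.bb chrom start end out.bed
    if ! command -v bigBedToBed >/dev/null 2>&1; then
        echo "bigBedToBed not found on PATH" >&2
        return 127
    fi
    # -chrom, -start and -end restrict the output to one region.
    bigBedToBed -chrom="$2" -start="$3" -end="$4" "$1" "$5"
}

# Example call (not run here):
#   extract_bigmaf_region bigMaf.bb chr22_KI270731v1_random 0 10000 region.bed
```

Under these assumptions, the output is a text file with the same four columns as the bigMaf input file.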
If you encounter an error when you run the bedToBigBed program, it may be because your
input bigMaf file has data off the end of a chromosome. In this case, run the bedClip
program on the input before running bedToBigBed. It removes the row(s) in your input BED
file that extend past the end of a chromosome.
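Before clipping, it can help to see which rows are the problem. The sketch below is a diagnostic written in awk, not the bedClip implementation; the function name find_off_end_rows is hypothetical, and it assumes a two-column, tab-separated chrom.sizes file (name, size).

```shell
# Hypothetical diagnostic: list rows whose chromEnd exceeds the chromosome
# size, or whose chromosome is missing from the chrom.sizes file.
find_off_end_rows() {
    # $1 = chrom.sizes (name<TAB>size), $2 = BED-style input file
    awk -F '\t' '
        NR == FNR             { size[$1] = $2; next }   # first file: load sizes
        !($1 in size)         { print "unknown chrom: " $0; next }
        $3 + 0 > size[$1] + 0 { print "past end: " $0 }
    ' "$1" "$2"
}

# Example call (not run here):
#   find_off_end_rows hg38.chrom.sizes bigMaf.txt
```

If the function prints nothing, every row fits inside its chromosome, and the bedToBigBed error likely has another cause.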