The bigChain format describes a pairwise alignment that allows gaps in both sequences simultaneously,
just as Chain files do, but bigChain files are compressed and indexed as bigBeds. bigChain files are
created using the program bedToBigBed with a special AutoSQL file that defines the fields of the
bigChain. The resulting bigChain files are in an indexed binary format. The main advantage of
bigChain files is that only the portions of the files needed to display a particular region are
transferred to UCSC, so for large data sets bigChain is considerably faster than regular Chain
files. The bigChain file remains on your web-accessible server (http, https, or ftp), not on the
UCSC server. Only the portion that is needed for the chromosomal position you are currently viewing
is locally cached as a "sparse file".
The following AutoSQL definition is used for bigChain pairwise alignment files. This is the
bigChain.as file specified with the -as option when using bedToBigBed.
table bigChain
"bigChain pairwise alignment"
    (
    string  chrom;       "Reference sequence chromosome or scaffold"
    uint    chromStart;  "Start position in chromosome"
    uint    chromEnd;    "End position in chromosome"
    string  name;        "Name or ID of item, ideally both human readable and unique"
    uint    score;       "Score (0-1000)"
    char[1] strand;      "+ or - for strand"
    uint    tSize;       "size of target sequence"
    string  qName;       "name of query sequence"
    uint    qSize;       "size of query sequence"
    uint    qStart;      "start of alignment on query sequence"
    uint    qEnd;        "end of alignment on query sequence"
    uint    chainScore;  "score from chain"
    )
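As a quick sanity check before running bedToBigBed, the definition above implies that each input row has 12 tab-separated fields and a strand of + or -. The following is a minimal sketch of such a check, not part of the UCSC tools: the function name check_bigchain_bed and the sample rows are hypothetical, the chromStart-below-chromEnd test is an assumption based on the usual BED convention, and bedToBigBed performs its own validation.

```shell
# Hypothetical helper: report rows that do not match the 12-field bigChain layout.
# Prints "line N: reason" for each problem and returns non-zero if any were found.
check_bigchain_bed() {
  awk -F'\t' '
    NF != 12               { printf "line %d: expected 12 fields, found %d\n", NR, NF; bad = 1; next }
    $6 != "+" && $6 != "-" { printf "line %d: strand must be + or -\n", NR; bad = 1; next }
    $2 + 0 >= $3 + 0       { printf "line %d: chromStart is not below chromEnd\n", NR; bad = 1; next }
    END                    { exit (bad ? 1 : 0) }
  ' "$1"
}

# Made-up sample data: one well-formed row and one row with a missing field.
sample_dir=$(mktemp -d)
printf 'chr22_KI270731v1_random\t100\t5000\t7\t1000\t-\t150754\tchr1\t195471971\t3000\t8000\t20000\n' > "$sample_dir/good.bigChain"
printf 'chr22_KI270731v1_random\t100\t5000\t7\t1000\t-\t150754\tchr1\t195471971\t3000\t8000\n' > "$sample_dir/bad.bigChain"

check_bigchain_bed "$sample_dir/good.bigChain" && echo "good.bigChain: ok"
check_bigchain_bed "$sample_dir/bad.bigChain" || echo "bad.bigChain: problems found"
```

Because the helper only reads its input, it can be run on a real file before conversion without modifying anything.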
Note that the bedToBigBed utility uses a substantial amount of memory: approximately 25% more RAM
than the uncompressed BED input file.
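To turn that rule of thumb into a number, the input file size can be scaled by 1.25. This is only a rough sketch: the function name estimate_bedtobigbed_ram is hypothetical, the placeholder file stands in for a real uncompressed BED input, and actual memory use may differ.

```shell
# Hypothetical helper: print a rough RAM estimate in bytes (input size plus 25%).
estimate_bedtobigbed_ram() {
  bytes=$(wc -c < "$1" | tr -d ' ')
  echo $(( bytes + bytes / 4 ))
}

# Made-up input: a 1000-byte placeholder standing in for an uncompressed BED file.
ram_dir=$(mktemp -d)
dd if=/dev/zero of="$ram_dir/input.bed" bs=1000 count=1 2>/dev/null

estimate_bedtobigbed_ram "$ram_dir/input.bed"
```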
To create a bigChain track, follow these steps:
Step 1. Start with the chain file you want to convert. The commands below use the example file
chr22_KI270731v1_random.hg38.mm10.rbest.chain.
Step 2. Download the AutoSQL files needed by bedToBigBed: bigChain.as and bigLink.as.
Step 3. Download the bedToBigBed and hgLoadChain programs from the directory of binary utilities.
Step 4. Use the fetchChromSizes script from the same directory to create a chrom.sizes file for
the UCSC database you are working with (e.g., hg38). Alternatively, you can download the
chrom.sizes file for any assembly hosted at UCSC from our downloads page (click on "Full
data set" for any assembly). For example, for the hg38 database the hg38.chrom.sizes are
located at http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes.
Step 5. Build the chain.tab and link.tab files needed to create the bigChain file with the
hgLoadChain utility:
hgLoadChain -noBin -test hg38 bigChain chr22_KI270731v1_random.hg38.mm10.rbest.chain
Step 6. Create the bigChain file from chain.tab using a combination of sed, awk and the
bedToBigBed utility:
sed 's/\.000000//' chain.tab | awk 'BEGIN {OFS="\t"} {print $2, $4, $5, $11, 1000, $8, $3, $6, $7, $9, $10, $1}' > chr22_KI270731v1_random.hg38.mm10.rbest.bigChain
bedToBigBed -type=bed6+6 -as=bigChain.as -tab chr22_KI270731v1_random.hg38.mm10.rbest.bigChain hg38.chrom.sizes bigChain.bb
Step 7. Create the bigLink file from link.tab using awk, sort and the bedToBigBed utility:
awk 'BEGIN {OFS="\t"} {print $1, $2, $3, $5, $4}' link.tab | sort -k1,1 -k2,2n > bigChain.bigLink
bedToBigBed -type=bed4+1 -as=bigLink.as -tab bigChain.bigLink hg38.chrom.sizes bigChain.link.bb
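Step 8. Move the newly created bigChain files (bigChain.bb and bigChain.link.bb) to an http,
https, or ftp location.
Step 9. Construct a custom track using a single track line that points to both files:
track type=bigChain name="My Big Chain" bigDataUrl=http://myorg.edu/mylab/bigChain.bb linkDataUrl=http://myorg.edu/mylab/bigChain.link.bb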
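To make the column reordering in the sed, awk, and sort commands above easier to follow, the sketch below runs them on two tiny made-up files in place of real hgLoadChain output. It assumes chain.tab columns are ordered score, tName, tSize, tStart, tEnd, qName, qSize, qStrand, qStart, qEnd, id, and link.tab columns are ordered tName, tStart, tEnd, qStart, chainId, which is how the awk commands map them onto the bigChain.as fields and the bed4+1 link layout. The sample values and the demo_dir variable are hypothetical, and bedToBigBed is not run here.

```shell
demo_dir=$(mktemp -d)

# Made-up chain.tab row: 11 tab-separated columns, score first with a .000000 suffix.
printf '20000.000000\tchr22_KI270731v1_random\t150754\t100\t5000\tchr1\t195471971\t-\t3000\t8000\t7\n' > "$demo_dir/chain.tab"

# Made-up link.tab rows, deliberately out of order.
printf 'chr22_KI270731v1_random\t400\t500\t3300\t7\n' > "$demo_dir/link.tab"
printf 'chr22_KI270731v1_random\t100\t200\t3000\t7\n' >> "$demo_dir/link.tab"

# Strip the decimal suffix from the score, then reorder into bigChain.as column order.
# The dot is escaped here so that sed matches only a literal decimal point.
sed 's/\.000000//' "$demo_dir/chain.tab" |
  awk 'BEGIN {OFS="\t"} {print $2, $4, $5, $11, 1000, $8, $3, $6, $7, $9, $10, $1}' > "$demo_dir/demo.bigChain"

# Reorder the link columns, then sort by chromosome and numerically by start.
awk 'BEGIN {OFS="\t"} {print $1, $2, $3, $5, $4}' "$demo_dir/link.tab" |
  sort -k1,1 -k2,2n > "$demo_dir/demo.bigLink"

cat "$demo_dir/demo.bigChain" "$demo_dir/demo.bigLink"
```

The first output line has 12 fields in bigChain.as order, with the original chain score moved to the last column and a fixed 1000 in the score column. The link rows come out sorted by start position, each with five fields.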
The bedToBigBed program can also be run with several additional options. Run bedToBigBed with no
arguments to view a full list of available options.
In this example, you will use an existing bigChain file to create a bigChain custom track. A bigChain file that contains data on the hg38 assembly has been placed on our http server. You can create a custom track using this bigChain file by constructing a "track" line that references this file:
track type=bigChain name="bigChain Example One" description="A bigChain file"
bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigChain.bb
linkDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigChain.link.bb
Paste the above "track" line (removing the line breaks) into the custom track management page for the human assembly hg38 (Dec. 2013), then press the "submit" button.
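One way to remove the line breaks is to join the three lines with single spaces before pasting. A minimal sketch, where track_file is a hypothetical scratch file holding the track line as printed above:

```shell
track_file=$(mktemp)
cat > "$track_file" <<'EOF'
track type=bigChain name="bigChain Example One" description="A bigChain file"
bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigChain.bb
linkDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigChain.link.bb
EOF

# Join all lines of the file into one, separated by single spaces.
track_line=$(paste -s -d ' ' "$track_file")
echo "$track_line"
```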
Custom tracks can also be loaded via one URL line. This link loads the same bigChain track, but includes parameters on the URL line (line breaks have been inserted for readability):
http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&position=chr22_KI270731v1_random
&hgct_customText=track%20type=bigChain%20name=Example
%20bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigChain.bb
%20linkDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigChain.link.bb%20visibility=pack
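The URL above can be assembled from a plain track line by encoding each space as %20 and appending the result to the hgct_customText parameter. The sketch below reproduces that URL; the variable names are hypothetical, and only spaces are encoded because that is the only substitution visible in the example.

```shell
track_text='track type=bigChain name=Example bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigChain.bb linkDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigChain.link.bb visibility=pack'

# Encode spaces as %20; all other characters are left as they appear in the example URL.
encoded_text=$(printf '%s' "$track_text" | sed 's/ /%20/g')

browser_url="http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&position=chr22_KI270731v1_random&hgct_customText=${encoded_text}"
echo "$browser_url"
```

Track lines that contain characters such as quotation marks or ampersands would likely need fuller URL encoding than this sketch performs.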
With this example bigChain loaded, click into a chain from the track. Note that the details page displays information about the individual chains, similar to a standard chain track.
In this example, you will create your own bigChain file from an existing bigChain input file.
Save the bigChain.as and bigLink.as files to your machine (step 2).
Download the bedToBigBed and hgLoadChain utilities (step 3).
Save the hg38.chrom.sizes text file to your machine. It contains the chrom.sizes for the human (hg38) assembly (step 4).
Run the commands from steps 5 through 7 to create the bigChain.bb and bigChain.link.bb files.
Place the bigChain files (bigChain.bb and bigChain.link.bb) on a web-accessible server (step 8).
If you would like to share your bigChain data track with a colleague, learn how to create a URL by looking at Example 11 on this page.
Since the bigChain files are an extension of bigBed files, which are indexed binary files, they can be difficult to extract data from. We have developed the following programs, all of which are available from the directory of binary utilities.
bigBedToBed: this program converts a bigBed file to ASCII BED format.
bigBedSummary: this program extracts summary information from a bigBed file.
bigBedInfo: this program prints out information about a bigBed file.
As with all UCSC Genome Browser programs, simply type the program name at the command line with no parameters to see the usage statement.
If you encounter an error when you run the bedToBigBed program, it may be because your input
bigChain file has data off the end of a chromosome. In this case, run the bedClip program before
the bedToBigBed program. It will remove the row(s) in your input BED file that are off the end of
a chromosome.
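To see which rows are affected before clipping, the input can be compared against the chrom.sizes file. The sketch below is an awk approximation of that check and not bedClip itself: the function name find_off_chrom_rows and the sample rows are hypothetical, and it only reports rows without modifying the input.

```shell
# Hypothetical helper: list BED rows whose end coordinate exceeds the chromosome size.
# $1 = chrom.sizes file (name<TAB>size), $2 = BED-like input file.
find_off_chrom_rows() {
  awk -F'\t' '
    NR == FNR             { size[$1] = $2; next }   # first file: load chromosome sizes
    !($1 in size)         { print "unknown chrom: " $0; next }
    $3 + 0 > size[$1] + 0 { print "past end: " $0 }
  ' "$1" "$2"
}

# Made-up sample data: the second row ends beyond the listed chromosome size.
clip_dir=$(mktemp -d)
printf 'chr22_KI270731v1_random\t150754\n' > "$clip_dir/chrom.sizes"
printf 'chr22_KI270731v1_random\t100\t5000\t7\n' > "$clip_dir/input.bed"
printf 'chr22_KI270731v1_random\t150000\t151000\t8\n' >> "$clip_dir/input.bed"

find_off_chrom_rows "$clip_dir/chrom.sizes" "$clip_dir/input.bed"
```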