As with most web servers, running a Genome Browser installation at your institution, even just for your own department, requires a Unix machine, disk space (6 TB for hg19), and the resources to update the site and the underlying OS regularly. You may want to consider the following alternatives before embarking on a full UCSC Genome Browser installation on your own server.
Embed the Genome Browser graphic in your web page: If you only want to include a Genome Browser view in your web page for an existing genome, you can use an <iframe> tag and point it to http://genome.ucsc.edu/cgi-bin/hgRenderTracks, which will show only the main browser graphic without any decorations.
You can then use various parameters to adapt this graphic to your use case, e.g., set the displayed position, switch tracks on/off or highlight a range with a color. See our manual pages for a list of the parameters.
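As a minimal sketch, the shell functions below assemble an hgRenderTracks URL and print a matching <iframe> tag. The parameter names db, position and highlight, and the highlight value format (assembly.chrom:start-end followed by a URL-encoded #RRGGBB color), are assumptions based on the usual Genome Browser URL conventions, and the function names are hypothetical; the manual pages remain the authoritative parameter list.

```shell
#!/usr/bin/env bash
# Sketch: build an embeddable hgRenderTracks URL and <iframe> tag.
# Assumed parameters: db, position, highlight (verify in the UCSC manual).
# Function names are hypothetical.

RENDER_BASE="http://genome.ucsc.edu/cgi-bin/hgRenderTracks"

# render_tracks_url DB POSITION [HIGHLIGHT]
#   HIGHLIGHT is "chrom:start-end#RRGGBB"; the assembly prefix is added here.
render_tracks_url() {
  local db="$1" position="$2" highlight="${3:-}"
  local url="${RENDER_BASE}?db=${db}&position=${position}"
  if [ -n "$highlight" ]; then
    # A literal '#' would start a URL fragment, so encode it as %23.
    highlight=$(printf '%s' "$highlight" | sed 's/#/%23/g')
    url="${url}&highlight=${db}.${highlight}"
  fi
  printf '%s\n' "$url"
}

# render_tracks_iframe DB POSITION [HIGHLIGHT]
#   prints an <iframe> element pointing at the URL built above
render_tracks_iframe() {
  local url
  url=$(render_tracks_url "$@")
  printf '<iframe src="%s" width="1000" height="400"></iframe>\n' "$url"
}
```

For example, render_tracks_iframe hg19 chr7:155592223-155605565 'chr7:155594000-155598000#FFAAAA' prints a tag you can paste into your page; the width and height values are arbitrary choices.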
Use assembly hubs: Assembly hubs allow you to take any FASTA file, add annotations, and visualize it with the Genome Browser. All you need is a web server on which to store the indexed genome sequence and the annotation files, e.g., in BAM, bigBed or bigWig format.
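Assembly hubs serve the genome sequence as a .2bit file, and building bigBed or bigWig annotation files also requires a chrom.sizes table of sequence names and lengths. A minimal sketch using the UCSC command-line utilities faToTwoBit and twoBitInfo (installed separately) could look like the following; the function and file names are hypothetical, and setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch: prepare the sequence files for a hypothetical assembly hub.
# Requires the UCSC utilities faToTwoBit and twoBitInfo on $PATH
# (unless DRY_RUN is set, in which case commands are only printed).

# Print the command instead of executing it when DRY_RUN is non-empty.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# prepare_hub_sequence FASTA NAME
#   writes NAME.2bit (packed sequence) and NAME.chrom.sizes
#   (sequence names and lengths, needed e.g. by bedToBigBed and wigToBigWig)
prepare_hub_sequence() {
  local fasta="$1" name="$2"
  run faToTwoBit "$fasta" "${name}.2bit"
  run twoBitInfo "${name}.2bit" "${name}.chrom.sizes"
}
```

Both output files then go on your web server next to the annotation files.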
Use Genome Browser in a Box (GBiB): Genome Browser in a Box is a fully configured virtual machine that includes Apache and MySQL and behaves identically to the UCSC website. GBiB loads genome data from the UCSC download servers on the fly, and website and data updates are applied automatically every two weeks. By default, GBiB uses the VirtualBox virtualization software, which allows it to run on Windows, OS X and Linux. VirtualBox is not strictly required, however: the virtual machine image can easily be converted for other hypervisors, e.g., VMware or Hyper-V. For increased privacy, you can download the genomes and annotation tracks you need and use GBiB offline on a laptop.
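One way to convert the GBiB disk image for another hypervisor is a generic disk-conversion tool. The sketch below assumes QEMU's qemu-img utility (not part of GBiB) and a hypothetical image name; if your GBiB download contains more than one disk image, each one would be converted the same way. The target formats used here are VMDK for VMware and VHDX for Hyper-V, and DRY_RUN=1 prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Sketch: convert a GBiB VirtualBox disk (.vdi) for VMware or Hyper-V
# using qemu-img. Image file names are hypothetical.

# Print the command instead of executing it when DRY_RUN is non-empty.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# convert_gbib_disk IMAGE.vdi vmware|hyperv
convert_gbib_disk() {
  local vdi="$1" target="$2" fmt
  case "$target" in
    vmware) fmt=vmdk ;;   # VMware virtual disk format
    hyperv) fmt=vhdx ;;   # Hyper-V virtual disk format
    *) echo "unknown target '$target' (use vmware or hyperv)" >&2; return 1 ;;
  esac
  # Keep the base name and swap the extension, e.g. gbib.vdi -> gbib.vmdk
  run qemu-img convert -f vdi -O "$fmt" "$vdi" "${vdi%.vdi}.${fmt}"
}
```

The converted disk is then attached to a new virtual machine created in the target hypervisor.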
If none of the above options fulfill your needs, consider setting up a full local mirror of the UCSC website. We support mirror site installations as time allows, and there are many functional mirrors of the Genome Browser worldwide.
A license is required for commercial download and/or installation of the Genome Browser binaries and source code. No license is needed for academic, nonprofit, and personal use. To purchase a license, see our license instructions or visit the Genome Browser store.
If you do not want to use our prepared virtual machine, Genome Browser in a Box, we provide a Genome Browser in the Cloud (GBiC) installation program that sets up a fully functional mirror on all major Linux distributions. It has been tested on Ubuntu 14 and 16, Red Hat/CentOS 6 and 7, and Fedora 20. The installation should preferably be performed on a fresh Linux installation, as it deactivates the default site config file in Apache and fills the MySQL directory with numerous databases. The easiest way to accomplish this is to run the GBiC program in a new virtual machine. The program also works in Docker containers and on cloud computing virtual machines, and has been tested on those sold by Amazon, Microsoft and Google.
Like GBiB, the mirror installed by GBiC can load the annotation data either from UCSC or from a local database copy. If you load data from UCSC and use a cloud computing provider, it is highly advisable to run your instances in US West Coast data centers (San Francisco Bay Area or San Jose); otherwise, data loading may be very slow.
To run the installation program, please see the GBiC user's guide.
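As a rough sketch of what a GBiC run looks like, the function below assumes the installation program has already been downloaded as browserSetup.sh per the GBiC user's guide. The command names install and mirror, and their placement relative to the -u option described in the UDR section of this page, are assumptions to check against the script's own usage message. A full hg19 mirror needs the roughly 6 TB of disk noted at the top of this page. DRY_RUN=1 prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch of a GBiC run. Assumes browserSetup.sh is in the current directory.
# The "install"/"mirror" commands and the -u flag are assumptions; check
# the script's usage message before relying on them.

# Print the command instead of executing it when DRY_RUN is non-empty.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# gbic_setup DB [USE_UDR]   (USE_UDR=1 requests UDR downloads via -u)
gbic_setup() {
  local db="$1" use_udr="${2:-0}"
  # Assumed to set up Apache, MySQL and the Genome Browser CGIs
  run sudo bash browserSetup.sh install
  # Assumed to download the data for assembly $db
  if [ "$use_udr" = 1 ]; then
    run sudo bash browserSetup.sh -u mirror "$db"
  else
    run sudo bash browserSetup.sh mirror "$db"
  fi
}
```

Running gbic_setup hg19 1 on a fresh virtual machine would then install the browser and mirror hg19 over UDR.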
If the installation program does not work on your Linux distribution, or you prefer to make adaptations to your mirror, we also provide step-by-step installation instructions that cover the configuration of Apache, MySQL and the Genome Browser CGIs, temporary file removal, data loading through proxies, and other topics.
The following external websites were not created by UCSC staff and are of varying quality, but may be helpful when installing on unusual platforms:
UDR (UDT Enabled Rsync) is a download protocol that is very efficient at sending large amounts of data over long distances. UDR uses rsync as the transport mechanism, but sends the data over the UDT protocol. UDR is not written or managed by UCSC; it is an open source tool created by the Laboratory for Advanced Computing at the University of Chicago. It has been tested under Linux, FreeBSD and Mac OS X, but may work under other UNIX variants. The source code can be obtained through GitHub. When you use the GBiC installation program, the -u option makes it use UDR for all downloads.
If you manually download data only occasionally, there is no need to change your method; continue to visit our download server to download the files you need. This new protocol has been put in place primarily to facilitate quick downloads of huge amounts of data over long distances.
With typical TCP-based protocols like HTTP, FTP and rsync, the transfer speed drops as the distance between the download source and destination increases. Protocols like UDT/UDR allow many UDP packets to be sent in batches, which permits much higher transmission speeds over long distances. UDR will be especially useful for users downloading from locations far from California: users on the East Coast of the U.S. and in the international community will likely see much higher download speeds with UDR than with rsync, HTTP or FTP.
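A back-of-the-envelope bound shows why distance matters for TCP: a single connection can have at most one window W of unacknowledged data in flight per round-trip time (RTT), so its throughput cannot exceed W/RTT. Assuming a 64 KiB window (65,535 bytes, the maximum without TCP window scaling) and two illustrative round-trip times, not measurements:

```latex
\[
\begin{aligned}
\text{throughput} &\le \frac{W}{\mathrm{RTT}}, \qquad W = 65\,535\ \text{B} = 524\,280\ \text{bit} \\
\mathrm{RTT} = 15\ \text{ms}: &\quad \frac{524\,280\ \text{bit}}{0.015\ \text{s}} \approx 35\ \text{Mbit/s} \\
\mathrm{RTT} = 150\ \text{ms}: &\quad \frac{524\,280\ \text{bit}}{0.150\ \text{s}} \approx 3.5\ \text{Mbit/s}
\end{aligned}
\]
```

Ten times the round-trip time gives one tenth of the ceiling. Window scaling raises this limit, but packet loss on long paths still tends to hold single-stream TCP back, whereas UDT runs its own rate control on top of UDP.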
UDR is written in C++. It is open source and is released under the Apache 2.0 License. For it to work, you must have rsync installed on your system. If you need help building the UDR binaries or have questions about how UDR works, please read the documentation on the UDR GitHub page and, if necessary, contact the UDR authors via that page.
For your convenience, we offer a binary distribution of UDR for Red Hat Enterprise Linux 6.x (or variants such as CentOS 6 or Scientific Linux 6). You will find both a 64-bit and 32-bit rpm here.
Once you have a working UDR binary, either by building one from source or by installing the rpm, you can download files from either of our download servers in a fashion similar to rsync. For example, with rsync, all of the MySQL tables for the hg19 database can be downloaded using either of the following equivalent commands:
rsync -avP rsync://hgdownload.cse.ucsc.edu/mysql/hg19/ /my/local/hg19/
rsync -avP hgdownload.cse.ucsc.edu::mysql/hg19/ /my/local/hg19/
With UDR, the syntax for downloading the same data is:
udr rsync -avP hgdownload.cse.ucsc.edu::mysql/hg19/ /my/local/hg19/
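If you script such downloads, a small wrapper can prefer UDR when it is installed and fall back to plain rsync otherwise. This is a sketch, not a UCSC tool: it reuses the hgdownload.cse.ucsc.edu::mysql module and the /my/local destination from the commands above, while the function name, DEST_ROOT and USE_UDR variables are hypothetical. DRY_RUN=1 prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Sketch: fetch the MySQL table directory of an assembly with UDR if
# available, else with rsync. Names other than the server/module are
# hypothetical; adjust DEST_ROOT to your site.

SERVER="hgdownload.cse.ucsc.edu"
DEST_ROOT="${DEST_ROOT:-/my/local}"

# Print the command instead of executing it when DRY_RUN is non-empty.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Detect UDR once; set USE_UDR=0 or 1 beforehand to override.
if [ -z "${USE_UDR:-}" ]; then
  if command -v udr >/dev/null 2>&1; then USE_UDR=1; else USE_UDR=0; fi
fi

# fetch_db_tables DB
fetch_db_tables() {
  local db="$1"
  local src="${SERVER}::mysql/${db}/" dst="${DEST_ROOT}/${db}/"
  if [ "$USE_UDR" = 1 ]; then
    run udr rsync -avP "$src" "$dst"
  else
    run rsync -avP "$src" "$dst"
  fi
}
```

For example, for db in hg19; do fetch_db_tables "$db"; done runs the same udr command as above when UDR is present.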
If you installed the rpm, use the man udr command to view a man page with more information. If you installed from source, please refer to the UDR GitHub page for more details on the capabilities of UDR and how to use it.
UDR establishes connections on TCP port 9000, then transmits the data stream over UDP ports 9000-9100. Your institution may need to modify its firewall rules to allow inbound and outbound traffic on TCP/9000 and UDP/9000-9100 to and from either of the two download machines.
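For a Linux host using iptables, a sketch of the corresponding rules for one download machine is shown below. It only prints the rules so they can be reviewed before being applied as root, e.g. by piping the output to sudo sh. The port numbers come from the paragraph above, but the rule layout, in particular matching the destination port in each direction, is an assumption; check it against the UDR documentation and your own firewall policy. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print (not apply) iptables rules opening the UDR ports
# (TCP/9000, UDP/9000-9100) for one download host.
# Direction/port matching is an assumption; review before applying.

# udr_firewall_rules HOST
udr_firewall_rules() {
  local host="$1"
  # Inbound traffic from the download host
  echo "iptables -A INPUT -p tcp -s $host --dport 9000 -j ACCEPT"
  echo "iptables -A INPUT -p udp -s $host --dport 9000:9100 -j ACCEPT"
  # Outbound traffic to the download host
  echo "iptables -A OUTPUT -p tcp -d $host --dport 9000 -j ACCEPT"
  echo "iptables -A OUTPUT -p udp -d $host --dport 9000:9100 -j ACCEPT"
}
```

Run it once for each download machine. Note that -A appends to the end of a chain; if your chains end with a catch-all REJECT or DROP rule, use -I instead so the new rules are evaluated first.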
If you have difficulties installing or using UDR on your system, contact the Laboratory for Advanced Computing through their GitHub page.
For questions about installing and mirroring the UCSC Genome Browser, contact the UCSC mailing list genome-mirror@soe.ucsc.edu. Messages sent to this address will be posted to the moderated genome-mirror mailing list, which is archived on a SEARCHABLE PUBLIC Google Groups forum.