UCSC Genome Bioinformatics

Genome Browser in a Box User Guide

What is the Genome Browser in a Box?

The Genome Browser in a Box (GBiB) is a "virtual machine" of the entire UCSC Genome Browser website that is designed to run on most PCs (Windows, Mac OS X or Linux). The GBiB allows you to access much of the UCSC Genome Browser's functionality from the comfort of your own computer. It is primarily intended for users with protected data who want to use the Genome Browser's functionality. We expect that many of these users will be working with the human genome assemblies (hg19, and soon hg38). Because of this, the GBiB has been optimized for use with the hg19 assembly. Many of the more recent assemblies will work, but without mirroring additional data, they may be slow.

Differences between the GBiB and the Genome Browser?

While the GBiB and the Genome Browser are similar in many ways, there are several key differences. One of the key differences is the ability to visualize sensitive or protected data using the GBiB. Previously, to view your data using the Genome Browser, you would have to either upload your data to the UCSC Genome Browser website, or place your file on a publicly accessible web server and supply the URL to UCSC. With the GBiB, however, none of your data is uploaded to UCSC. This means that you can use GBiB to view your data even in situations where it's infeasible to load your data onto a public web server.

The GBiB does not include the entire UCSC genome annotation database of several terabytes, and instead depends upon remote connections to various UCSC servers for much of its functionality and data. It connects to our download server for the genomic sequences, LiftOver files, and many of the other large data files. It connects to UCSC's public MySQL server to download data displayed by the various tracks. Unfortunately, several tracks are unavailable on our public MySQL server due to agreements with the data distributors, and are therefore unavailable in the GBiB. These include the DECIPHER and LOVD Variant tracks. Depending on your distance to UCSC, remote access to these UCSC databases can be slow. Therefore, the GBiB includes a simple tool that allows downloading ("mirroring") of selected genome annotation tracks to your machine. You can find more information on this new tool in the Improving Speed by Mirroring Tracks section.

Getting Started: Setting up the Genome Browser in a Box

The Genome Browser in a Box (GBiB) will run on most modern PCs and major operating systems. There are some basic requirements that your PC should meet to provide the best experience when using the GBiB:

  • A computer with support for virtualization, which is the case for most PCs sold after 2010.
  • VirtualBox software.
    • The software is free to use in many situations. See their licensing terms and conditions for details.
    • Install it by following the instructions on the website.
    • You must have administrator rights for your computer to install the VirtualBox software.
    • We have tested and confirmed that the GBiB works using versions 4.3.6 through 4.3.12.
  • At least 20 GB of free space on your hard disk (or more if you plan to mirror many tracks).
  • Your network firewall must allow outgoing connections on the following ports:
    • TCP port 3306 is used by MySQL. Without this, only mirrored tracks are shown.
    • TCP port 873 is used by rsync. Without this, you cannot download track data.
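
The disk space and firewall requirements above can be checked from a terminal before you download anything. The following is a minimal sketch and not part of the GBiB: the function names has_free_gb and port_reachable are made up for illustration, and port_reachable assumes that the netcat utility (nc) is installed on your computer.

```shell
# has_free_gb DIR GB
# Succeeds if the file system that holds DIR has at least GB gigabytes free.
has_free_gb() {
    dir=$1
    need_gb=$2
    # df -P prints one line per file system; -k reports 1024-byte blocks.
    # Column 4 of the second line is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $((need_gb * 1024 * 1024)) ]
}

# port_reachable HOST PORT
# Succeeds if an outgoing TCP connection to HOST:PORT can be opened.
# Assumption: "nc" is installed; -z only probes, -w 5 gives up after 5 seconds.
port_reachable() {
    nc -z -w 5 "$1" "$2"
}

# Example calls (not run automatically):
#   has_free_gb "$HOME" 20 && echo "enough disk space"
#   port_reachable genome-mysql.cse.ucsc.edu 3306 && echo "MySQL port is open"
```

The host name in the last comment is the public MySQL server mentioned in the Troubleshooting section. A failed probe does not prove that a firewall is the cause, so treat the result as a hint only.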

Installation

Once you have confirmed that your PC meets the above requirements, download the Genome Browser in a Box ZIP file. Because this file is several gigabytes, it may take anywhere from 30 minutes to a few hours to download depending on your Internet speed and distance from UCSC. Once you have finished downloading the GBiB ZIP file, unzip it and extract the three files it contains. On OS X, do not use the command line tool "unzip"; instead, double-click the ZIP file in the Finder. The extracted files can be moved to a different directory, as long as all three files are located in the same directory and are not renamed.

You can add the GBiB to VirtualBox by either double-clicking the file browserbox.vbox or by starting VirtualBox and selecting Machine >> Add (⌘+A on Macs or Ctrl+A on Linux/Windows) and opening the file browserbox.vbox.

Finally, if needed, select "browserbox" on the left side menu of VirtualBox and click the big "Start" button in the symbol bar.

If there are error messages displayed by VirtualBox or the Genome Browser, please consult the Troubleshooting section.

Using the Genome Browser in a Box

Make sure that you have set up your Genome Browser in a Box (GBiB) according to the instructions in the "Getting Started" section. Once you have your GBiB set up and powered on, point your Internet browser to 127.0.0.1:1234 (we suggest bookmarking this address). If you have correctly set up your GBiB, you should see the normal UCSC homepage. Once on the UCSC homepage, you can start using the GBiB as you would the main Genome Browser website. You can find an introduction to using the UCSC Genome Browser on our main help page.

Note: we recommend using 127.0.0.1:1234 as the URL to your GBiB instead of http://localhost:1234 because most Internet browsers do not send cookies to "http://localhost". Without cookies, your browser configuration may not be saved between sessions.

Improving Speed by Mirroring Tracks

As a first measure, you can try to increase the amount of RAM that Genome Browser in a Box is allowed to use. By default it is limited to 1GB but if your machine has enough RAM installed, you might want to increase it to 2GB, 4GB or even 8GB. Stop the box, click Settings >> System and increase the memory slider to a value of your choice. This will usually make the system more responsive.
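
The same memory change can probably also be made from your computer's command line with VirtualBox's VBoxManage tool while the box is stopped. The sketch below assumes that the virtual machine is registered under the name "browserbox" and that VBoxManage is on your PATH. The helper name set_gbib_memory is hypothetical. By default it only prints the command so that you can review it first.

```shell
# set_gbib_memory GB [run]
# Prints the VBoxManage command that sets the memory of the "browserbox" VM.
# With "run" as second argument, the command is executed instead of printed.
set_gbib_memory() {
    mb=$(( $1 * 1024 ))   # VBoxManage expects the size in megabytes
    if [ "${2:-}" = "run" ]; then
        VBoxManage modifyvm browserbox --memory "$mb"
    else
        echo "VBoxManage modifyvm browserbox --memory $mb"
    fi
}

# Example calls (not run automatically):
#   set_gbib_memory 4        # prints the command for 4 GB
#   set_gbib_memory 4 run    # applies it; the box must be powered off
```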

If your Genome Browser in a Box (GBiB) is still slow, then you are probably located too far away from UCSC. The load time of the default tracks ranges from a few seconds on the West coast of the United States to up to 7 seconds from Europe. The GBiB includes the new mirror tool that allows you to download tracks to your machine, which will greatly increase the speed of the GBiB when accessing those tracks. To download annotation tracks to your own machine ("mirror"), click Tools >> Mirror Tracks in the Genome Browser menu. The first time you navigate to this mirror tool, the page may take a while to load. Select the tracks that you typically use by checking the boxes next to the track names, and clicking "Download". In addition to downloading entire track sets, you can also download individual subtracks. The size of the download for each track is listed next to the track name. When downloading large tracks, please keep in mind that you cannot delete these tracks and the related data from your GBiB once you have downloaded them. If you find that you've started downloading the wrong track, or a track that is too large for your machine, you can cancel the download at any point by clicking 'Cancel Download Now'.

If you are unsure what to select, our recommendation is "Default tracks with conservation tables, but no alignments". Depending on your network bandwidth, the download can take several minutes or up to a few hours over an ADSL line. During the download, the file browserbox-disk2.vmdk will grow. You cannot use the GBiB during this time. Once the download is complete, the default tracks should load in less than three seconds for a typical genomic position.

If you have mirrored all tracks that you might ever need and you want to make sure that GBiB does not connect to the internet anymore (e.g. if this is a corporate IT policy), you can type the command

boxOffline

in the VirtualBox terminal window. It will remove network access to the UCSC download servers. If any data file is not located on your local disk, the Genome Browser will then show an error message, so we do not recommend this for general use. You can always reactivate internet access with

boxOnline

Updating the Genome Browser in a Box

The software that supports the UCSC Genome Browser is updated every three weeks. These updates include new features and bug fixes for existing features. The track data, on the other hand, is not updated on a regular basis. New tracks and updates to existing tracks are released as they pass UCSC's quality assurance process. The only exceptions to this are GenBank tracks, including RefSeq Genes, GenBank mRNAs, and others. These GenBank tracks are updated every night through an automatic process.

The GBiB is set up to automatically update itself. When it updates itself, it will update all of the files in the GBiB, including software and tracks that you have mirrored. The updates will not affect your custom tracks, user accounts or sessions. This auto-update process ensures that you will always have the newest data and features. If you are using a DSL line, we recommend turning off these automatic updates. Over a slow internet connection, it could take hours for your GBiB to complete the update process, meaning that the GBiB would be unusable during that time.

You can turn off this auto-update process using the GBiB command line. Once you have started the GBiB, click on the "black" screen with the terminal and type the following command:

autoUpdateOff

While you can turn this auto-update process off, it is not recommended unless you are using the GBiB on a slow internet connection. Without this update process turned on, the GBiB software and tracks that you have mirrored will not be updated. This means that you will miss out on any new features and bug fixes, as well as updates to your mirrored tracks. If you would like to turn the auto-update process back on, you can type the following command in the GBiB terminal:

autoUpdateOn

In addition to the auto-update process, you can manually update the GBiB using the command line. To do so, click on the "black" screen with the terminal and type the following command:

updateBrowser

This will run the script that updates all of the GBiB software and annotation tracks that you have mirrored.

If, for some reason, your GBiB stops working, there are a few ways you can fix it. Your first option is to update the GBiB using the updateBrowser command. This method may only help if you had turned off the automatic updates at some point. This update process may take several hours over a slow internet connection. Your other option is to re-download the ZIP file and extract only the file browserbox-disk1.vmdk. Do not extract and overwrite browserbox-disk2.vmdk, as it contains your track and session settings and mirrored tracks.

Viewing Your Data

In addition to providing access to the standard set of Genome Browser annotation tracks, the GBiB allows users to upload their own annotation tracks. These custom annotation tracks can be viewed in the Genome Browser alongside the native UCSC tracks. Uploading your custom annotation tracks to the GBiB is similar to uploading your custom tracks to the main UCSC Genome Browser website. One big difference is the ability to load local big data files without the need to host them on a separate web server. More information on taking advantage of this functionality can be found in the Loading Local Big Data Tracks section. To start uploading your own tracks, navigate to the Add Custom Tracks page. For more information on uploading your own custom annotation tracks, please refer to our Custom Track help page.

In addition to custom tracks, you can use Track Hubs to easily view your data in the GBiB. Track Hubs are web-accessible directories of genomic data that can be viewed on the UCSC Genome Browser or GBiB. Track Hubs are preferred to custom tracks because hubs are more permanent. Custom tracks outside of saved sessions will expire and be removed after a few days. Track Hubs, on the other hand, will persist until you delete the files.

Loading Local Big Data Tracks

Your computer can share directories with the GBiB so that you can load big files without the need to upload them to a web server. The big file formats are compressed, indexed, binary files, and include bigBed, bigWig, BAM, and VCF files. Normally, one would have to place these types of files onto a publicly accessible web server to load them as a custom track. However, the GBiB acts as its own web server, meaning that you can share local files with the GBiB and load them directly as custom tracks.

To allow the GBiB to access one or more of your local folders, follow these steps:

Step 1. Shut down the GBiB virtual machine

  1. Close the black GBiB window.
  2. Select "Send the shutdown signal".
  3. Confirm by clicking "OK".

[Screenshots: GBiB Power Off; GBiB Shared]

Step 2. Allow VirtualBox access to one or more directories on your hard disk

  1. Click on the "browserbox" entry in VirtualBox, click on Settings.
  2. Click on Shared Folders.
  3. Click on the small + icon.
  4. Select a directory on your disk under Folder Path / Other.
  5. Select the checkbox to give access "Read-only", and make sure the checkbox for "Auto-mount" is selected.
  6. Confirm by clicking "OK".

You can repeat this with other folders, if needed. Then, restart the browserbox by clicking the big "Start" button again.
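
The settings from Step 2 can likely also be applied from your computer's command line with VBoxManage while the box is shut down. This is only a sketch under a few assumptions: the virtual machine is registered as "browserbox", VBoxManage is on your PATH, and the share name you pick is the name under which the folder later appears in the box. The helper name share_with_gbib and the example path are hypothetical.

```shell
# share_with_gbib NAME HOSTPATH [run]
# Prints the VBoxManage command that shares HOSTPATH with the "browserbox" VM
# as a read-only, auto-mounted folder called NAME.
# With "run" as third argument, the command is executed instead of printed.
share_with_gbib() {
    if [ "${3:-}" = "run" ]; then
        VBoxManage sharedfolder add browserbox \
            --name "$1" --hostpath "$2" --readonly --automount
    else
        printf 'VBoxManage sharedfolder add browserbox --name "%s" --hostpath "%s" --readonly --automount\n' "$1" "$2"
    fi
}

# Example calls (not run automatically):
#   share_with_gbib Documents /home/me/Documents        # print only
#   share_with_gbib Documents /home/me/Documents run    # apply
```

Leaving out --readonly would correspond to deselecting the "Read-only" checkbox in the dialog.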

To check if your folders are shared, type this address into your web browser: http://127.0.0.1:1234/folders. It should show all shared folders. To obtain the bigDataUrl of any of the files in your shared folders, right-click on any file and select "Copy link address". You can now paste this URL into the 'Add Custom Tracks' page.

Example 1:

Here is a custom track that loads a local BAM file (you will have to replace the part after http:// with a pasted URL from your own machine):

track type=bam name=BamExample bigDataUrl=http://127.0.0.1:1234/folders/test/bamExample.bam
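
If you load many files, typing such track lines by hand becomes tedious. As a rough sketch, a small shell function can build them from a track type, a track name and the path of the file below /folders. The function name gbib_track_line is made up for this example, and it assumes that the path contains no spaces or other characters that would need URL encoding.

```shell
# gbib_track_line TYPE NAME PATH_BELOW_FOLDERS
# Prints a custom track line that points at a file in a shared folder.
# TYPE is the custom track type, e.g. bam, bigBed, bigWig or vcfTabix.
gbib_track_line() {
    printf 'track type=%s name=%s bigDataUrl=http://127.0.0.1:1234/folders/%s\n' \
        "$1" "$2" "$3"
}

# Reproduces the line from Example 1:
gbib_track_line bam BamExample test/bamExample.bam
```

The printed line can be pasted into the 'Add Custom Tracks' page in the same way as the URL copied from your web browser.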

Data and Track Conversion Tools

The black terminal for the GBiB is a normal Linux command line. For easier use of the command line, you can connect to the box from your computer with SSH. Your computer's terminal likely supports copy/paste and is faster. To connect with SSH, open a terminal on your computer and type:

ssh browser@localhost -p 1235

You will be prompted for a password when attempting to access the GBiB from your computer's command line. The password is "browser". You can use sudo for root access, which does not require a password. Unfortunately, not every computer comes with SSH installed. If this is the case with your computer, you will need to use the GBiB terminal, or find another program that will allow you to SSH into the GBiB.

By default, the GBiB includes a few of the basic UCSC tools, such as bedToBigBed, wigToBigWig, samtools and tabix. These tools can be used to convert and manipulate your basic files into ones that can be uploaded to GBiB as custom tracks. If you need other Genome Browser tools, type the following command into the GBiB terminal window:

downloadTools

This command downloads and installs the full suite of command line tools provided by UCSC. Many of these extra tools can be used to extract data and other useful information from your files, or to convert them between various file types. A complete listing and description for all of these tools can be found on UCSC's download server. You can use these tools to convert and extract data from your shared files with the standard "Read-only" settings. However, if you would like to use these tools to modify files you've shared with your GBiB, you will have to ensure that the "Read-only" access for VirtualBox is turned off. To do so, follow the directions in Step 2 of the Loading Local Big Data Tracks section, except deselect the checkbox next to "Read-only".

Example 2:

To index a BAM file in a shared folder "Documents" on your hard disk, type:

cd /folders/Documents
samtools index my.sorted.bam
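
When a shared folder holds several BAM files, it can help to find the ones that still lack an index first. The following sketch lists them; the function name bams_missing_index is hypothetical. It relies on samtools index writing the index next to the BAM file with the extra extension .bai, so the shared folder must not be "Read-only" when you create the indexes.

```shell
# bams_missing_index DIR
# Prints every *.bam file in DIR that has no matching .bam.bai index file.
bams_missing_index() {
    for bam in "$1"/*.bam; do
        [ -e "$bam" ] || continue                  # the pattern matched nothing
        [ -e "$bam.bai" ] || printf '%s\n' "$bam"  # no index next to the BAM
    done
}

# Example call inside the GBiB (not run automatically):
#   bams_missing_index /folders/Documents | while IFS= read -r f; do
#       samtools index "$f"
#   done
```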

Example 3:

To convert a BED file in a shared folder "Documents" on your hard disk to bigBed format, type:

cd /folders/Documents
fetchChromSizes hg19 > hg19.sizes
bedToBigBed bedExample.txt hg19.sizes myBigBed.bb
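
bedToBigBed expects its input to be sorted by chromosome and then by start position, and it stops with an error otherwise. If your file is not sorted yet, a sketch like the following can be run first. The function name sort_bed and the output file name bedExample.sorted.txt are only illustrations, and setting LC_ALL=C is an assumption made here to get a plain byte-wise ordering of the chromosome names.

```shell
# sort_bed FILE
# Prints the BED lines of FILE sorted by chromosome name (column 1)
# and then numerically by start position (column 2).
sort_bed() {
    LC_ALL=C sort -k1,1 -k2,2n "$1"
}

# Example calls inside the GBiB (not run automatically):
#   sort_bed bedExample.txt > bedExample.sorted.txt
#   bedToBigBed bedExample.sorted.txt hg19.sizes myBigBed.bb
```

Header lines such as "track" or "browser" lines are not handled by this sketch and would have to be removed before sorting.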

Sharing your Genome Browser in a Box with Colleagues

By default, the GBiB can only be accessed from the machine that it is installed on. This is done so that others cannot access your data. You can, however, make your GBiB instance available for use by others. You can open external access to your GBiB by stopping it, selecting the browserbox machine in VirtualBox, clicking Settings and going to Network >> Adapter 1 >> Advanced >> Port Forwarding. Remove the address "127.0.0.1" from "Rule 1" by deleting it with the backspace key. In addition to enabling port forwarding for VirtualBox, you may need to enable the port forwarding functionality on your PC's firewall to allow others to access your GBiB. You will have to search online for instructions on how to enable this functionality on your PC's firewall.

Your colleagues can then access and use your GBiB instance by typing your IP address into their internet browser, followed by the :1234 port. Please keep in mind that once you have opened up your GBiB for remote access, anyone who knows your IP address will be able to access your GBiB and files that you have shared with it.

User Accounts and Sessions

When you start up your Genome Browser in a Box for the first time, you should create a user account so that you can start saving sessions. The Session tool allows you to configure your browser with specific track combinations, including custom tracks, and save the configuration options. You can then share these sessions with others who use your GBiB instance. More information on creating, saving, and sharing sessions can be found in the Sessions User's Guide. User accounts and sessions on the GBiB are separate from those at the UCSC Genome Browser. This means that user names and sessions that you create on your GBiB will not work with the main UCSC Genome Browser website, and vice versa.

Unfortunately, username recovery is not supported at this time. You can, however, recover lost passwords. The system for recovering lost passwords on the GBiB is quite different from that on the Genome Browser, and requires access to the command line. Use the following steps to recover lost passwords:

  1. Navigate to the account login page.
  2. Click on the 'Can't access your account?' link on the login page.
  3. Select the 'I forgot my password. Send me a new one.' option and enter your username.
  4. An email will be sent to the Alpine email client included with the Box.
  5. To access the email client, click into the GBiB's black terminal window.
  6. In this window, type mail and press enter, which will bring up the Alpine email client.
  7. Select MESSAGE INDEX from the menu and press enter.
  8. Select the message with 'New temporary password...' in the subject line.
  9. Log in using your username and this temporary password.
  10. After logging in, you will be prompted to create a new password.
  11. Once you are finished, you can exit the Alpine email client by pressing 'Q', and then 'Y'.

Please be aware that anyone with access to your username and the command line interface of your GBiB can change your password.

Troubleshooting Common Problems

This section contains some common errors that you may encounter while setting up and installing the GBiB. This section is in no way a complete listing of all of the problems that you may encounter during the GBiB setup. If you run into issues not listed below, feel free to email the UCSC Genome Browser mailing list at genome@soe.ucsc.edu.

  • VirtualBox Error: VT-x/AMD-V hardware acceleration has been enabled, but is not operational. Your 64-bit guest will fail to detect a 64-bit CPU and will not be able to boot.
    • Some older entry-level laptops from around 2009-2011 (e.g. Toshiba Satellite U500) were sold with CPUs that do not support virtualization. These laptops cannot run the Genome Browser in a Box. The same applies to low-cost laptops called "netbooks" with Intel Atom processors.
    • On some DELL OptiPlex computers (e.g. OptiPlex 960), virtualization is supported but deactivated in the BIOS. Reboot the computer and press F12 during boot to show the BIOS menu. Go to BIOS Setup >> Virtualization Support >> Virtualization, and check "Enable Intel Virtualization Technology". (On some DELL laptops, you may need to enable additional virtualization options under the BIOS Setup. Go to Virtualization Support >> Virtualization for Direct I/O, and check "Enable Virtualization for Direct I/O".) Exit and save. Restart the computer.
  • VirtualBox Error: "Failed to open virtual machine located in... Trying to open a VM config ... which has the same UUID as an existing virtual machine."
    • This happens if Genome Browser in a Box has already been downloaded and installed.
    • Start VirtualBox, select the old browserbox virtual machine and choose Machine >> Remove in the menu (Ctrl+R or ⌘+R). When asked, choose "Remove only" so you still keep the old browserbox on your disk.
    • Double-click the newly downloaded browserbox.vbox file or add it with Machine >> Add.
  • VirtualBox Error: "Failed to open virtual machine located in... Cannot register the hard disk ... because a hard disk ... already exists."
    • This happens if you have already downloaded and installed a Genome Browser in a Box before.
    • Start VirtualBox, go to File >> Virtual Media Manager in the menu (Ctrl+D or ⌘+D). Select browserbox-disk2.vmdk and click "Remove".
    • Double-click the newly downloaded browserbox.vbox file or add it with Machine >> Add.
  • Genome Browser Error: Couldn't connect to database hg19 on genome-mysql.cse.ucsc.edu as genomep.
    • This means that the virtual machine could not connect to the UCSC MySQL server.
    • It can be caused by a change of the IP address (e.g. on a Wi-Fi network) that has not been picked up by the virtual machine yet. You can restart the box or run a command to reset the network, such as sudo ifup --force eth0.
    • This can also be caused by a firewall that does not allow outgoing TCP data on port 3306/MySQL. Please contact your institution's IT support staff to enquire about ways to open this port.

Licensing Information

To use the Genome Browser in a Box (GBiB), you must agree to the license. The GBiB is free for non-commercial use by non-profit organizations, academic institutions, and individuals. Corporate use requires a license, setup fee and annual payment.