About

The CIRM Data Warehouse

The stem cell hub is a data warehouse for stem cell genomic files. It houses primary data files such as DNA reads in fastq format, as well as many types of files derived from mapping and other analysis of the primary data, and PDF and other document files describing protocols. It has a small but flexible system for associating metadata tags with a file. Any CIRM-genomic associated lab can submit data. Once submitted, data is treated as prepublication human sequence data, and access is allowed only to authorized users.

Data wranglers

The data wranglers assist labs with bringing data into the CIRM Data Warehouse and with downloading submitted data.

Clay Fischer <clmfisch@ucsc.edu>
Parisa Nejad <pnejad@ucsc.edu>
Nick Fong <nfong@ucsc.edu>

Engineers
You can contact the engineers for information about data pipelines and submission summary pages.

Chris Eisenhart <ceisenha@ucsc.edu>


Privacy

Data is private between labs

Currently, most data is restricted. This means the data is accessible only to members of the lab that submitted it. If you wish to view restricted data for your lab, contact a wrangler to be granted access.

Data is private from the public

Currently, data is protected from the public by a VPN. Using the VPN requires setting up an account with our system administrators (cluster-admin@soe.ucsc.edu).

An SSL certificate is in the process of being set up, adding HTTPS encryption to your cirm-01 account. At that point we will remove the VPN requirement. When tumor maps are integrated, they will also use your account authentication when dealing with restricted data.


Quickstart

Login or create an account

Log in to the website. If you don't have an account, you can make one.

Browse files

In the top navigation bar, click the browse button and go to Files:

Top navigation should be displayed

This will bring you to a list of files.

file browser should be displayed

Filter results

If you have not used filters, the filters will be blank, as shown above. You can edit the cells and press enter to filter your results.

When your cursor is in the cell, hit the down arrow on your keyboard to show a drop-down list of every item you can filter to.

file browser should be displayed

Filters take advantage of basic UNIX wildcard syntax. A filter looks for an exact match unless asterisks are used:

file browser should be displayed
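
The exact-match versus wildcard behavior described above can be approximated outside the website. The following is a minimal sketch, assuming the filters behave like Python's fnmatch patterns and are case-sensitive; the website's actual matching code is not shown here, and the sample_label values are made up for illustration.

```python
from fnmatch import fnmatchcase


def filter_values(values, pattern):
    """Return the values that match a UNIX-style wildcard pattern.

    A pattern without an asterisk only matches an identical string,
    which mirrors the exact-match behavior of the file browser filters.
    """
    return [v for v in values if fnmatchcase(v, pattern)]


# Hypothetical sample_label values, for illustration only.
labels = ["liver", "fetal_liver_01", "liver_organoid", "kidney"]

print(filter_values(labels, "liver"))    # exact match only
print(filter_values(labels, "*liver*"))  # "liver" anywhere in the label
print(filter_values(labels, "liver*"))   # labels starting with "liver"
```

Note that fnmatch also understands ? and [...] patterns, which the filters may or may not support.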

Account (CDW website)

The top navigation bar will have the option to log in:

login screen should be displayed

If you do not currently have an account, at the bottom you will see an option to create an account.

account creation screen should be displayed

To view restricted data, contact a data wrangler and they will be able to assist with granting access to your data.

Account (VPN)

Currently, data is restricted from the public with the use of a VPN.

It is assumed you have VPN access already if you are reading this, but you may need to set it up for other members in your lab.

Contact a data wrangler and they can coordinate a time with you and our system administrators to set up your VPN account over the phone.

You will need to install Tunnelblick before your call.

Account (Server)

We no longer grant command line access to our server and are in the process of removing accounts.

Account (Tumor Map)

This feature is currently being implemented. It will share authentication with your CDW website account.

Finding data

Files

Under ‘browse’ go to Files. Click into the text box of a column and hit the down arrow to open the drop-down list. This shows every value for that column that you are authorized to view. You can filter a column by choosing from the list or by typing what you want with a wildcard. For example, under sample_label, try *liver* and you’ll get any data sets whose sample_label contains "liver".

Tracks

Tags

Data sets

This contains descriptions of the data sets submitted, and summaries of each submission, including visualizations.

Query

http://cirm-01/cgi-bin/cdwWebBrowse?cdwCommand=query

Downloading

Note to labs: your data may only be obtained from the CIRM Data Warehouse website. It cannot be downloaded until complete sequence data and metadata are available.

One file

Batch downloads


Definitions

Help submitting data


Deprecated

This being a prototype, there's not much help available. Try clicking and hovering over the Browse link on the top bar to expose a menu. The trickiest part of the system is the query link. The query link has you type in a SQL-like query. Try 'select * from files where accession' to get all metadata tags from files that have passed basic format validations. Instead of '*' you could use a comma-separated list of tag names. Instead of 'accession' you could put in a boolean expression involving field names and constants. String constants need to be surrounded by quotes - either single or double.
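
The query shape described above can be put together with a small helper. This is only a sketch of the 'select ... from files where ...' form quoted in the text: build_query and quote are hypothetical helper names, not part of the warehouse, and the "=" comparison in the last example is an assumption about the query language rather than documented syntax.

```python
def quote(value):
    """Surround a string constant with single or double quotes."""
    if '"' not in value:
        return f'"{value}"'
    if "'" not in value:
        return f"'{value}'"
    # Escaping rules are not described, so refuse rather than guess.
    raise ValueError("value contains both kinds of quotes")


def build_query(tags=None, where="accession"):
    """Build a query of the form 'select <tags> from files where <expr>'.

    tags:  list of tag names, or None for '*' (all metadata tags).
    where: a field name or boolean expression; 'accession' alone selects
           files that have passed basic format validations.
    """
    fields = ",".join(tags) if tags else "*"
    return f"select {fields} from files where {where}"


# The example query from the text above.
print(build_query())

# A comma-separated list of tag names instead of '*'.
print(build_query(["accession", "sample_label"]))

# Assumed syntax: comparing a field to a quoted string constant.
print(build_query(["accession"], "sample_label = " + quote("liver")))
```

The resulting string is what you would type into the query page.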