Getting started with Cell Browser wrangling
UCSC VPN
Connecting to the UCSC VPN is not required for much of your day-to-day work. However, it can be required to access papers in popular journals (e.g. Science or Nature) that aren’t open access.
To begin installing the campus VPN, visit the Information Technology Services (ITS) UCSC VPN Installation Instructions. Since our work computers are not managed by ITS, you can simply find and download the installer that works with your operating system. Once it is installed, open it and complete the Step 2: Establish a VPN Connection instructions.
Useful Bookmarks
Below is a non-exhaustive list of sites that you should bookmark:
- https://cells-test.gi.ucsc.edu - Cell Browser development site
- https://cells-beta.gi.ucsc.edu - Cell Browser beta staging site
- https://cells.ucsc.edu - Cell Browser public/main site
- https://cellbrowser.readthedocs.io/en/master/ - Cell Browser public documentation
- https://github.com/ucscGenomeBrowser/cellBrowser - Cell Browser Github
- https://redmine.soe.ucsc.edu/projects/cellbrowser - Cell Browser Redmine
If you use your web browser’s bookmark bar, it can be helpful to use one to two character labels for your bookmarks as it allows you to squeeze more into that bar. For example, the cells-test bookmark could be labeled ‘T’, cells-beta as ‘B’, and so on.
Accounts you should have
To wrangle for the Cell Browser, you should have the following accounts:
- Github - you should be given ‘Write’ access to the cellbrowser-confs repo, and write access to the
- Redmine - you should be given access to the 'Cells' project. Use this for tracking bugs, features, releases, etc.
Directories you should have
There are certain directories that every Cell Browser wrangler should have set up:
- /hive/users/${hgwdev_username} - the /hive filesystem is where any operations done with large files should be done.
  - Within this directory you should have:
    - cb/ - a place where you can explore experimental cell browser datasets
    - tmp/ or temp/ - a place where you can do temp file operations or things that you know you will most likely delete later
- /cluster/home/${hgwdev_username}
  - Within this directory you should have:
    - bin/ - where you can put scripts and other utilities so that they are picked up by your PATH
    - public_html/ - you can put files here (including cell browsers) so that they are accessible via the web at https://hgwdev.gi.ucsc.edu/~${hgwdev_username}
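The directory layout above can be created in one step. The following is a minimal sketch, not an official setup script: setup_wrangler_dirs is a hypothetical helper name, and its defaults assume that $USER matches your hgwdev username and that you prefer tmp/ over temp/.

```shell
# Hypothetical helper: create the standard wrangler directories.
# $1 = hive directory (assumed default: /hive/users/$USER)
# $2 = home directory (assumed default: $HOME)
setup_wrangler_dirs() {
    hive_dir="${1:-/hive/users/$USER}"
    home_dir="${2:-$HOME}"
    # -p creates missing parents and does nothing if a directory already exists
    mkdir -p "$hive_dir/cb" "$hive_dir/tmp" \
             "$home_dir/bin" "$home_dir/public_html"
}
```

On hgwdev, calling setup_wrangler_dirs with no arguments would use the defaults. Because mkdir -p leaves existing directories alone, it should be safe to run more than once.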
Helpful papers to read
Here is a brief ‘App Note’ about the UCSC Cell Browser and an overview of its basic features:
- Speir MS, et al. UCSC Cell Browser: Visualize Your Single-Cell Data. Bioinformatics. 2021 Jul 9;37(23):4578-4580.
This paper provides a great introduction to the process of single-cell analysis and should give you an idea as to how the UCSC Cell Browser fits into that process:
- Luecken MD and Theis FJ. Current best practices in single-cell RNA-seq analysis: a tutorial. Mol Syst Biol. 2019 Jun 19;15(6):e8746.
Conda
Importing data into the Cell Browser depends on two primary external tools: Seurat and Scanpy. The easiest way to install and manage these tools is with conda. Conda is a package management tool similar to pip for Python environments, but it can install more than just Python packages: you can also use it to install packages for R and more.
Step 1: Install miniconda
After you have logged into hgwdev, download the conda setup script to your home directory:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
Then run the setup script:
bash Miniconda3-latest-Linux-x86_64.sh
Enter ‘y’ when prompted throughout the installation process. Full installation instructions can be found on their website.
At the end, run this to immediately activate conda (otherwise, it will automatically be activated next time you log out/in to hgwdev):
source ~/.bashrc
Step 2: Set up conda envs for Seurat & Scanpy
We will set up separate environments for Scanpy and Seurat. Conda provides good documentation if you want to learn more about managing your conda environments.
2a Scanpy Environment
Creating the environment
First, we’ll set up an environment in which you will install scanpy:
conda create --name scanpyenv python=3.9
It will do some work to find the packages that it needs to install. When prompted, type “y” to finish the setup and install everything.
After creating the env, activate it so that the packages you install next will be installed in that environment:
conda activate scanpyenv
Mamba
Mamba is essentially a fast wrapper for conda and drastically improves the speed of installing and upgrading packages. Install this package first, then use it to install the others:
conda install -c conda-forge mamba
Scipy & Colors
The “scipy” package is needed for working with mtx files and “webcolors” allows for custom colors. Install both with this single command:
mamba install -c conda-forge scipy webcolors
Scanpy
Scanpy has quite a few dependencies. First, we have to install some supporting packages:
mamba install seaborn scikit-learn statsmodels numba pytables
Then we can install scanpy and more supporting packages via the ‘conda-forge’ channel:
mamba install -c conda-forge python-igraph leidenalg louvain scanpy
2b Seurat Environment
Creating the environment
After you’ve finished installing Scanpy, set up a separate environment for cellbrowser+seurat:
conda create --name seuratenv python=3.9
It will do some work to find the packages that it needs to install. When prompted, type “y” to finish the setup and install everything.
After creating the env, activate it:
conda activate seuratenv
Mamba
Mamba is essentially a fast wrapper for conda and drastically improves the speed of installing and upgrading packages. Install this package first, then use it to install the others:
conda install -c conda-forge mamba
Scipy & Colors
The “scipy” package is needed for working with files in mtx format and “webcolors” is needed for custom colors. Install both with this single command:
mamba install -c conda-forge scipy webcolors
R, Seurat, SeuratObject, R.oo, and R.utils
Finally, we can install R, Seurat, and a few other supporting packages:
mamba install -c conda-forge r r-seurat r-seuratobject r-r.utils r-r.oo
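After the installs finish, it can be worth confirming that the expected programs are visible in the active environment. The sketch below is one hedged way to do that: check_cmds is a hypothetical helper, and the command names you pass to it are your own assumptions about what the environment should provide.

```shell
# Hypothetical helper: report which of the given commands can be found.
# Returns 0 if every command is found, 1 if any is missing.
check_cmds() {
    status=0
    for cmd in "$@"; do
        # command -v succeeds for programs on PATH, shell functions, and aliases
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "found:   $cmd"
        else
            echo "missing: $cmd"
            status=1
        fi
    done
    return "$status"
}
```

For example, with seuratenv active you might run check_cmds mamba python R. Any line starting with "missing:" points at something that did not end up in that environment.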
Step 3: Make a copy of the Cell Browser Github
We will make a copy of the Cell Browser Github and set the default branch to be ‘develop’. This means that when you’re building cell browsers you’re always using the latest version of the tools. This also allows us to find and fix bugs in the Cell Browser command-line tools before they leak out to the pip release.
In your home directory, run the following command:
git clone https://github.com/ucscGenomeBrowser/cellBrowser.git
Next, move into the new cellBrowser directory and check out the develop branch:
cd cellBrowser
git checkout develop
After that, add the following line to your .bashrc so that you automatically use the right tools:
export PATH=$HOME/cellBrowser/src:$HOME/cellBrowser/ucsc:$PATH
If you already have a ‘PATH’ line in your .bashrc, then just insert ‘$HOME/cellBrowser/src:$HOME/cellBrowser/ucsc’ at the very beginning of your PATH and separate it from the next item with a ‘:’.
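If you re-source your .bashrc often, PATH can accumulate duplicate entries. The following is a small sketch of one way to avoid that, under the assumption that you are comfortable defining a function in your .bashrc; path_prepend is a hypothetical helper name, not part of the Cell Browser tools.

```shell
# Hypothetical helper: put a directory at the front of PATH,
# unless it is already listed somewhere in PATH.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present, nothing to do
        *) PATH="$1:$PATH" ;;
    esac
}

# Added in reverse order so that src ends up first, then ucsc.
path_prepend "$HOME/cellBrowser/ucsc"
path_prepend "$HOME/cellBrowser/src"
export PATH
```

Wrapping PATH in colons before matching means a directory only counts as present when it appears as a whole entry, not as a substring of a longer path.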
Github SSH keys
On hgwdev, generate a new ssh key pair (substitute in your Github email):
ssh-keygen -t ed25519 -C "your_email@example.com"
You’ll be prompted for a file name and a passphrase. Neither is required, so just hit ‘Enter’ at each prompt to skip those steps.
Copy the public key from your terminal window:
cat ~/.ssh/id_ed25519.pub
Follow Github’s instructions from (2) onward to add this public key to your account.
Auto-updating the Github repo
You can set up a cron job to keep this Github repo up-to-date.
Open the crontab editor:
crontab -e
Add these lines to the top of your crontab:
SHELL=/bin/sh
MAILTO={your @ucsc.edu email address}
Then add these lines anywhere in your crontab before saving and exiting:
# cell browser git update
16 6 * * 1-5 cd ~/cellBrowser/; git pull
This will go into your cellBrowser directory and do a ‘git pull’ at 6:16 am Monday through Friday every week, regardless of the date or month. IBM has a detailed manual page for cron, and the crontab.guru site can help you visualize how changing those columns affects the cron schedule.
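To make the column order concrete, the sketch below labels the five schedule fields of a crontab entry. It is only an illustration: describe_cron is a hypothetical helper, and it does no validation of the values it is given.

```shell
# Hypothetical helper: label the five schedule fields of a crontab entry.
describe_cron() {
    # read splits on whitespace and does not expand "*", so wildcard
    # fields stay literal; everything after the fifth field lands in "job".
    read -r minute hour dom month dow job <<EOF
$1
EOF
    echo "minute=$minute hour=$hour day-of-month=$dom month=$month day-of-week=$dow"
    echo "command=$job"
}
```

Calling describe_cron '16 6 * * 1-5 cd ~/cellBrowser/; git pull' prints minute=16 hour=6 day-of-month=* month=* day-of-week=1-5 on the first line and the command on the second. In the day-of-week column, 1-5 means Monday through Friday.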
Configuration files in your home directory
Setting up a .cellbrowser.conf
This is a file that exists in your home directory and helps set some Cell Browser-wide configuration options. Here are some essential lines to have in this file along with an explanation of why that line is important:
# Tells your cbBuild where data root for the cell browser is
# so that it can properly interpret and build the collection structure
dataRoot = "/hive/data/inside/cells/datasets/"
# Helps us with tracking site usage
gaTag = "UA-132481597-1"
# Shortcuts for use with cbBuild
outDirs = {"alpha" : "/usr/local/apache/htdocs-cells", "beta" : "/usr/local/apache/htdocs-cells-beta"}
# Forces us to remember to fill out these tags
reqTags=["body_parts"]
# Forces us to remember that directories should be all lowercase
onlyLower=True
Feel free to copy and paste these lines into the .cellbrowser.conf in your home directory.
Your .bashrc
This section covers settings to add to your ~/.bashrc that have been useful for others, though it is not an exhaustive list. You may find it helpful to add your own shortcuts for commands, directories, and more as you wrangle data.
# Useful commands:
alias ls='ls --color=auto'
alias c='clear;pwd'
alias p='pwd -P' # shows the "real" path in bash, not the path via symlinks
# Useful directory shortcuts
alias cells='cd /hive/data/inside/cells/datasets/'
alias cb='cd ~/cellBrowser/'
# This setting controls what’s shown at the beginning of your command prompt
# This will display the host + your working directory, e.g. [mspeir@hgwdev asthma-lung]
export PS1='[\u@\h \W]$ '
# There are many more customization options, see: https://www.cyberciti.biz/tips/howto-linux-unix-bash-shell-setup-prompt.html
# Your PATH is where bash will look for commands you run and will go through them in the order specified in your PATH.
# In essence it allows you to run a program like wigToBigWig without having to spell out where that utility lives, /cluster/bin/x86_64/wigToBigWig
export PATH=$HOME/cellBrowser/src:$HOME/cellBrowser/ucsc:/cluster/software/bin:/cluster/bin:/bin:/usr/bin:/cluster/bin/$MACHTYPE:/usr/local/bin:/cluster/bin/scripts:$HOME/bin/$MACHTYPE:$HOME/bin/:/cluster/bin/bedtools
Your .bash_profile
When you log onto hgwdev, the settings of your .bash_profile will automatically be loaded. These lines will ensure that the settings from your .bashrc are loaded at the same time:
if [ -f ~/.bashrc ]; then
    . ~/.bashrc
fi
Setting up password-less login to hgwdev
On your own computer run:
ssh-keygen -t ed25519
Copy the public key from your terminal window:
cat ~/.ssh/id_ed25519.pub
Then log into hgwdev with your password and paste that key into ~/.ssh/authorized_keys. (You may need to create this file if it doesn't exist.)
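If you prefer to do the paste step from the command line, the following is a hedged sketch rather than a required procedure. add_authorized_key is a hypothetical helper; it assumes the default OpenSSH location ~/.ssh/authorized_keys and tightens permissions, since sshd commonly refuses key files that other users can write to.

```shell
# Hypothetical helper: append a public key to authorized_keys if it is
# not already there.
# $1 = the full public key line
# $2 = ssh directory (assumed default: $HOME/.ssh)
add_authorized_key() {
    key="$1"
    ssh_dir="${2:-$HOME/.ssh}"
    auth_file="$ssh_dir/authorized_keys"
    # Create the directory and file if needed, readable only by the owner
    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir"
    touch "$auth_file" && chmod 600 "$auth_file"
    # -x matches whole lines, -F treats the key as a fixed string
    grep -qxF -- "$key" "$auth_file" || printf '%s\n' "$key" >> "$auth_file"
}
```

On hgwdev, you would call it with the single line printed by cat ~/.ssh/id_ed25519.pub on your own computer, wrapped in single quotes. Running it twice with the same key leaves only one copy in the file.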
Mosh (optional)
Installing and using Mosh is recommended, but optional. It allows you to leave long-running jobs going in a terminal window and not worry about processes being terminated if your computer goes to sleep or you change networks. Yes, you can do this with the Unix utility ‘screen’, but mosh simplifies the process greatly. With screen, your processes are kept running, but to access them again, you will need to log back into hgwdev and reconnect the screen, whereas with mosh, those windows remain connected, allowing you to get right back to what you were doing. Screen also behaves weirdly with conda envs (see this bug report).
See mosh’s installation instructions. If you’re on a Mac, it’s recommended to use homebrew instead of the ‘dmg’ as it should make future updates easier.
Once you’ve installed mosh, you will need to put these lines somewhere in your ~/.bashrc on both hgwdev and your own computer:
# mosh stuff
export PATH=/cluster/software/bin:$PATH
export LANG="en_US.UTF-8"
export LC_COLLATE=C
Once you've done that, you can log onto hgwdev by just swapping 'ssh' with 'mosh' in your normal login command:
mosh <username>@hgwdev.gi.ucsc.edu