Cell Browser filters
The Cell Browser uses a set of filters that let people narrow the dataset list down to only the datasets of interest.
The values in these filters come from the following tags in a dataset's cellbrowser.conf:
body_parts
diseases
projects
organisms
sources
life_stages
domains
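As a rough illustration, these tags sit in a dataset's cellbrowser.conf as plain list assignments. The values below are placeholders made up for this sketch, not taken from a real dataset, and the exact label forms for projects, life_stages, domains, and sources are assumptions.

```python
# Hypothetical filter tags in a dataset's cellbrowser.conf.
# Every value here is a placeholder for illustration only.
body_parts = ["brain", "cortex"]
diseases = ["Healthy"]
organisms = ["Human (H. sapiens)"]
projects = ["Human Cell Atlas (HCA)"]     # assumed label form
life_stages = ["adult stage"]             # assumed label form
domains = ["Atlas"]                       # assumed label form
sources = ["<where the data came from>"]  # placeholder, fill in per dataset
```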
This page walks you through curating these tag values for a single dataset. To update filter tag values for many datasets, combine the steps here with the information on the Managing_cellbrowser.conf_tag_values_for_multiple_datasets page.
Tag/value conventions
This section covers our internal conventions for each set of tag/value pairs.
body_parts
For us at the Cell Browser, this tag is required for every dataset (as determined by 'reqTags' in your ~/.cellbrowser.conf).
Values in this field are always lower case.
If you include a very high-level value, also include a lower-level one. For example, if you include 'brain', also include a more specific brain region such as 'cortex' or 'hippocampus'.
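The conventions above can be checked mechanically before a dataset goes live. The following is a minimal sketch, not part of the Cell Browser itself: the function name and the BROAD_TERMS map are hypothetical, and only the 'brain' entry is taken from this page.

```python
# Hypothetical map from a broad body part to more specific regions.
# Only the 'brain' entry comes from this page; extend it as needed.
BROAD_TERMS = {"brain": {"cortex", "hippocampus"}}


def check_body_parts(values):
    """Return a list of human-readable problems with a body_parts list."""
    problems = []
    if not values:
        problems.append("body_parts is required but empty")
    for value in values:
        if value != value.lower():
            problems.append(f"'{value}' is not lower case")
    # Compare case-insensitively so a mis-cased broad term is still caught.
    present = {value.lower() for value in values}
    for broad, specific in BROAD_TERMS.items():
        if broad in present and not (present & specific):
            problems.append(f"'{broad}' has no more specific region listed")
    return problems


print(check_body_parts(["Brain"]))
print(check_body_parts(["brain", "cortex"]))
```

An empty result means the list passes these checks; it does not confirm that the terms themselves are the right ones for the dataset.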
diseases
If data is only from a non-diseased sample, use the value 'Healthy'.
Because the Cell Browser displays data at the cellular level, this tag describes the disease state of the specimen from which the data was generated, not the overall health of the donor. Some examples:
- Donor has Type 2 Diabetes but donated skin for research. The skin is sequenced, so diseases = ["Healthy"], despite the donor having diabetes. Donor-level health can be mentioned in the abstract or methods section.
- Donor has Pulmonary Fibrosis and donates lung tissue. The lung is sequenced, so diseases = ["Pulmonary Fibrosis"], since the sequenced specimen was not in a healthy state.
- Sometimes samples are taken from tissue adjacent to a tumor. If the dataset includes both the tumor sample and the adjacent tissue, then diseases = ["<disease/cancer type here>","Healthy"]. You can also have multiple diseases associated with a dataset.
If the data covers a disease, look it up in the MONDO disease ontology to ensure that we use a common label for all datasets of that disease. If a disease is not listed in MONDO, try the Human Phenotype Ontology (HP). If the disease dataset also includes healthy samples, include the value 'Healthy Control'.
The distinction between 'Healthy' and 'Healthy Control' lets people who want only healthy datasets see those without disease datasets cluttering the list. (Often the healthy control samples are mixed in with the disease samples, and separating them out is non-trivial.)
projects
This setting is used to group dataset collections together for a particular project. Sometimes projects have the same funding agency or they are under the same initiative/grant. Good places to check if the data is associated with a project are in the acknowledgments and data availability sections of the papers. They are also sometimes mentioned in press articles and/or Twitter announcements.
Common projects that we have worked with:
- Human Cell Atlas (HCA)
- California Institute for Regenerative Medicine (CIRM)
- Tabula Muris Consortium
- GTEx
- Allen Brain Atlas
- GSA for Human
- Mouse Cell Atlas
- The Alexandria Project
- Fly Cell Atlas
- EvoCell
organisms
List all species included in the dataset (or subdatasets).
For vertebrate species, use the form 'Common name (G. species)', e.g. 'Human (H. sapiens)' or 'Mouse (M. musculus)'.
For non-vertebrates, use the form 'G. species', e.g. 'C. robusta'.
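The two naming forms can be generated from a binomial name. This is a small sketch with a hypothetical helper name; whether a species counts as a vertebrate is left to the curator, who signals it by passing a common name.

```python
def organism_label(genus, species, common_name=None):
    """Build an organisms value from a binomial name.

    Pass common_name for vertebrates; leave it out for non-vertebrates.
    """
    # Abbreviate the genus to its capitalized initial, e.g. 'Homo' -> 'H.'
    short = f"{genus[0].upper()}. {species.lower()}"
    if common_name:
        return f"{common_name} ({short})"
    return short


organisms = [
    organism_label("Homo", "sapiens", "Human"),
    organism_label("Ciona", "robusta"),
]
print(organisms)
```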
life_stages
| Organism | Life stage | Age range |
|---|---|---|
| Human | embryonic stage | 0-8 weeks |
| | fetal stage | week 9 until birth |
| | newborn stage | 0-1 month |
| | infant stage | 1-24 months |
| | child stage | 2-12 years old |
| | adolescent stage | 13-18 years old |
| | adult stage | 19+ years |
| Mouse | embryonic stage | 1-15 days |
| | fetal stage | day 15 until birth |
| | early immature stage | 1-7 days |
| | infant stage | 1-5 weeks |
| | adolescent stage | 6 weeks-2 months |
| | adult stage | 2+ months |
| Drosophila melanogaster | embryo stage | 0-20 hours |
| | larva stage | 1-3 days |
| | pupa stage | 4-8 days |
| | adult stage | 9+ days |
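For human donors with a known postnatal age, the stage lookup can be sketched as below. The function name is hypothetical, and the boundary handling is an assumption: the ranges listed above touch at 1 month and at 24 months and leave the year between 12 and 13 unassigned, so this sketch treats each stage as running up to the start of the next one. Prenatal stages are not handled.

```python
def human_postnatal_stage(age_years):
    """Map a postnatal human age in years to a life_stages label.

    Assumption: each stage runs up to (not including) the start of the next.
    """
    if age_years < 0:
        raise ValueError("age_years must be non-negative")
    if age_years < 1 / 12:   # younger than 1 month
        return "newborn stage"
    if age_years < 2:        # 1 to 24 months
        return "infant stage"
    if age_years < 13:
        return "child stage"
    if age_years < 19:
        return "adolescent stage"
    return "adult stage"


# Collect the distinct stages for a dataset's donors, in order of appearance.
donor_ages = [0.5, 34, 61]
life_stages = list(dict.fromkeys(human_postnatal_stage(a) for a in donor_ages))
print(life_stages)
```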
domains
Specify a domain if any of the criteria listed under it apply to your dataset.
- Development
  - Dataset uses donors from multiple life stages
  - Mentions ‘development’ in the description
  - Looks at samples across a timecourse
  - Organoid growth experiments
  - Fetal/embryonic datasets
- Aging
  - Age is a major component of the study (Adult Pancreas, Aging Brain, Aging Human Skin)
  - Timecourse component (Tabula Muris, Tabula Muris Senis)
- Neurodegeneration
  - Multiple sclerosis (MS), Parkinson's disease, Alzheimer's disease
- Cancer
  - Mention of cancer in the project description
  - Tumor samples
- Atlas
  - Keywords in the description of the project like “Atlas” and “Landscape”
  - Multi organ (Tabula Sapiens)
  - Multi organ_parts (Immune Cell Atlas, Heart Cell Atlas)
- Disease Model
  - Samples compared to a healthy control (Lung in Pulmonary Fibrosis vs Control, Mouse Skin Stretch Response, Mouse DRG Injury)
- COVID-19
  - Mention of COVID-19
- Stem Cells
  - Mention of organoids
  - Mention of stem cells
  - Mention of stem cell lines (H1, H9, etc.)
  - Mention of induced pluripotent stem cells (aka iPSC or iPS)
- Evolution
  - Sometimes includes species other than Mouse and Human
  - Looks at heritable changes over time
  - Mentions evolution in the description, paper, or grant (EvoCell Project)
- Survey
  - Focuses on a large number of cells and their connections (Mouse Nervous System)
  - Mention of “survey” in description and/or paper (Gut Cell Survey)
  - Keywords like “Heterogeneity” (Mouse Oligodendrocyte Heterogeneity)
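Several of the criteria above are keyword mentions, so a first pass over a dataset description can be automated. The sketch below is only a starting point under stated assumptions: the function name and keyword lists are hypothetical, derived loosely from the criteria above, and domains that depend on study design (Aging, Disease Model) are left out. A curator should still review every suggestion.

```python
# Hypothetical keyword lists, derived from the keyword-based criteria above.
DOMAIN_KEYWORDS = {
    "Development": ["development"],
    "Neurodegeneration": ["multiple sclerosis", "parkinson", "alzheimer"],
    "Cancer": ["cancer", "tumor"],
    "Atlas": ["atlas", "landscape"],
    "COVID-19": ["covid-19"],
    "Stem Cells": ["organoid", "stem cell", "ipsc", "induced pluripotent"],
    "Evolution": ["evolution"],
    "Survey": ["survey", "heterogeneity"],
}


def suggest_domains(description):
    """Return candidate domains whose keywords appear in the description."""
    text = description.lower()
    return [
        domain
        for domain, keywords in DOMAIN_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    ]


print(suggest_domains("A single-cell atlas of tumor samples"))
```

Plain substring matching is deliberately simple; it will miss synonyms and can match inside longer words, which is another reason to treat the output as suggestions only.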
sources
Where the data was obtained from.