QAing UCSC Genes
Revision as of 19:10, 30 November 2009
hgGene Page Source Information:
Click on the following link to view a sample hgGene page annotated with the sources of the different components:
File:Hg19uc002ypa.2.pdf
Gene Sorter Column Sources:
| Name | Description | Source |
|---|---|---|
| # | Item Number in Displayed List/Select Gene | n/a |
| Name | Gene Name/Select Gene | kgXref.geneSymbol |
| UCSC ID | UCSC Transcript ID | knownGene.name |
| UniProtKB | UniProtKB Protein Display ID | kgXref.spDisplayID or kgXref.spID_organism |
| UniProtKB Acc | UniProtKB Protein Accession | kgXref.spID |
| RefSeq | NCBI RefSeq Gene Accession | kgXref.refseq |
| Entrez Gene | NCBI Entrez Gene/LocusLink ID | knownToLocusLink |
| GenBank | GenBank mRNA Accession | kgXref.refseq or kgXref.mRNA |
| Ensembl | Ensembl Transcript ID | knownToEnsembl |
| GNF Atlas 2 ID | ID of Associated GNF Atlas 2 Expression Data | knownToGnfAtlas2 |
| Gene Category | High Level Gene Category - Coding, Antisense, etc. | kgTxInfo.category |
| CDS Score | Coding potential score from txCdsPredict | kgTxInfo.cdsScore |
| VisiGene | UCSC VisiGene In Situ Image Browser | knownToVisiGene |
| Allen Brain | Allen Brain Atlas In Situ Images of Adult Mouse Brains | knownToAllenBrain & allenBrainUrl |
| U133 ID | ID of Associated Affymetrix U133 Expression Data | knownToU133 |
| GNF Atlas 2 | GNF Expression Atlas 2 Data from U133A and GNF1H Chips | gnfAtlas2 |
| Max GNF Atlas 2 | Maximum Expression Value of GNF Expression Atlas 2 | calculated? |
| GNF Atlas 2 Delta | Normalized Difference in GNF Expression Atlas 2 from Selected Gene | gnfAtlas2Distance |
| BLASTP | NCBI BLASTP Bit Score | knownBlastTab.bitScore |
| BLASTP | NCBI BLASTP E-Value | knownBlastTab.evalue |
| %ID | NCBI BLASTP Percent Identity | knownBlastTab.identity |
| 5' UTR Fold | 5' UTR Fold Energy (Estimated kcal/mol) | foldUtr5.energy |
| 3' UTR Fold | 3' UTR Fold Energy (Estimated kcal/mol) | foldUtr3.energy |
| Exon Count | Number of Exons (Including Non-Coding) | knownGene.exonCount |
| Intron Size | Size of biggest (or optionally smallest) intron | knownGene exonStarts - exonEnds |
| Genome Position | Genome Position/Link to Genome Browser | (knownGene.txStart + txEnd)/2 |
| Mouse | Mouse Ortholog (Best Blastp Hit to UCSC Known Genes) | mmBlastTab |
| Rat | Rat Ortholog (Best Blastp Hit to UCSC Known Genes) | rnBlastTab |
| Zebrafish | Danio rerio Ortholog (Best Blastp Hit to Ensembl) | drBlastTab |
| Drosophila | D. melanogaster Ortholog (Best Blastp Hit to FlyBase Proteins) | dmBlastTab |
| C. elegans | C. elegans Ortholog (Best Blastp Hit to WormPep) | ceBlastTab |
| Yeast | Saccharomyces cerevisiae Ortholog (Best Blastp Hit to RefSeq) | scBlastTab |
| Pfam Domains | Protein Family Domain Structure | knownToPfam → pfamDesc |
| Superfamily | Protein Superfamily Assignments | ucscScop & scopDesc |
| PDB | Protein Data Bank | kgProtMap2 & sp###### database |
| Gene Ontology | Gene Ontology (GO) Terms Associated with Gene | kgProtMap2 & sp###### database |
| M. Vidal P2P | Human Protein-Protein Interaction Network from Marc Vidal | humanVidalP2P |
| E. Wanker P2P | Human Protein-Protein Interaction Network from Erich Wanker | humanWankerP2P |
| HPRD P2P | Human Protein-Protein Interaction Network from the Human Reference Protein Database | humanHprdP2P |
| Description | Short Description Line/Link to Details Page | kgXref.description |
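The "Intron Size" and "Genome Position" sources above are derived from knownGene fields rather than read from a single column. The sketch below shows one plausible reading of those two entries. It assumes that exonStarts and exonEnds are comma-separated lists with a trailing comma, that coordinates are 0-based half-open, and that an intron runs from one exon's end to the next exon's start. The function names and the sample transcript are hypothetical and are not taken from the Gene Sorter code.

```python
def parse_coords(field):
    """Parse a comma-separated coordinate list such as '1000,2000,' into ints."""
    return [int(x) for x in field.split(",") if x]


def intron_sizes(exon_starts, exon_ends):
    """Return the size of each intron: next exon start minus current exon end."""
    starts = parse_coords(exon_starts)
    ends = parse_coords(exon_ends)
    return [starts[i + 1] - ends[i] for i in range(len(starts) - 1)]


def genome_midpoint(tx_start, tx_end):
    """Midpoint of the transcript, i.e. (txStart + txEnd)/2 with integer division."""
    return (tx_start + tx_end) // 2


# Hypothetical three-exon transcript, used only for illustration.
sizes = intron_sizes("1000,2000,5000,", "1200,2500,5300,")
print(max(sizes), min(sizes), genome_midpoint(1000, 5300))
```

A single-exon transcript yields an empty list, so a QA check comparing these values against the Gene Sorter display would need to handle that case separately.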
Table Descriptions
This section attempts to describe how each table used in, or related to, UCSC Genes is used.
UCSC Gene and GS Table Descriptions
- allenBrainGene - "Human Cortex Gene Expression" link in "Sequence & Links to Tools & Databases" section of hgGene
- allenBrainUrl - w/ knownToAllenBrain creates GS column, "Allen Brain"
- bioCycMapDesc - BioCyc description name in "Biochem & Signaling Pathways" section of hgGene
- bioCycPathway - BioCyc pathway name in "Biochem & Signaling Pathways" section of hgGene
- ccdsKgMap - CCDS in the "Other names for this Gene" section of hgGene
- ceBlastTab - C. elegans info in "Orthologous Genes in Other Species" section of hgGene
- cgapAlias - links cgapID with kgXref.geneSymbol to pull info for gene
- cgapBiocDesc - BioCarta description in "Biochem & Signaling Pathways" section of hgGene
- cgapBiocPathway - BioCarta pathway name in "Biochem & Signaling Pathways" section of hgGene
- dmBlastTab - D. melanogaster info in "Orthologous Genes in Other Species" section of hgGene
- drBlastTab - zebrafish info in "Orthologous Genes in Other Species" section of hgGene
- foldUtr3 - 3' info in "mRNA Secondary Structure of 3' and 5' UTRs" section of hgGene
- foldUtr5 - 5' info in "mRNA Secondary Structure of 3' and 5' UTRs" section of hgGene
- gnfAtlas2 - separate track, QA'd with that track but also determines the "Microarray expression Data" section of hgGene and the Gene Sorter column, "
- gnfAtlas2Distance - Gene Sorter column "GNF Atlas 2 Delta" & "Expression (GNF Atlas2)" "sort by" option
- humanHprdP2P - Gene Sorter column "HPRD P2P" & "sort by"
- humanVidalP2P - Gene Sorter column "M. Vidal Protein-to-Protein" & "sort by"
- humanWankerP2P - Gene Sorter column "E. Wanker Protein-to-Protein" & "sort by"
- keggMapDesc - KEGG pathway description in "Biochem & Signaling Pathways" section of hgGene
- keggPathway - KEGG pathway name in "Biochem & Signaling Pathways" section of hgGene
- kg4ToKg5 - lets users search for an ID from the previous gene set in the new gene set; users can also check the kg3ToKg4 table directly to find corresponding gene IDs
- kgAlias - "Alternate Gene Symbols" in "Other Names for This Gene" section of hgGene
- kgColor - colors the gene in browser
- kgProtAlias - intermediate table?
- kgProtMap2 - needed for SCOP Domains in the "Protein Domain & Structure Information" section of hgGene and for the Protein Data Bank column in GS; also used by the Proteome Browser (being phased out; not released with hg19)
- kgSpAlias - duplicate of kgAlias w/ extra field, spID, that is blank in all records
- kgTxInfo - provides table info in the "Gene Model Information" section of hgGene
- kgXref - provides the "Alternate Gene Symbols" in the "Other Names for This Gene" section of hgGene
- knownAlt - creates a separate track, "Alt Events"; needs to be QA'd separately
- knownBlastTab - Gene Sorter columns: GS "%ID" = knownBlastTab.identity, GS "BLASTP E-Value" = knownBlastTab.eValue, GS "BLASTP Bits" = knownBlastTab.bitScore
- knownCanonical - best transcript from each clusterId (note, GS only works with genes in this table)
- knownGene - primary table
- knownGeneMrna - "mRNA" link in "Sequence & Links to Tools & Databases" section of hgGene
- knownGenePep - "protein" link in "Sequence & Links to Tools & Databases" section of hgGene
- knownIsoforms - groups transcripts into clusters named by clusterId
- knownToAllenBrain - w/ allenBrainUrl creates Gene Sorter "Allen Brain" column/link
- knownToCdsSnp - dropped because of too many bugs with the table; it enabled the Coding SNP column in Gene Sorter
- knownToEnsembl - used in link to Ensembl
- knownToGnf1h - dropped & didn't see changes on hgGene or Gene Sorter, gnfAtlas1?
- knownToGnfAtlas2 - "Microarray..." section & Microarray link, GS "GNF Atlas 2 ID"
- knownToHprd - creates the "HPRD" link in the "Sequence & Links to Tools & Databases" section of hgGene
- knownToLocusLink - used in link to Entrez Gene, see issues below
- knownToPfam - gives the Pfam Domains part of the "Protein Domain & Structure Information" section of hgGene & the GS "Pfam Domains" column
- knownToRefSeq - used in link to RefSeq (Other Names)
- knownToSuper - contains scop domain info with gene name & start/end
- knownToTreefam - used for link to Treefam website in the "Sequence & Links to Tools & Databases" section of hgGene
- knownToU133 - Gene Sorter column "U133 ID"
- knownToVisiGene - used in link to VisiGene
- mmBlastTab - mouse info in "Orthologous Genes in Other Species" section of hgGene
- pfamDesc - gives Pfam description in pfam domains section (step in GS)
- rnBlastTab - rat info in "Orthologous Genes in Other Species" section of hgGene
- scBlastTab - S. cerevisiae info in "Orthologous Genes in Other Species" section of hgGene
- scopDesc - prints accession and description in "SCOP Domains" of the "Protein Domain & Structure Information" section of hgGene
- spMrna - intermediate table? Doesn't seem to directly affect hgGene or GS
- ucscScop - from ucscID gets scop domainName
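The knownIsoforms and knownCanonical entries above describe a relationship that can be checked mechanically: knownIsoforms groups transcripts into clusters by clusterId, and knownCanonical names the best transcript for each cluster. The following is a minimal sketch of such a consistency check. It assumes the rows have already been fetched as (clusterId, transcript) pairs; the function names and the toy rows are hypothetical and are not part of any UCSC QA script.

```python
from collections import defaultdict


def group_by_cluster(known_isoforms):
    """Map each clusterId to the set of transcripts listed for it."""
    clusters = defaultdict(set)
    for cluster_id, transcript in known_isoforms:
        clusters[cluster_id].add(transcript)
    return clusters


def check_canonical(known_isoforms, known_canonical):
    """Return a list of human-readable problems; an empty list means consistent."""
    clusters = group_by_cluster(known_isoforms)
    problems = []
    seen = set()
    for cluster_id, transcript in known_canonical:
        if cluster_id in seen:
            problems.append(f"cluster {cluster_id}: more than one canonical transcript")
        seen.add(cluster_id)
        if transcript not in clusters.get(cluster_id, set()):
            problems.append(f"cluster {cluster_id}: {transcript} is not in knownIsoforms")
    # Clusters that never received a canonical transcript.
    for cluster_id in sorted(set(clusters) - seen):
        problems.append(f"cluster {cluster_id}: no canonical transcript")
    return problems


# Toy rows for illustration only; the IDs are made up.
isoforms = [(1, "uc001aaa.1"), (1, "uc001aab.1"), (2, "uc001aac.1")]
canonical = [(1, "uc001aaa.1"), (2, "uc001zzz.1")]
for problem in check_canonical(isoforms, canonical):
    print(problem)
```

Because the Gene Sorter only works with genes in knownCanonical, a check of this kind could be a reasonable first step before testing individual GS columns.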
Tables Related to UCSC Genes That Are Separate Tracks
- affyU133
- allenBrainAli
- exoniphy - created by Adam Siepel of Cornell for each assembly (2nd choice is to lift from previous assembly)
- gnfAtlas2
- nibbImageProbes
- omimGene
- omimGeneMap
- omimMorbidMap
- omimToKnownCanonical
- vgAllProbes
No longer UCSC Genes Tables
- knownToCdsSnp - being dropped on all assemblies because too many issues were found; it populated the Coding SNP column in Gene Sorter
- knownToGnf1h - part of GNF Atlas 1, which is not on hg19
Proteome Browser Tables (no longer releasing)
- pbAnomLimit
- pbResAvgStd
- pepCCntDist
- pepExonCntDist
- pepHydroDist
- pepIPCntDist
- pepMolWtDist
- pepPi
- pepPiDist
- pepResDist
- pepMwAa