KnownGene build

From Genecats

Revision as of 18:27, 1 September 2021

Build UniProt and Protein databases

I haven't been doing this recently. We need to look into whether the work Max has done with UniProt should replace this step.

Edit trackDb to add new trackDb

 include knownGene.ra beta,public
 include knownGene.alpha.ra alpha


Initialize work directory

  • Create and cd into work directory of the form /hive/data/genomes/$db/bed/gencode$GENCODE_VERSION/build
  • Start a screen.
 screen -S knownGeneV38
  • Set PATH to include $HOME/kent/src/hg/utils/otto/knownGene
 PATH="$HOME/kent/src/hg/utils/otto/knownGene:$PATH"
  • Copy buildEnv.sh from previous build on this db
  cp /hive/data/genomes/mm39/bed/gencodeVM27/build/buildEnv.sh  buildEnv.sh
  • Find Table and File list from previous build
  cp /hive/data/genomes/mm39/bed/gencodeVM27/build/VM23.files.txt  ${GENCODE_VERSION}.files.txt
  cp /hive/data/genomes/mm39/bed/gencodeVM27/build/VM23.tables.txt  ${GENCODE_VERSION}.tables.txt
  • Confirm existing assembly tables are in a knownGene* database
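
The work directory naming in the first step can be sketched as a small helper. This is only an illustration: the function name gencode_build_dir is hypothetical and not part of the otto scripts; the path layout is the one given above.

```shell
# Hypothetical helper: print the work directory for an assembly ($1)
# and a GENCODE version ($2), following the layout described above.
gencode_build_dir() {
    printf '/hive/data/genomes/%s/bed/gencode%s/build\n' "$1" "$2"
}

# Example use (commented out so that nothing is created by accident):
# workDir=$(gencode_build_dir mm39 VM27)
# mkdir -p "$workDir" && cd "$workDir"
```

For mm39 and VM27 this prints the same directory that the cp commands above read from.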


Setting environment variables

The environment variables used in the build are set in buildEnv.sh, and all the other scripts assume that it has been sourced in the current shell. Edit buildEnv.sh by hand for each build. Most of the variables don't change between builds; the hairiest ones to update are those naming the other assemblies for the blast tables.
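
Because everything depends on buildEnv.sh having been sourced, a quick sanity check before starting can save a failed run. The following is a minimal sketch, not part of the otto scripts: require_vars is a hypothetical helper, and the variable names in the usage line (db, GENCODE_VERSION, curVer) are the ones that appear on this page; whether buildEnv.sh sets all of them is an assumption.

```shell
# Hypothetical helper: return non-zero if any named variable is unset or empty.
require_vars() {
    missing=0
    for name in "$@"; do
        # Indirect lookup of the variable whose name is in $name (POSIX sh).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing variable: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example use after sourcing the environment (commented out):
# . ./buildEnv.sh && require_vars db GENCODE_VERSION curVer
```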

Running the build

To run the build, execute hg/utils/otto/knownGene/buildKnown.sh. It builds into the knownGene${GENCODE_VERSION} database and performs the following steps:

  • Extracting Gencode data
  • Building initial knownGene table
  • Adding primary reference tables
  • Building final knownGene core tables
  • Building bigGenePred
  • Building GTF file
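
Since the logs are bundled up and checked in at the end, it may be convenient to capture the build output in a file from the start. A sketch under stated assumptions: run_build and the log file name are hypothetical, buildKnown.sh is assumed to be on PATH and to take no arguments, and buildEnv.sh is assumed to export its variables so that the child script sees them.

```shell
# Hypothetical wrapper: source the environment, run the build, keep a log.
# Prints the log file name and returns the build's exit status.
run_build() {
    if [ ! -r ./buildEnv.sh ]; then
        echo "buildEnv.sh not found in $(pwd)" >&2
        return 1
    fi
    . ./buildEnv.sh
    log="buildKnown.$(date +%Y%m%d-%H%M%S).log"
    buildKnown.sh > "$log" 2>&1
    status=$?
    echo "$log"
    return "$status"
}
```

Run it inside the screen session from the work directory; a non-zero return means the build stopped early and the named log should be inspected.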

Copying over tables

Adding trackDb entry

Look for the previous trackDb.ra file, normally hg/makeDb/trackDb/<org>/<assembly>/knownGene.ra.

Adding IsPcr server

The file /gbdb/$db/targetDb/${db}KgSeq${curVer}.2bit is built by the buildCore.sh script, which runs at the beginning of the process. Once it exists, ask cluster-admin to start an untranslated gfServer with -stepSize=5 on that file. For example:

 to cluster-admin
 Hey my friends,
 
 Could you please start an untranslated -stepSize=5 production gfserver
 with this 2bit file?
 
 hgwdev:/gbdb/mm39/targetDb/mm39KgSeq13.2bit
 
 thanks!
 brian
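
Before sending the request, it is worth confirming that the 2bit file is actually in place. A minimal sketch; both function names are hypothetical, and the path pattern is the one given above.

```shell
# Hypothetical helper: print the target 2bit path for an assembly ($1)
# and a knownGene version number ($2).
kg_seq_2bit() {
    printf '/gbdb/%s/targetDb/%sKgSeq%s.2bit\n' "$1" "$1" "$2"
}

# Hypothetical helper: report whether that file is readable on this host.
check_kg_seq_2bit() {
    path=$(kg_seq_2bit "$1" "$2")
    if [ -r "$path" ]; then
        echo "ok: $path"
    else
        echo "missing: $path" >&2
        return 1
    fi
}

# Example use (commented out): check_kg_seq_2bit "$db" "$curVer"
```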


On hgwdev, insert new records into blatServers and targetDb, using the host (field 2) and port (field 3) specified by cluster-admin. Identify the blatServer by the keyword ${db}KgSeq with the version number appended (for example, mm39KgSeq13).

cluster-admin will say something like this:

 Starting untrans gfServer for mm39KgSeq13 on host blat1b port 17921

Add this info to blatServers and targetDb tables in hgcentral.

  hgsql hgcentraltest -e \
     'INSERT into blatServers values ("mm39KgSeq13", "blat1b", 17921, 0, 1,"");'
  hgsql hgcentraltest -e \
     'INSERT into targetDb values("mm39KgSeq13", "GENCODE Genes",
         "mm39", "kgTargetAli", "", "",
         "/gbdb/mm39/targetDb/mm39KgSeq13.2bit", 1, now(), "");'
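
To avoid retyping the host and port, the blatServers statement can be generated from the cluster-admin reply. This is a sketch only: blat_server_insert is a hypothetical helper, it assumes the reply has the wording shown above, and it only prints the SQL so it can be reviewed before being passed to hgsql.

```shell
# Hypothetical helper: turn a reply such as
#   "Starting untrans gfServer for NAME on host HOST port PORT"
# into the matching blatServers INSERT statement. Prints nothing if the
# line does not match that wording.
blat_server_insert() {
    printf '%s\n' "$1" | sed -n \
        's/^.*gfServer for \([^ ]*\) on host \([^ ]*\) port \([0-9][0-9]*\).*$/INSERT into blatServers values ("\1", "\2", \3, 0, 1,"");/p'
}

# Example use (commented out):
# sql=$(blat_server_insert "Starting untrans gfServer for mm39KgSeq13 on host blat1b port 17921")
# echo "$sql"    # review it, then: hgsql hgcentraltest -e "$sql"
```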

all.joiner changes

I haven't added anything to this recently.

The relevant identifier is:

 knownGeneId

Check it with:

 joinerCheck all.joiner -identifier=knownGeneId -keys -database=mm39

Bundle up logs and check them in

Redmine ticket files and tables

Post release push "other species" blast tables