New Assembly Release Process Details: Difference between revisions

Revision as of 18:20, 9 April 2010

Go to the Push Checklist

Stage and test on hgwbeta

Rsync database from hgwdev to hgwbeta

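a) Create the new database on hgwbeta:

hgwbeta > hgsql

mysql > CREATE DATABASE <new_db>;

b) Create a list of tables to create inside the database from the tables listed in the push queue for the new assembly. There should be a table called <new_db> in the qapushq database on hgwbeta which can be used to get all of the tables at once:

hgwbeta > hgsql -Ne "SELECT tbls FROM <new_db> WHERE dbs='<new_db>'" qapushq > tableList

To convert spaces to newlines for the tableList:

awk '{ for (i=1;i<=NF;i++) print $i }' infileName > outfileName

c) Push tables to hgwbeta:

hgwdev > bigPush.csh tableList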
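The awk conversion step above can be sketched as a small shell function. This is only an illustration: the function name split_table_list and the sample table names are assumptions, while the awk program is the one already shown.

```shell
# Print every whitespace-separated field of the input on its own line.
# With no file argument the function reads standard input.
# The function name is hypothetical; the awk program is the one shown above.
split_table_list() {
    awk '{ for (i = 1; i <= NF; i++) print $i }' "$@"
}

# Example with made-up table names on stdin:
printf 'gap gold\nchromInfo\n' | split_table_list
```

Empty input lines produce no output, so the resulting tableList has exactly one table name per line.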

Update hgcentralbeta: dbDb, blatServers, genomeClade, gdbPdb (for KnownGenes), liftOverChain

hgwdev > set db = "droVir1"

hgwdev > hgsql -N -e 'SELECT * FROM dbDb WHERE name LIKE "'$db'%"' hgcentraltest > dbDb.dev

hgwdev > hgsql -h mysqlbeta -e "LOAD DATA LOCAL INFILE 'dbDb.dev' INTO TABLE dbDb" hgcentralbeta

hgsql -Ne "SELECT * FROM blatServers WHERE db = 'danRer6'" hgcentraltest > blat.dev

hgsql -h mysqlbeta -Ne "SELECT * FROM blatServers WHERE db = 'danRer6'" hgcentralbeta

hgsql -h mysqlbeta -e "LOAD DATA LOCAL INFILE 'blat.dev' INTO TABLE blatServers" hgcentralbeta

hgsql -Ne "SELECT * FROM liftOverChain WHERE fromDb = 'danRer6' OR toDb = 'danRer6'" \
  hgcentraltest > chain.dev

hgsql -h mysqlbeta -Ne "SELECT * FROM liftOverChain WHERE fromDb = 'danRer6' OR toDb = 'danRer6'" hgcentralbeta

hgsql -h mysqlbeta -e "LOAD DATA LOCAL INFILE 'chain.dev' INTO TABLE liftOverChain" hgcentralbeta

Push all of /gbdb/$db including html/description.html

Add the new assembly to the mirror exclude list (for both the gbdb and mysql rsync download targets) in hgdownload:/opt/csw/etc/rsyncd.conf, then push /gbdb/<new_db>/* from hgwdev to hgnfs1.

Generate trackDb in strict mode

hgwbeta> cd kent/src/hg/makeDb/trackDb

hgwbeta> make strict DBS=<new_db>

The pesky image file

~/browser/images/<image_name>

hgwdev:/usr/local/apache/htdocs/images/<image_name>

GenBank updates

Add the new assembly to the list here:

hgwdev:~/kent/src/hg/makeDb/genbank/etc/hgwbeta.dbs

and commit the change

ssh hgwbeta

cd /genbank/etc

cvs up -dP

hgwdev> updateTimes.csh <new_db> gbLoaded

Check the .nib files

File exists and is of reasonable size (not "0")
Data is a real file, not a symlink, on hgnfs1 (hgwbeta, Round Robin). (A symlink on hgwdev is OK.)
If nib, should be one per chrom in nib subdirectory
If 2bit, should be one file and not in any subdirectory

Check default position and default tracks
Review orderKey values

hgwbeta>hgsql hgcentralbeta

mysql>select name, orderKey from dbDb order by orderKey;

Check all sample queries on hgGateway page
Run joinerCheck

hgwbeta> cd ~/kent/src/hg/makeDb/schema

hgwbeta> joinerCheck -database=<new_db> -keys all.joiner

for whole DB, single key: -identifier="actual_key"
for single track: run Bob's script from hgwdev: runjoiner.csh <database>

b) Check that all tables in this database are mentioned in all.joiner:

hgwbeta> cd ~/kent/src/hg/makeDb/schema

hgwbeta> joinerCheck -database=<new_db> -tableCoverage all.joiner
Check indices (from pushQ)

mysql> show index from <table_name>;

Verify makedoc (from pushQ)

Find the makedoc for your target dataset and check that the tables listed in the “Tables” section of the pushQ entry are included in it (beware that the pushQ list may not be complete!). Remember to update your CVS tree before you start anything!

hgwdev> /cluster/home/<uid>/checkout/kent/src/hg/makeDb/doc/<new_db>.txt

If everything is there, be sure to click on “Y” in pushQ next to MakeDoc Verified.
Run featureBits

Run featureBits on some tables individually and then on pairs of tables. The expected result is that gap has little overlap with annotation tables such as genscan.

hgwdev> featureBits <database> <table> <table>

Add results to featureBits section of pushQ item. See also: http://hgwdev.cse.ucsc.edu/qa/test-results/featureBits.html

Check to make sure that none of the table names have underscores

mysql> show tables like "%\_%";
+--------------------------+
| Tables_in_calJac1 (%\_%) |
+--------------------------+
| all_est                  |
| all_mrna                 |
+--------------------------+
2 rows in set (0.00 sec)

mysql> show tables like "%\_%\_%";

Empty set (0.00 sec)
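The second query above can also be run over a plain list of table names. The following is a minimal sketch, assuming the names arrive one per line (for example from hgsql -Ne "SHOW TABLES" <new_db>); the function name is hypothetical and it only mirrors the two-underscore check, not any other naming rule.

```shell
# Read table names (one per line) on stdin and print those that contain
# two or more underscores, mirroring: show tables like "%\_%\_%";
# The function name is hypothetical.
tables_with_two_underscores() {
    # grep exits non-zero when nothing matches; treat that as success,
    # because an empty result is the expected outcome here.
    grep '_.*_' || true
}

# Possible usage (not run here, requires database access):
#   hgsql -Ne "SHOW TABLES" <new_db> | tables_with_two_underscores
```

No output corresponds to the "Empty set" result shown above.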

Push net and chain tables in other organisms that point to this new one (if any). This will involve these tables:
otherOrg.(chrN_)chainYourOrg

otherOrg.(chrN_)chainLinkYourOrg

otherOrg.netYourOrg
(do not push the trackDb or hgFindSpec for these yet)
Make sure that there is a liftOver file from the previous assembly to this assembly. This is the number one request after a new release.
If the new assembly is an update to the human, mouse, rat, zebrafish, D. melanogaster, C. elegans, or S. cerevisiae genomes, make sure that the appropriate *blastTab tables to this assembly are built.
Review all tracks in the sub pushQ as usual!
Check that all of the MySQL tables are in good repair:
hgwbeta> sudo dbCheck.sh $db
This will do a myisamchk on all tables (files) in that $db and repair any that need repairing (noted in the output by the words "REPAIR needed").
Review the gateway page last

Push Data to Round Robin from hgwbeta

Make sure nothing needs to be repushed from hgwdev to hgwbeta (you can use hgwdev> updateTimesDb.csh to compare table update times between hgwdev and hgwbeta)

If you are going to repush any genbank tables (see list of genbank tables here), you must push ALL genbank tables together (not just some)

Send a warning email to genome-mirror: 24 hours before you release, send an email to genome-mirror (the mirror site managers) to let them know you are about to dump a bunch of data on them. To find out how much data there is:

      Size of entire assembly database:
      hgwbeta> cd /tmp
      hgwbeta> dbSnoop -unsplit $db $db.dbSnoop
      hgwbeta> head $db.dbSnoop

      Size of entire assembly gbdb:
      hgwbeta> cd /gbdb
      hgwbeta> du -hsc $db

Adjust the release log: Compile a list of the tracks being released on this assembly and paste it into the release log box of the main pushQ entry for the initial release of the assembly. You can fetch the list from the assembly pushQ (Note that for the genbank tracks you will need to get the names manually.):

           ssh hgwbeta
           hgsql -Ne "SELECT track from $db" qapushq > releaselog

Request rsync of entire database from push-request

rsync /gbdb again as necessary. Remind the pushers to remove this assembly from the mirror's "exclude" list (so that the mirror sites can now rsync the /gbdb for this assembly). Make sure the mirrors know that the assembly will now be removed from the "exclude" list. (The htmlPath field in hgcentral.dbDb points to /gbdb/$db/html/description.html to make the Gateway page text.)

Push chains/nets for other species: if there are chains and nets to other species, make sure that the tables in the other databases are pushed, along with the trackDbs.

Update hgcentral

Copy entries from hgcentralbeta (on hgwbeta) to hgcentral (on genome-centdb). You can use hgwdev> checkMetaData.csh to compare the metadata tables on any two machines (e.g. checkMetaData.csh <db_name> hgwbeta hgw1). This script will produce several files. You can then edit the file for each table (remove the column header line) then load them into the hgcentral database.

You can log into genome-centdb and view the hgcentral database like so:

            hgwdev> hgsql -h genome-centdb
            mysql> USE hgcentral;

Or, you can load the edited output file from the checkMetaData.csh script directly into hgcentral by doing the following:

  hgwdev> hgsql -h genome-centdb -e 'LOAD DATA LOCAL INFILE "'dbDb.$db.common'" INTO TABLE dbDb' hgcentral

dbDb
- with active column set to 0 (don't set active = 1 until you are ready for the assembly to go live on the RR).
- hgNearOk = 1 is still OK for older assembly of same organism.
- edit orderKey if necessary to reflect the order as listed for this assembly on hgcentralbeta. Note: the orderKey information may be overridden by some of the CGIs, so it is not always apparent that the orderKey needs to be changed. One good place to check the order in the Browser is the drop-down menus on the PCR page (hgPcr).
- blatServers
- genomeClade (this only needs to be edited if this is a “1” assembly – first assembly for this organism)
- make gdbPdb entry to point to proteins database, if not default (for Known Genes)
- set liftOverChain. Do not change hgcentral.liftOverChain for assemblies on the RR going to the new db until the new db is active on the RR. Prepare a file to load into hgcentral immediately after setting active = 1.

Enable Assembly on Round Robin

Test the assembly tracks, BLAT, PCR, etc. by forcing db=$org and position= into the hgTracks URL (e.g. view an older assembly, then edit the URL so that you are actually viewing your new assembly).

When you know that everything is working, set the assembly to active:

      hgwdev> hgsql -h genome-centdb
      mysql> USE hgcentral;
      mysql> UPDATE dbDb SET active = 1 WHERE name = "$db";

defaultDb (set your assembly as the default assembly for this organism). You can do this for hgcentraltest and hgcentralbeta now too.

Push Downloads from hgwdev to hgdownload

These don't need to be pushed to hgwbeta or Round Robin – just straight from hgwdev to hgdownload.

Make sure that the permissions for these two directories are group protein writable (at least chmod 664). The developer who created this assembly will probably be the owner of the directory and the files in it; you may need to ask him/her to change the permissions. Ask the pushers to be sure to keep the permissions as they are when they push the files (especially making sure that they are group protein writable).

      hgwdev> /usr/local/apache/htdocs/goldenPath/$db/bigZips
      hgwdev> /usr/local/apache/htdocs/goldenPath/$db/database

The easiest way to ensure that the directory is group protein writable is to ask for a push of an empty directory with the appropriate permissions before you fill it with stuff. Another possibility is if you want to push a directory with stuff it helps if they can push the whole directory and all of its contents and not just certain items in the directory. For example, if you ask to push:

      	/usr/local/apache/htdocs/goldenPath/foo/foobar.gz
      	/usr/local/apache/htdocs/goldenPath/foo/foobaz.gz
      	/usr/local/apache/htdocs/goldenPath/foo/barbaz.gz

where /usr/local/apache/htdocs/goldenPath/foo is a new directory and contains only those three files, then you should instead just ask to push the directory. If you tell them to push /usr/local/apache/htdocs/goldenPath/foo/*, they strip off the /* for this very reason.

Before requesting this push, check the md5sum: /usr/local/apache/htdocs/goldenPath/$db/bigZips/md5sums.txt (for each file – double-check download against size).

/usr/local/apache/htdocs/goldenPath/$db/"*"
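The checksum check above can be sketched with md5sum -c. This is an illustration only: it assumes md5sums.txt is in the standard format written by md5sum, with file names relative to the directory that holds it, and the function name is hypothetical.

```shell
# Verify every file listed in <dir>/md5sums.txt against its recorded MD5 sum.
# Assumes md5sums.txt uses the format written by md5sum itself and lists
# file names relative to <dir>. The function name is hypothetical.
check_md5sums() (
    dir=$1
    # md5sum -c prints one "name: OK" or "name: FAILED" line per file and
    # exits non-zero if any file fails or cannot be read.
    cd "$dir" && md5sum -c md5sums.txt
)

# Possible usage (path taken from the text above):
#   check_md5sums /usr/local/apache/htdocs/goldenPath/$db/bigZips
```

A non-zero exit status means at least one file does not match its recorded sum and should be looked at before the push is requested.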

Check that we have READMEs at top level (optional), and for bigZips, chromosomes, liftOvers and comparatives (multiz, phastCons, vsXXX). And read them!

(.../$db/database will be empty except for README.txt -- Admins fill this on Round Robin (not on hgwdev or hgwbeta) by autodump after push. Send request to cluster-admin.)

In bigZips, look for upstream*.zip -- check that they unzip into same number of records. Note that some of the files mentioned in the README are generated by the Genbank process.

Look for liftOver files in other assemblies of the same organism. For example, from /usr/local/apache/htdocs/goldenPath/, try: find . -name "*ToHg17*" or try: ls */liftOver/*ToRn4.over.chain.gz. Push cross-species liftOver files in the /liftOver directory (do NOT push inside vsXXX directories). Notify Donna that new liftOver files are ready to be linked in the docs. When ready, push hgwdev:/usr/local/apache/htdocs/goldenPath/$db/* to hgdownload. Also push the md5sum.txt files for the liftOvers. They may need to be edited (at least temporarily) to include only the files on hgdownload.
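The search for liftOver files that target the new assembly can be sketched as follows. This is an assumption-laden illustration: the function name is hypothetical, and it relies on the naming pattern seen in the example above, <from>To<To>.over.chain.gz inside a liftOver directory, with the first letter of the target assembly capitalized.

```shell
# List liftOver chain files under a goldenPath root that map TO a given
# assembly. Arguments: goldenPath root directory, target db (e.g. rn4).
# The function name is hypothetical; the file pattern follows the
# "ls */liftOver/*ToRn4.over.chain.gz" example above.
find_liftovers_to() (
    root=$1
    target=$2
    # Capitalize the first letter of the target db name (rn4 -> Rn4).
    first=$(printf '%s' "$target" | cut -c1 | tr '[:lower:]' '[:upper:]')
    rest=$(printf '%s' "$target" | cut -c2-)
    cd "$root" && find . -path "*/liftOver/*To${first}${rest}.over.chain.gz" | sort
)

# Possible usage:
#   find_liftovers_to /usr/local/apache/htdocs/goldenPath rn4
```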

     Push pairwise alignments (vsXXX) for:
     - all the chain/net tracks in an assembly
     - all the species featured in the conservation track

     Also push pairwise alignments in other assembly databases to the new assembly. To find them, from /usr/local/apache/htdocs/goldenPath, try: ls */vsXXX

Request autodump -- manually now, and ongoing for later. Ask the pushers to dump the mysql tables from the RR to .txt.gz and .sql files on hgdownload:/usr/local/apache/htdocs/goldenPath/$db/database, and to start the autodump for this database so that the files will be updated with RR tables.

Add a symlink to hgwdev> /usr/local/apache/htdocs/goldenPath/currentGenomes and request a push to hgdownload. This is for ftp users (when they click on the organism's name, they will go to the newest assembly download files).

  hgwdev/usr/local/apache/htdocs/goldenPath/currentGenomes> rm Drosophila_ananassae
  hgwdev/usr/local/apache/htdocs/goldenPath/currentGenomes> ln -s ../droAna2 Drosophila_ananassae
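The two commands above can be wrapped in a small helper. This is a minimal sketch that repeats the same rm and ln -s steps; the function name and its argument order are assumptions.

```shell
# Point currentGenomes/<organism> at the newest assembly directory.
# Arguments: currentGenomes directory, organism link name, assembly db name.
# The function name is hypothetical; it repeats the rm + ln -s steps above.
update_current_genome() (
    current_dir=$1
    organism=$2
    assembly=$3
    # rm -f removes the old link itself (not its target) and does not
    # complain if the link does not exist yet.
    cd "$current_dir" && rm -f "$organism" && ln -s "../$assembly" "$organism"
)

# Possible usage, matching the example above:
#   update_current_genome /usr/local/apache/htdocs/goldenPath/currentGenomes \
#       Drosophila_ananassae droAna2
```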

Push Static Content from hgwdev to hgwbeta and Round Robin

Edit and push static content (sometimes Donna does this or you may need to do it): Push from hgwdev -> hgwbeta, RR:

/usr/local/apache/htdocs/indexNews.html: Add the announcement to the top of the page. Copy the entire text for the existing top article to goldenPath/newsarch.html. Then, move that article down to the bottom of the page & abridge it, add a "Read more" link that points to the goldenPath/newsarch.html. The indexing scheme for these (in case it isn't obvious) is MMDDYY.

/usr/local/apache/htdocs/goldenPath/newsarch.html: See above.

/usr/local/apache/htdocs/goldenPath/credits.html: Add a new section, then add the credits. The developer should give you the information you need to fill this out. You can also look in the makedoc. Some releases have rather sketchy info, others have more. Be sure to include the data use restrictions info.

/usr/local/apache/htdocs/FAQ/FAQreleases.html: Add a new entry for $db to the release table

Push from hgwdev -> hgdownload:

- /usr/local/apache/htdocs/downloads.html: Add a new section for your org at the top of the page. Be sure to push this page to hgdownload, not the RR.

Push from hgwdev -> hgnfs1:

- /gbdb/$db/html/description.html: Add the text for the gateway page. The developer or Donna can help you do this. You sometimes have to do some shenanigans to get it to show up (may have to set your own links for /gbdb if they're not already set up).

If new types of tables: goldenPath/gbdDescriptions.html and goldenPath/help/hgTracksHelp.html

Edit Other Organisms

Now push the trackDb/hgFindSpec as needed for the OtherOrgs that have chains and nets pointing to YourOrg so that these tracks will be turned on. Be sure to run compareTrackDbAll.csh and compareHgFindSpec.csh. Resolve issues as usual.

Drop the old chains and nets that these are replacing, if any. After you drop the chains/nets, be sure to send an email to genome-mirror letting them know what has been dropped (so that they can drop from their mirror sites).

Announce the Release

Send an email to Donna letting her know that the assembly is released and working on the RR. She will send announcements to:

- genome mailing list (genome@soe.ucsc.edu)
- genecats mailing list (genecats@soe.ucsc.edu)
Or, edit the documentation yourself and send the announcements as above. See the "Push Static Content" section.

Notify cluster-admin that the new assembly is available and needs to be released to genome-mysql. Permissions should be made for users "genome" and "genomep". The admins also need to update the mysql.db table permissions. (Jorge says we can ask them to follow the instructions in their wiki for "Mirror_Server".)

If this is a new species, then send an email to Branwyn (bwagman@ucsc.edu) so she can determine whether or not she wants to announce it on the CBSE website.

Maintenance

Make sure Genbank daily updates are running on Round Robin. You can do this by viewing the dates on the download files (they should be more recent than the ones you pushed with your release).

The day after you press “done!” in the main push queue for your assembly, the Release Log on the website will be updated with the information about the new release (from whatever you entered into the Release Log field of the main push queue). The day after your release, be sure to check the Release Log on the website to make sure that the entry is present and that it reads correctly.

Check the downloads against the md5sum size. You can download each one and then run

      hgwdev> md5sum <filename>

Note, you may need to push a fresh all.joiner to RR for hgTables

Check that genome-mysql is working. From hgwdev:

      mysql -h genome-mysql -A -u genome

Check to see that these files have been made by the genbank automatic download building process:

              htdocs/goldenPath/$db/bigZips/
      		est.fa.gz
      		mrna.fa.gz
      		refMrna.fa.gz
      		xenoMrna.fa.gz
      		refGene.fa.gz
      		xenoRefGene.fa.gz

@@ Line 12: / Line 12: @@
 b) Create a list of tables to create inside the database from the tables listed in the push queue for the new assembly. There should be a table called <new_db> in the qapushq database on hgwbeta which can be used to get all of the tables at once:
 ::hgwbeta > hgsql -Ne "SELECT tbls FROM <new_db> WHERE dbs='<new_db>'" qapushq > tableList
+To convert spaces to newlines for the tableList:
+:: awk '{ for (i=1;i<=NF;i++) print $i }' infileName > outfileName
 c) Push tables to hgwbeta
