Outdated instructions for releasing an assembly

From Genecats
==Pre-staging of assembly on hgwbeta==
<h1><span style=background-color:yellow> WARNING: Some parts of this "Releasing an assembly" section are out of date.<br><br>
Use newer version of these assembly QA & Release steps:</span><br><br>
<span style="color:red">Link to new wiki instructions: [http://genomewiki.ucsc.edu/genecats/index.php/Assembly_Release_QA_Steps Assembly Release QA Steps (new 2017 steps)]</span></h1>
* The new instructions include a Google spreadsheet checklist that is sync'd with the new wiki.


<h1></h1>
<h2><span style=background-color:yellow>* BELOW ARE OLD AND OUTDATED ASSEMBLY RELEASE INSTRUCTIONS. </span>
<br>
<span style=background-color:yellow>* PLEASE ADD UPDATES TO [http://genomewiki.ucsc.edu/genecats/index.php/Assembly_Release_QA_Steps THE NEW ASSEMBLY RELEASE INSTRUCTIONS] INSTEAD. </span></h2>


===Put your name as the reviewer in main pushQ, and claim the Redmine ticket===
That way people know you're working on it.


===Check if chromosome sizes have changed significantly (batches*)===
* If you are releasing an update to an assembly, check to see if chromosome sizes have changed significantly. Report any significant changes to the developer.
* Output chromosome sizes from the old and new assemblies into two files and compare them.
<nowiki>*</nowiki> Steps marked with (batches) should be done for all assemblies in the pushQ at once. Make a note in the Redmine issues of the other assemblies that you're working on the pre-staging and staging steps, and then make another note once you're finished. This will hopefully speed up the release of the many assemblies on the horizon.
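The chromosome-size comparison can be sketched as a small bash function. It is a minimal sketch that assumes each file holds two tab-separated columns (chromosome name, size), such as the output of hgsql -Ne "select chrom,size from chromInfo" run against the old and new databases. The function name and the 1% default threshold are hypothetical choices, not an established QA rule; what counts as "significant" is still your call.

```bash
# compareChromSizes OLD NEW [PCT]
# Hypothetical helper: report chromosomes that were added, removed, or whose
# size changed by more than PCT percent (default 1) between two assemblies.
# Each input file: "chrom<TAB>size", one chromosome per line.
compareChromSizes() {
  local old="$1" new="$2" pct="${3:-1}"
  awk -v pct="$pct" -F'\t' '
    # First file (old assembly): remember each chromosome size.
    FNR == NR { oldSize[$1] = $2; next }
    # Second file (new assembly): compare against the old size.
    {
      seen[$1] = 1
      if (!($1 in oldSize)) { print "ADDED\t" $1 "\t" $2; next }
      diff = $2 - oldSize[$1]
      if (diff < 0) diff = -diff
      if (oldSize[$1] > 0 && 100 * diff / oldSize[$1] > pct)
        print "CHANGED\t" $1 "\t" oldSize[$1] "\t" $2
    }
    # Chromosomes present only in the old assembly.
    END {
      for (c in oldSize) if (!(c in seen)) print "REMOVED\t" c "\t" oldSize[c]
    }' "$old" "$new"
}
```

Any CHANGED, ADDED or REMOVED lines are what you would report to the developer.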


===Check that any chain/net/liftOvers listed in the pushQ are to valid assemblies on the RR (batches)===
If your assembly has a chain/net/liftOver to/from an assembly that is ''not'' on the RR (and not in the pushQ as another new assembly), you do not need to QA it or push it to the RR. Drop the relevant row(s) from your sub-pushQ by going to the track entry, clicking lock and then clicking the delete button. If it would be helpful to see all of the other-organism liftOver files at once, cd to /gbdb on hgwdev and use this command:
  hgwdev > ls -d */liftOver/*$db*
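To spot which of those liftOver files point at an assembly that is not on the RR, the listing can be checked against a list of RR databases. This is a minimal sketch: it assumes the usual "fromDbToToDb.over.chain.gz" file naming (first letter of the target database capitalized), one path per line in the listing, and a hand-made file of RR database names; the function name is hypothetical.

```bash
# listOffRrLiftOvers DB LIFTOVER_LIST RR_DBS
# Hypothetical helper. LIFTOVER_LIST: output of
#   cd /gbdb && ls -d */liftOver/*$db*
# (one path per line). RR_DBS: one RR database name per line.
# Prints the liftOver files whose "other" assembly is not on the RR.
# Assumes "<fromDb>To<ToDb>.over.chain.gz" naming.
listOffRrLiftOvers() {
  local db="$1" list="$2" rrDbs="$3"
  awk -v db="$db" '
    # First file: databases that exist on the RR.
    FNR == NR { onRr[$1] = 1; next }
    {
      n = split($0, parts, "/")
      file = parts[n]
      if (!match(file, /To[A-Z][A-Za-z0-9]*\.over\.chain\.gz$/)) next
      from = substr(file, 1, RSTART - 1)
      # Strip the leading "To" and the ".over.chain.gz" suffix (14 chars).
      to = substr(file, RSTART + 2, RLENGTH - 2 - 14)
      to = tolower(substr(to, 1, 1)) substr(to, 2)
      other = (from == db) ? to : from
      if (!(other in onRr)) print $0
    }' "$rrDbs" "$list"
}
```

Each printed path corresponds to a sub-pushQ row you would drop.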


===Check to ensure that there are enough tracks to be considered at least a "minimal" browser (batches)===
For a browser for a '''new''' organism (not an update to an existing browser), at least these tracks must exist:
[[Minimal browser]]
* sequence
* RepeatMasker
* gold/gap assembly tracks
* GenBank RNA & EST (xenoRNA if native RefSeq is sparse)
* BLAT/PCR servers
* CpG Islands
* Genscan
 
And here's a list of tracks for a "pretty good" browser (these are strongly recommended if it's going to be part of a multiple alignment):
* all of the "minimal" tracks listed above
* TransMap (mapping gene set from closest well-annotated organism)
* Ensembl Genes (if available)
* Human Proteins
* Human chain/net (at least human, if not others)
* 3-8-way multiple alignment
* Self Chain track for "finished" assemblies (e.g. human, mouse, zebrafish)
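A quick first pass over the "minimal" list can be scripted against the database's table list. This is a sketch under assumptions: the table names below (chromInfo, gold, gap, rmsk, all_mrna, all_est, cpgIslandExt, genscan) are assumed to back the minimal tracks, split per-chromosome tables (chrN_name) are accepted, and BLAT/PCR servers are not checked because they live in hgcentral rather than in $db. The function name is hypothetical; a track using xenoRNA instead of native mRNA would show up as missing all_mrna.

```bash
# missingMinimalTables TABLE_LIST
# Hypothetical helper. TABLE_LIST: one table name per line, e.g. from
#   hgsql -Ne "show tables" $db > tables.txt
# The table names checked are an assumption about which tables back the
# "minimal" tracks. Prints each expected table that was not found.
missingMinimalTables() {
  local tableList="$1" t
  for t in chromInfo gold gap rmsk all_mrna all_est cpgIslandExt genscan; do
    # Accept either a single table or per-chromosome split tables (chrN_name).
    if ! grep -Eq "^(${t}|chr.+_${t})\$" "$tableList"; then
      echo "$t"
    fi
  done
}
```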


==Stage assembly on hgwbeta (batches)==


===Push tables to mysqlbeta===


====Push the database and tables from hgwdev to hgwbeta====
  hgwdev> sudo mypush $db '*' mysqlbeta
* Remove the hgFindSpec*, trackDb*, tableDescriptions and tableList tables from hgwbeta:
  hgwbeta> hgsql $db
   
  ''Then for each table listed above:''


Note: there may be several hgFindSpec and trackDb tables, shown as, for example, trackDb_someonesname. The * after those two table names means: delete all tables whose names begin with that prefix.
Similarly, you can create a file of the tables you intend to push and then use a simple loop to push them to beta. Be sure to remove the "trackDb_*" and "hgFindSpec_*" tables from your file before pushing them.
For example, your tables file would contain things like:
<pre>
all_mrna
author
cds
cell
chromInfo
cpgIslandExt
...
</pre>
Then, you can use the following loop to push the tables to beta:
<pre>
hgwdev> for table in `cat $tableList`; do sudo mypush $db $table hgwbeta; done
</pre>
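Removing the tables that must not be pushed can be done with a filter before running the loop. This is a minimal sketch assuming one table name per line; the function name is hypothetical, and the excluded names are the ones this section lists (hgFindSpec*, trackDb*, tableDescriptions, tableList).

```bash
# filterPushableTables < TABLE_LIST
# Hypothetical helper: drop tables that must not be pushed from hgwdev
# (hgFindSpec*, trackDb*, tableDescriptions, tableList). All other table
# names pass through unchanged, one per line.
filterPushableTables() {
  grep -Ev '^(hgFindSpec|trackDb)|^(tableDescriptions|tableList)$'
}
```

For example, filterPushableTables < tables.txt > tableList, then run the push loop over tableList.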


'''Alternative method to get db and tables on beta: create database on hgwbeta and push tables for $db'''
* Create the database on hgwbeta.
  hgwbeta > hgsql
  mysql > CREATE DATABASE $db;
* Create a list of tables to import from hgwdev from the tables listed in the push queue. There should be a table called '$db' in the qapushq database on hgwbeta, which can be used to get all of the tables at once:
  hgwbeta > hgsql -Ne "SELECT tbls FROM $db WHERE dbs='$db'" qapushq > tables
To convert spaces to newlines for the tableList:
  awk '{ for (i=1;i<=NF;i++) print $i }' tables > tableList
* Remove the hgFindSpec*, trackDb*, and tableDescriptions tables from the tableList.
* Push the tables to hgwbeta.
  hgwdev > bigPush.csh $db tableList
bigPush.csh reports the size of the push at the end, which you can use to confirm that it is "similar" to the original size on hgwdev. You can also compare sizes in the main pushQ by putting a "*" in the tables field, selecting hgwdev from "Current Location", and then clicking the "show sizes" button.
* If you create a tableList, you can also use it to find out which of your tables should be searchable when you get to the step of QAing individual tracks, with a command like the following:
  for i in $(cat tableList); do hgsql -Ne "select searchName,searchTable,searchMethod,termRegex from hgFindSpec where searchTable like '%$i%';" $db; done


====Push chain/net tables in other organisms====
In the sub-pushQ for the assembly there may be chain/net tracks listed. Push these tables to hgwbeta, but only for databases that exist on hgwbeta/RR. (These tables are not captured by the earlier SELECT tbls FROM $db WHERE dbs='$db' query used to build the tableList, because their dbs= value is a different database, for example hg19, even though the tables you will be pushing into that database are named after your $db.)


The tables to push for each listed assembly are:
* net'''$db'''


Since you may be pushing the same tables to several other databases (i.e., assemblies such as hg19), it may save time to create a file containing the table names and run bigPush.csh once for each of those databases:
   
  hgwdev > bigPush.csh $otherDb tableList
where $otherDb is one of the other-organism databases (for example, hg19).


After pushing the tables, you will need to run "make beta" in trackDb on hgwbeta for each of the other organisms.
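Because these tables should only go to databases that already exist on hgwbeta/RR, it can help to generate the bigPush.csh commands as a dry run first. This is a sketch: both input files are assumed to hold one database name per line (for example, the dbs= values of the sub-pushQ chain/net rows, and the output of "show databases" on beta), and the function name is hypothetical. It only prints commands; nothing is pushed.

```bash
# planChainNetPushes OTHER_DBS BETA_DBS TABLE_LIST
# Hypothetical dry-run helper. For each database in OTHER_DBS, print the
# bigPush.csh command to run if the database exists in BETA_DBS; otherwise
# print a SKIP note. Lines without a trailing newline are ignored by read.
planChainNetPushes() {
  local otherDbs="$1" betaDbs="$2" tableList="$3" d
  while read -r d; do
    [ -z "$d" ] && continue
    if grep -qxF "$d" "$betaDbs"; then
      echo "bigPush.csh $d $tableList"
    else
      echo "SKIP $d (not on hgwbeta)"
    fi
  done < "$otherDbs"
}
```

Review the printed commands, then run the ones you want on hgwdev.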


===Update hgcentralbeta===
NOTE: The instructions below have you create several files containing metadata. It's a good idea to SAVE these files so you can reuse them when staging the assembly on the RR.


====dbDb====
Here you will create (or update) the hgcentralbeta.dbDb metadata. This adds the new assembly to the hgGateway page. When checking on hgwdev, make sure that the assembly date in the description column is correct. It should be later than the previous assembly. If it is not, contact the developer.
* Check to make sure your row doesn't already exist in hgcentralbeta:
  hgwbeta > hgsql hgcentralbeta
  mysql > select * from dbDb where name = '$db'\G
* Check to make sure the row exists in hgcentraltest:
  hgwdev > hgsql hgcentraltest
  mysql > select * from dbDb where name = '$db'\G
* If the above looks correct, redirect it to a file:
  hgwdev > hgsql -N -e "select * from dbDb where name = '$db'" hgcentraltest > hgcentraltest.dbDb
* Check the newly created file:
  hgwdev > cat hgcentraltest.dbDb
* Load it onto hgcentralbeta:
  hgwbeta > hgsql -h mysqlbeta -e "LOAD DATA LOCAL INFILE 'hgcentraltest.dbDb' INTO TABLE dbDb" hgcentralbeta
* Check that hgcentralbeta has been updated with the new row:
  hgwbeta > hgsql -h mysqlbeta -e "select * from dbDb where name = '$db'" hgcentralbeta


====blatServers====
The developer has often already requested that the BLAT servers be set up for the new assembly; check whether there are lines in hgcentraltest.blatServers. If they are there, follow the steps below. If not, request a BLAT server from the cluster-admins. They will give you the name of the BLAT server and the port numbers for isTrans and canPcr; add two new lines with this information to the blatServers table in both hgcentraltest (on hgwdev) and hgcentralbeta. Make sure that the assembly is not hosted on the "blatx" BLAT server: that server is less stable and is meant for assemblies that are not destined for the RR. If this is an update to a previous assembly, leave the entries for the previous assemblies in the blatServers table. For more information about where the BLAT servers for different machines should be hosted, go to [[Updating blat servers]].
* Get the data from hgwdev:
  hgwdev > hgsql -Ne "SELECT * FROM blatServers WHERE db = '$db'" hgcentraltest > blat.dev
* Check if the lines are already on beta, and load them if not:
  hgwbeta > hgsql -h mysqlbeta -Ne "SELECT * FROM blatServers WHERE db = '$db'" hgcentralbeta
  hgwbeta > hgsql -h mysqlbeta -e "LOAD DATA LOCAL INFILE 'blat.dev' INTO TABLE blatServers" hgcentralbeta


====genomeClade and gdbPdb====
Move data for these tables, as needed, using the same methods as listed above for [[#dbDb | dbDb]]:


* '''hgcentralbeta.gdbPdb''' - gdbPdb is only relevant for human, mouse and rat assemblies, and is used with knownGene tracks.


* '''hgcentralbeta.genomeClade''' - genomeClade is used to populate the "genome" pull-down menu on hgGateway and to set the order in which organisms are listed in that menu. If this is not the first assembly for an organism, genomeClade will not need updating.


====liftOverChain====
Only copy lines from liftOverChain on hgcentraltest to hgcentralbeta if there are liftOver files listed in the pushQ and if the assemblies they go to/from exist on the RR. Check for lines in liftOverChain that should be in the pushQ but aren't (e.g., the liftOver from a previous assembly). Email the developer and ask them to add them to the pushQ if necessary.
  hgwdev > hgsql -Ne "SELECT * FROM liftOverChain WHERE fromDb = '$db' OR toDb = '$db'" hgcentraltest > chain.dev
Check beta, load if not present, and recheck:
  hgwbeta > hgsql -h mysqlbeta -Ne "SELECT * FROM liftOverChain WHERE fromDb = '$db' OR toDb = '$db'" hgcentralbeta
  hgwbeta > hgsql -h mysqlbeta -e "LOAD DATA LOCAL INFILE 'chain.dev' INTO TABLE liftOverChain" hgcentralbeta


====defaultDb====
defaultDb controls which assembly is the default in the "assembly" pull-down menu when an organism is chosen from the "genome" pull-down menu on hgGateway (e.g., when "Mouse" is selected, defaultDb controls whether mm9 or mm10 is the default assembly).


'''Do not''' change the value of defaultDb for human or mouse on hgwbeta. Leave them set to the previous assembly, because many people use these assemblies and will be confused if the default changes on hgwdev and hgwbeta.


For existing organisms other than human and mouse, you will usually want to change defaultDb so that you don't accidentally test the previous assembly. One reason not to update the default is if the data available on the previous assembly is better.
  mysql> UPDATE defaultDb SET name="$db" where genome="$genome";


If this is the first assembly for an organism, you will need a defaultDb entry for the assembly to appear on hgwbeta. Either transfer the line from hgcentraltest or (probably easier) add the line manually to defaultDb on hgcentralbeta:
  mysql> INSERT INTO defaultDb VALUES ("$genome", "$db");


Here $db is your working assembly (e.g., panTro4, hg19, galGal3) and $genome is the organism (e.g., Chimp, Human, S. cerevisiae).


====Alternative method: copyHgcentral====
You can also copy items from hgcentraltest to hgcentralbeta with the copyHgcentral script. For the usage statement, run:
  hgwdev > copyHgcentral -h
The copyHgcentral script must be run in test mode first. Test mode shows you the state of hgcentraltest, hgcentralbeta and hgcentral. Once test mode has been run and everything looks good, you can run execute mode to copy from hgcentraltest to hgcentralbeta. Note that test mode generates output files that must be deleted manually afterward. Be sure to run copyHgcentral in your home directory, not in a directory where we don't want temp files to end up.


For example, to copy the contents of blatServers from hgcentraltest to hgcentralbeta for hg19, you would first run test mode with the following command:
  hgwdev > copyHgcentral test hg19 blatServers dev beta
This would generate the following output:
<pre>
--------------------------------------------------
--------------------------------------------------
<<< blatServers >>>
hgcentraltest
  -------------
hg19    blat4a  17778  1      0
hg19    blat4a  17779  0      1
hgcentralbeta
  -------------
hg19    blat4a  17778  1      0
hg19    blat4a  17779  0      1
hgcentral
  -------------
hg19    blat4a  17778  1      0
hg19    blat4a  17779  0      1
*** The blatServers data on dev and beta is identical ***
*** The blatServers data on beta and rr is identical ***
</pre>
When you are ready to run execute mode, replace "test" with "execute" on the command line.


Some important things to note:
* You can run copyHgcentral on all tables at once by using "all" as the table name.
* There is no way to overwrite anything in test mode.
* When running execute mode:
** If the data is identical between the origin server and the destination server, nothing will be copied.
** If the data differs between the origin server and the destination server, you will be forced to respond before anything is copied. It is impossible to overwrite something here by accident.
** If you copy a dbDb entry from hgcentralbeta to hgcentral, active is automatically set to 0. There is no way to accidentally put a new assembly on the RR with active=1.


The per-table test-mode commands are:
  hgwdev > copyHgcentral test $db dbDb dev beta
  hgwdev > copyHgcentral test $db blatServers dev beta
  hgwdev > copyHgcentral test $db defaultDb dev beta
  hgwdev > copyHgcentral test $db genomeClade dev beta
For dbDb, examine the copyHgcentral output and make sure that the assembly date in the description column is correct, as described in [[#dbDb | dbDb]] above. If entries for your assembly are missing from hgcentraltest.blatServers, make a note in the Redmine ticket and ask the assembly builder to 1) request the setup of the BLAT servers and 2) manually add the entries to hgcentraltest.blatServers. liftOverChain is not copied with the copyHgcentral script. In the future, there will be a copyLiftOverChain script, but for the time being, liftOverChain needs to be copied manually as described in [[#liftOverChain | liftOverChain]] above.


====Check Meta Data====
After you have completed the steps above, use the script '''checkMetaData.csh''' to make sure that all of the metadata is the same on hgwdev and on hgwbeta. Run this script in a temporary folder because it creates several files.
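The "check beta, load if not present" steps above boil down to comparing two row dumps. As a minimal sketch (the function name and the .beta file name are hypothetical), the rows present on dev but missing or different on beta can be listed with sort and comm, assuming both dumps are tab-separated hgsql -N output for the same table and have been copied to the same machine:

```bash
# rowsMissingOnBeta DEV_DUMP BETA_DUMP
# Hypothetical helper. Both files are row dumps of the same hgcentral table
# for $db, e.g.
#   hgwdev  > hgsql -Ne "SELECT * FROM blatServers WHERE db = '$db'" hgcentraltest > blat.dev
#   hgwbeta > hgsql -h mysqlbeta -Ne "SELECT * FROM blatServers WHERE db = '$db'" hgcentralbeta > blat.beta
# Prints rows present on dev but absent (or different) on beta. Empty output
# means there is nothing left to load. Uses bash process substitution.
rowsMissingOnBeta() {
  # comm needs both inputs sorted with the same collation, hence LC_ALL=C.
  LC_ALL=C comm -23 <(LC_ALL=C sort "$1") <(LC_ALL=C sort "$2")
}
```

A non-empty result is a cue to run the LOAD DATA step and then recheck. checkMetaData.csh remains the overall check.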


===Push /gbdb/$db and html/description.html===
Extract all of the gbdb files from the pushQ for your org and those for the other orgs as well:


  hgwbeta > hgsql -Ne "SELECT files FROM $db" qapushq > fileList


Ask for a push of the list of /gbdb files above from hgwdev to hgnfs1 (don't worry about the downloads yet). Remind the pushers that items that are symlinked on hgwdev should become real files on hgnfs1. To see how big these files are:
  hgwdev > du -hscL `ls -d */liftOver/*$db*` .
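Before requesting the push, the fileList can also be checked for symlinks (which must become real files on hgnfs1) and for paths that don't exist. This is a sketch run on hgwdev: it assumes the pushQ "files" field holds whitespace-separated paths, and the function name is hypothetical. Unquoted word splitting is intentional here, so paths containing spaces are not supported, and glob patterns in the list are expanded.

```bash
# checkGbdbFileList FILE_LIST
# Hypothetical helper. FILE_LIST: output of
#   hgsql -Ne "SELECT files FROM $db" qapushq > fileList
# Reports each symlink (with its target) and each missing path, then prints
# the total size of the existing files with symlinks followed (du -L).
checkGbdbFileList() {
  local f
  local -a existing=()
  # Word-split the list: one pushQ row may hold several space-separated paths.
  for f in $(cat "$1"); do
    if [ -L "$f" ]; then
      echo "SYMLINK $f -> $(readlink "$f")"
    fi
    if [ -e "$f" ]; then
      existing+=("$f")
    else
      echo "MISSING $f"
    fi
  done
  # Only the final "total" line of du is shown.
  if [ "${#existing[@]}" -gt 0 ]; then
    du -chL "${existing[@]}" | tail -n 1
  fi
}
```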


===Push image file to hgwbeta and RR===
The image file that appears on the gateway page should reside in the kent source tree in:
  ~/kent/src/hg/htdocs/images/
and a copy should exist at:
  hgwdev > /usr/local/apache/htdocs/images/


Confirm that the width of the box around the image in description.html is 15 pixels wider than the image.


If there is a previous assembly, it is possible that it is using the same image on the gateway page. Check on hgwbeta to see if the image is missing. If it isn't, you don't need to ask for the image to be pushed.


To get the image to appear on hgwbeta and the RR, ask for a push of the file from hgwdev (at the /usr/local/apache... location) to hgwbeta and the RR. It's a good idea to ask for the push of the image to the RR during the staging process, as you will inevitably forget to push it when it's time to release the assembly. If there are any other images for this assembly (for instance, the phylo image that goes with the Conservation track), you can push them too.


===Make trackDb on hgwbeta===
Remake the trackDb on hgwbeta. This will likely need to be done again as track descriptions are updated.
  hgwbeta> cd kent/src/hg/makeDb/trackDb
  hgwbeta> make beta DBS=$db


===Turn on GenBank updates===
The new assembly should already be listed in the files align.dbs and hgwdev.dbs in the source tree at ~/kent/src/hg/makeDb/genbank/etc/. If it is not, check with Brian Raney. If it is, turn on GenBank updates on hgwbeta before 4:30 p.m., when the daily updates start, by adding the new assembly to ~/kent/src/hg/makeDb/genbank/etc/hgwbeta.dbs in alphabetical order. Do not edit rr.dbs yet; that comes later, when the assembly is on the RR. After committing the change:
  ssh hgwbeta
  cd ~/kent/src/hg/makeDb/genbank/
  git pull
  make etc-update-rr etc-update-server
Note: etc-update-rr is correct, as this updates all of the /genbank/etc files viewable by the RR.
To see whether updates have run (at least the Monday after the *.dbs files were updated), check the update times of the table 'gbLoaded':
  hgwdev > updateTimes.csh $db gbLoaded
The update times will be out of sync between machines, but by no more than about 24 hours if updates are running. The gbLoaded table will be updated regardless of whether changes to other GenBank tables were picked up. More GenBank update instructions are available at [[Genbank updates]].


The etc-update-server part of the make will cause the downloads mentioned below in the "Verify downloads" section to be created.
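Adding the database to hgwbeta.dbs in alphabetical order can be sketched as a small edit helper. This assumes the .dbs file holds one database name per line and that "alphabetical" means plain byte order; both are assumptions about the file, and the function name is hypothetical. It only edits the working copy; reviewing the change with git diff and committing remain manual steps.

```bash
# addDbSorted DB DBS_FILE
# Hypothetical helper: insert DB into a *.dbs file (assumed one database name
# per line) just before the first existing line that sorts after it, so the
# existing order is preserved. Does nothing if DB is already listed.
addDbSorted() {
  local db="$1" file="$2" tmp
  if grep -qxF "$db" "$file"; then
    echo "$db is already listed in $file" >&2
    return 0
  fi
  tmp=$(mktemp) || return 1
  # Byte-order comparison; append at the end if no later line exists.
  LC_ALL=C awk -v db="$db" '
    !done && $0 > db { print db; done = 1 }
    { print }
    END { if (!done) print db }' "$file" > "$tmp" &&
    cat "$tmp" > "$file"   # overwrite in place to keep the file permissions
  rm -f "$tmp"
}
```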


===Review the phylogenetic location in pull-down menus on hgGateway (batches*)===
Organisms are supposed to be listed in phylogenetic order in the pull-down menus on hgGateway. Check to see if your new genome is in the right place evolutionarily. To determine what its position should be, use the "master tree," which is always the highest-numbered file located in hgwdev:~/kent/src/hg/utils/phyloTrees. This tree is created by Hiram and reviewed by Bob.


You can check your organism's orderKey on beta using the following commands:
  hgwbeta > hgsql hgcentralbeta
  mysql > select name, orderKey from dbDb order by orderKey;


<nowiki>*</nowiki> If there are multiple assemblies in the pushQ, go ahead and do this for all assemblies that are there. Then clearly mark in their Redmine tickets that you already did this step, so others know not to redo it. You may want to update the other tickets even before you start, so that anyone else getting to this step knows you are already working on it.
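With many assemblies in dbDb, it is easier to look only at the entries next to yours. This is a minimal sketch assuming the orderKey query is saved as tab-separated hgsql -N output already sorted by orderKey; the function name is hypothetical, and judging whether the neighbors are phylogenetically right still requires the master tree.

```bash
# showOrderKeyNeighbors DB DUMP
# Hypothetical helper. DUMP: "name<TAB>orderKey" rows sorted by orderKey, e.g.
#   hgsql -Ne "select name, orderKey from dbDb order by orderKey" hgcentralbeta > orderKeys
# Prints the row before DB, DB itself and the row after it.
# Returns 1 if DB is not found.
showOrderKeyNeighbors() {
  awk -F'\t' -v db="$1" '
    { name[NR] = $1; key[NR] = $2; if ($1 == db) hit = NR }
    END {
      if (!hit) { print db " not found" > "/dev/stderr"; exit 1 }
      for (i = hit - 1; i <= hit + 1; i++)
        if (i >= 1 && i <= NR) print name[i] "\t" key[i]
    }' "$2"
}
```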
==Test on hgwbeta==


===Check the .2bit files===


====Check that it is not a symlink on hgnfs1, that there is only one file, and that it is not in a subdirectory====
The .2bit file (which contains the new assembly sequence in a compact, binary format) should exist at /gbdb/$db/$db.2bit. It should be a symlink on hgwdev, but not on hgnfs1.


====Check that all 3 copies of the .2bit file (gbdb, downloads, BLAT servers) are identical====
The 3 locations of the .2bit file are:
* /gbdb/$db/
* /usr/local/apache/htdocs-hgdownload/goldenPath/$db/bigZips/ (on hgwdev)
* /scratch/$db (on the BLAT server)


Use md5sum to confirm that they are identical. Get the name of the BLAT server from the hgcentral database and ssh into the machine:
  hgwdev > ssh qateam@blat#.cse.ucsc.edu
This will let you on to the BLAT machine, after which you can look in /scratch/$db to see the .2bit file.


You can also run the 2bitCompare script, which compares the copies in /scratch/$db (on the BLAT server), the downloads directory, /gbdb/$db/ on hgwdev and /gbdb/$db/ on hgwbeta. This is particularly useful if the assembly has been part of a multiz track without a browser, since the file may already exist on beta and the RR and may not have been masked. Below is some sample output:
  hgwdev> 2bitCompare allMis1
  Checking md5sums.  This could take a few minutes.  Please be patient...
        blat4a md5sum: 134e740c05eedadc24de3a96775a25d6 /scratch/allMis1/allMis1.2bit
      download md5sum: 134e740c05eedadc24de3a96775a25d6 /usr/local/apache/htdocs-hgdownload/goldenPath/allMis1/bigZips/allMis1.2bit
    hgwdev gbdb md5sum: 134e740c05eedadc24de3a96775a25d6 /gbdb/allMis1/allMis1.2bit
  hgwbeta gbdb md5sum: 134e740c05eedadc24de3a96775a25d6 /gbdb/allMis1/allMis1.2bit
   
        blat4a date,size: Jun 19 11:03 569794406
      download date,size: Jul 3 10:55 53
    hgwdev gbdb date,size: Jun 7 13:34 39
  hgwbeta gbdb date,size: Jun 7 13:33 569794406
* The first part of the script output lists the md5sums of all four .2bit files. These should be identical.
* The second part of the script output lists the timestamps and file sizes.
** The download and hgwdev gbdb files should be symlinks, as evidenced by a small file size.
** The BLAT and hgwbeta gbdb files should be the actual files, as evidenced by a large file size.
** The two symlink file sizes will likely be different, but the file sizes of the two actual files should be identical.


If the BLAT server's .2bit file is not the same as the other .2bit files, ask the pushers to restart the assembly and to pull the newest .2bit file from /gbdb.
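Since the copies live on different machines, one practical way to compare them is to collect the md5sum output lines from each machine into one file and check that they all agree. This is a minimal sketch; the function name is hypothetical and the input is simply whatever md5sum printed on hgwdev and on the BLAT server.

```bash
# sameMd5 < MD5SUM_LINES
# Hypothetical helper. Input: md5sum output lines ("checksum  path") gathered
# from each copy of $db.2bit, e.g. "md5sum /gbdb/$db/$db.2bit" on hgwdev and
# "md5sum /scratch/$db/$db.2bit" on the BLAT server.
# Prints OK and returns 0 if every checksum matches; otherwise prints the
# lines and returns 1.
sameMd5() {
  local lines distinct
  lines=$(cat)
  # Count distinct checksums (first field of each non-empty line).
  distinct=$(printf '%s\n' "$lines" | awk 'NF { print $1 }' | sort -u | grep -c .)
  if [ "$distinct" -eq 1 ]; then
    echo "OK: all copies have the same md5sum"
  else
    echo "MISMATCH: $distinct different checksums"
    printf '%s\n' "$lines"
    return 1
  fi
}
```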
'''Check that common keys between tables are in sync'''
 hgwdev > cd ~/kent/src/hg/makeDb/schema
 hgwdev > joinerCheck -database=$db -keys all.joiner

If there are errors related to genbank identifiers, they are most likely caused by the genbank load process rather than by a problem with your database. Once the tables are on hgwbeta, run joinerCheck against beta to confirm:
 hgwdev > HGDB_CONF=~/.hg.conf.beta joinerCheck -keys -identifier=$identifier all.joiner

'''Check that table update times are copacetic'''
 hgwdev > joinerCheck -database=$db -times all.joiner

'''Check that all tables in this database are mentioned in all.joiner'''
 hgwdev > joinerCheck -database=$db -tableCoverage all.joiner
If not all of the tables are listed, email the developer and ask them to add those tables to the "tablesIgnored $db" section of all.joiner. According to Hiram, it is probably OK for us to edit all.joiner ourselves.

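Conceptually, the -tableCoverage check is a set difference: the tables in $db minus the tables that all.joiner accounts for. The sketch below shows that idea under some assumptions. You already have both lists as files with one name per line, and the helper and file names are hypothetical. It does not parse all.joiner and is no substitute for joinerCheck.

```shell
# Hypothetical helper: print the names in list 1 that are absent from
# list 2. Both files hold one table name per line; comm needs sorted
# input, so each list is sorted (in the C locale) into a temp file.
uncovered_tables() {
  a=$(mktemp) b=$(mktemp)
  LC_ALL=C sort -u "$1" > "$a"
  LC_ALL=C sort -u "$2" > "$b"
  LC_ALL=C comm -23 "$a" "$b"
  rm -f "$a" "$b"
}
```

For example, save "hgsql -Ne 'show tables' $db" to dbTables, then run "uncovered_tables dbTables coveredTables". Here coveredTables is a hypothetical list of the names you expect all.joiner to cover.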

'''Verify makedoc for all the tracks listed in the pushQ'''
The makedoc file should be in the kent source tree at kent/src/hg/makeDb/doc/$db.txt. Check that all the tracks listed in the pushQ are included.

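To check the makedoc against the pushQ, a quick loop over the track names is enough. This is a minimal sketch. It assumes you have pasted the pushQ track or table names into a file with one name per line, and the helper and file names are hypothetical. It only finds names that never appear in the makedoc, so it cannot tell you whether the documented steps themselves are correct.

```shell
# Hypothetical helper: print each name from a list that never appears
# in the makedoc file. This is a plain substring match, so a short
# name such as "gap" can also match inside longer words.
missing_from_makedoc() {
  # $1: makedoc file (e.g. kent/src/hg/makeDb/doc/$db.txt)
  # $2: file listing track or table names, one per line
  while IFS= read -r track; do
    [ -n "$track" ] || continue
    grep -qF -- "$track" "$1" || printf '%s\n' "$track"
  done < "$2"
}
```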





If everything is there, be sure to click “Y” in the pushQ for both the main pushQ entry and all the tracks in the sub-pushQ. You can quickly change the values for all the tracks in the sub-pushQ at once by accessing the qapushq database directly on hgwbeta:
 hgsql -h hgwbeta qapushq
   mysql> UPDATE $db SET makeDocYN="Y";


'''Run featureBits to verify that the gold and gap tables together cover the entire genome'''
Run:
   featureBits -countGaps -or $db gold gap
to make sure that the gold and gap tables together cover the entire genome (the result should be 100%).

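To avoid reading the percentage by eye, you can compare the two counts on the featureBits summary line. This sketch assumes the usual "N bases of M (P%) in intersection" line, and the helper name is hypothetical. The "2>&1" in the usage line is there in case featureBits writes its summary to stderr.

```shell
# Hypothetical helper: read featureBits output on stdin and succeed
# only when the covered bases ($1) equal the genome size ($4) on the
# "N bases of M (P%) in intersection" line, i.e. 100% coverage.
full_coverage() {
  awk '/ bases of / { found = 1; ok = ($1 == $4) }
       END { exit !(found && ok) }'
}
```

For example: featureBits -countGaps -or $db gold gap 2>&1 | full_coverage && echo "gold + gap cover 100%"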

'''Check to make sure that none of the table names have underscores(_)'''
There are some older tables that have underscores (all_est and all_mrna); these are OK. What is definitely ''not'' OK is for split tables (tables whose names start with chr) to have more than one underscore in their name. Run the two queries below and verify that the only returned results follow these rules:


  mysql > show tables like "%\_%\_%";

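You can also apply the rules above to a full table list in one pass. This is a minimal sketch with a hypothetical helper name. Flagging every other name that contains an underscore goes beyond the stated rules and is an assumption, so review what it prints rather than treating each line as a hard failure.

```shell
# Hypothetical helper: read table names (one per line) and print the
# ones to review. Split tables (chr*) may carry one underscore (e.g.
# chr1_rmsk) but not more; all_est and all_mrna are allowed; any other
# name containing an underscore is printed.
bad_table_names() {
  awk '
    /^chr/ { if (gsub(/_/, "_") > 1) print; next }
    $0 == "all_est" || $0 == "all_mrna" { next }
    /_/ { print }
  '
}
```

For example: hgsql -Ne 'show tables' $db | bad_table_names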

'''Make sure that there is a liftOver file from the previous assembly to this assembly'''
This is the number one request after a new release. These files are located here:


  /gbdb/[from database]/liftOver/[from database]To[to database].over.chain.gz

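In the chain file name, the first letter of the target database is capitalized (for example, hg19ToHg38.over.chain.gz). The small sketch below builds the expected path so you can test whether the file exists. The helper name is hypothetical, and "oldDb" in the usage line stands for the previous assembly.

```shell
# Hypothetical helper: print the expected liftOver chain path for a
# from/to pair, upper-casing the first letter of the target database.
liftover_path() {
  from=$1 to=$2
  first=$(printf '%s' "$to" | cut -c1 | tr '[:lower:]' '[:upper:]')
  rest=$(printf '%s' "$to" | cut -c2-)
  printf '/gbdb/%s/liftOver/%sTo%s%s.over.chain.gz\n' "$from" "$from" "$first" "$rest"
}
```

For example: ls -lL "$(liftover_path oldDb $db)"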

'''Verify blastTabs are updated if needed'''
If the new assembly is an update to the human, mouse, rat, zebrafish, D. melanogaster, C. elegans, or S. cerevisiae genomes, make sure that the appropriate blastTab tables to this assembly are built. More information about blastTabs can be found [[BlastTabs|here]].


'''Review all tracks in the sub-pushQ'''
Follow the [[New_track_checklist]].  Note that many items in the checklist are run for all tracks at once as part of these instructions: joinerCheck, makedoc check, verification of downloads, etc. If there are Ensembl Genes, also run [[Ensembl_QA | qaEnsGenes.csh]].  Note that the Short Match and Restriction Enzymes tracks have no tables affiliated with them and do not need to be QAed (they are automatically generated when the Browser is created).


At this point you can also do the chain/nets (if any) from other assemblies to this one, or you can elect to do them after the assembly is released.


'''Check that all of the MySQL tables are in good repair'''
To do this run:


  hgwdev > sudo dbCheck.sh $db


This will run myisamchk on all tables in $db and repair any that need repairing (noted in the output by the words "REPAIR needed").


'''Check link to NCBI's Assembly database on hgGateway page'''
As of January 2012, NCBI has a new database for Assemblies. From this point forward, we should start linking to that database from the gateway page (our assembly description.html page). Please make sure that there is a link to the exact assembly.version that was used to make this browser.
http://www.ncbi.nlm.nih.gov/assembly/organism/


'''Check default position and default tracks are scientifically interesting and aesthetically pleasing'''
From the gateway page, press 'Click here to reset'. Go back to your assembly, then press 'submit'. You will be taken to the default position for your assembly. Make sure that the resulting area is scientifically interesting and aesthetically pleasing! You can edit the default location here: hgcentralbeta.dbDb.defaultPos and the default tracks here: /kent/src/hg/makeDb/trackDb/$db/trackDb.ra.


'''Check Blat and PCR'''
Check that you can do DNA and protein BLAT as well as PCR on the assembly.


'''Verify downloads'''
The downloads are located at:


will not be present on hgwdev.  They are generated automatically and rsync'ed to hgdownload after an assembly is added to hgwbeta.dbs and "make etc-update-server" is run in the kent/src/hg/makeDb/genbank/ directory on hgwbeta.


'''Check that the permissions are group 'genecats' writable (permissions should be at least 664)'''
The developer who created this assembly will probably be the owner of the directory and the files in it; you may need to ask them to change the permissions.

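Rather than scanning "ls -l" output, you can use find to list the files that fall short of 664. This is a minimal sketch with a hypothetical helper name. It checks only the permission bits, so you still need to confirm that the group is genecats (for example with "ls -l").

```shell
# Hypothetical helper: list files under a directory whose permissions
# are not at least 664 (rw-rw-r--). "-perm -664" matches files that
# have all of those bits set, so "!" selects the rest.
not_664() {
  find "${1:-.}" -type f ! -perm -664
}
```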

'''Check the md5sum'''
Check the md5sum against the md5sum.txt file for each directory you are planning to push. Note that the md5sum.txt in the liftOver directory may need to be edited (at least temporarily) to include only the liftOver files contained in the pushQ.


The sort is done on the assumption that the md5sum.txt file is sorted (it typically is).  If the md5sum.txt file is not sorted, the sort is unnecessary.  The temp file is created in your home directory to avoid creating temp files in htdocs-hgdownload.  Note that md5sum.txt does not list itself, and it was created before there was a README.txt file, so your diff will show md5sum.txt and README.txt in the results.  If everything is OK, those should be the only results.  In the vs* directories, the XXXX.net.axt.gz file will show up as axtNet/XXXX.net.axt.gz; this is OK as long as the md5sums match.

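Assuming GNU coreutils, another option is to let md5sum verify the listed files directly instead of diffing sorted lists. Run it from the directory that holds md5sum.txt, because the paths in that file (e.g. axtNet/XXXX.net.axt.gz) are relative to it. Files that md5sum.txt does not list, such as README.txt, are not checked, so this complements the diff rather than replacing it. The helper name is hypothetical.

```shell
# Hypothetical helper: verify every file listed in DIR/md5sum.txt.
# --quiet prints only failures; the exit status is nonzero if any
# listed file is missing or has a different md5sum.
check_md5_list() {
  (cd "${1:-.}" && md5sum -c --quiet md5sum.txt)
}
```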

'''Check the files for corruption'''
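Check that the files in each bigZips, liftOver and vsXXX directory don't contain unexpected characters. Run the following in each directory and check the output (zcat only reads gzipped files, so the loop skips README.txt, md5sum.txt and the binary .2bit file):
 for file in *.gz; do echo "== $file"; zcat "$file" | head; zcat "$file" | tail; done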
 
Scroll through the output and make sure all the text is ASCII. If there are any issues, alert the developer.
 
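For a stricter check than reading the head and tail, grep can scan the full decompressed text for bytes outside printable ASCII. This sketch assumes that everything worth checking is gzipped text, and the helper name is hypothetical. Reading whole files can take a while for large downloads.

```shell
# Hypothetical helper: print each .gz file in DIR whose decompressed
# text contains a byte that is neither printable ASCII nor whitespace.
# Unlike the head/tail spot check, this reads every file in full.
non_ascii_gz() {
  for f in "${1:-.}"/*.gz; do
    [ -e "$f" ] || continue   # the glob matched nothing
    if gzip -dc "$f" | LC_ALL=C grep -q '[^[:print:][:space:]]'; then
      printf '%s\n' "$f"
    fi
  done
}
```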
'''Read and verify READMEs'''
Check that we have READMEs at top level, and for bigZips, chromosomes, liftOvers and comparatives (multiz, phastCons, vsXXX). Verify that the information in the READMEs is correct. Note that some of the files mentioned in the README are generated by the Genbank process, so they won't be present yet.


The genbank process will build the upstream* files on hgdownload if they don't exist or are more than seven days old, using whatever genePred table is defined in etc/genbank.conf (e.g. hg16.upstreamGeneTbl = refGene).  Make sure that the README for the upstream* files reflects the genePred table listed for this assembly.


'''Stage assembly on Round Robin'''


'''Figure out if either you or Donna is going to edit the static docs'''
Email Donna and let her know that you're going to be releasing the assembly soon. Sometimes Donna does the docs and sometimes the QA person in charge of the assembly will do them. If you are going to edit them, see the [[Static_content_for_new_assemblies|static content for new assemblies page]].


'''Make sure no tables need to be repushed from hgwdev to hgwbeta'''
You can use
 hgwdev > updateTimesDb.sh -d $db
to compare table update times between hgwdev and hgwbeta. Everything except hgFindSpec, history, tableDescriptions, trackDb and the genbank tables should have the same update times.

If there are tables that need repushing, make a list like so:
  updateTimesDb.sh -d $db | awk '{if ($6) print $1;}' > tablesToRepush

To see all of the tables in the assembly that are related to genbank, do this:
  hgwdev > hgsql -Ne 'show tables' $db | egrep -f /cluster/data/genbank/etc/genbank.tbls

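Several tables are expected to differ, so it can help to strip them from tablesToRepush before deciding what to repush. This sketch uses a hypothetical helper and hypothetical file names. It assumes the genbank table names have been saved one per line, for example by redirecting the egrep command above to a file.

```shell
# Hypothetical helper: drop tables whose update times are expected to
# differ between hgwdev and hgwbeta (hgFindSpec, history,
# tableDescriptions, trackDb, and the genbank tables). Whatever is
# left may really need a repush; grep exits 1 if nothing is left.
unexpected_diffs() {
  # $1: tablesToRepush list; $2: genbank table names, one per line
  grep -vxF -e hgFindSpec -e history -e tableDescriptions -e trackDb \
    -f "$2" "$1"
}
```

For example, save the genbank table list to genbankTables, then run "unexpected_diffs tablesToRepush genbankTables".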

'''Double-check /gbdb files'''
Double-check that hgnfs1 (which is /gbdb on hgwbeta) has the files listed in the push queue.  Remove any unnecessary files.  If any files were updated on hgwdev in the course of QAing tracks, make sure the correct version is on hgnfs1. To see the timestamps of files behind symlinks on hgwdev, use "ls -lL": the capital "L" shows information for the file that a link references, rather than for the link itself.  You may notice that hgwdev has TRIX .ix and .ixx files; that is OK. These live in a different location on beta, /data/trix (more information is on the [http://genomewiki.ucsc.edu/genecats/index.php/Pushing_trackDb Pushing trackDb page]).  Also, files on hgwdev may be double the size of those on beta; don't worry about that.
 
'''Make public'''
In kent/src/hg/makeDb/trackDb, run "make public" for your organism, then check on hgwbeta-public that everything is working as expected.


'''Request rsync of entire database from push-request'''
Request an rsync of the entire database from hgwbeta to mysqlrr '''and genome-euro'''. After the push is complete, ask for a drop of trackDb_public and hgFindSpec_public from mysqlrr '''and genome-euro''', and then for a push of trackDb and friends. This will get the correct trackDb and trix files to the RR, after which tracks should appear and Track Search should work.


Note that if at any point you need to re-push some genbank tables, you must push ALL genbank tables together.


'''Request push of hgFixed tables, if needed'''
 
Some tracks (e.g., Ensembl Genes) have tables in the hgFixed database (e.g., trackVersion).  Check whether this table needs to be pushed from hgwbeta -> mysqlrr, and if so, request a push.
 
'''Mark main pushQ entry as "push requested"'''


'''Turn on genbank updates on the RR'''
Follow the same instructions [[Releasing_an_assembly#Turn_on_GenBank_updates|here]], except edit rr.dbs instead.
After checking in rr.dbs, and before you run "make install-rr", make sure your libs are up to date by running "make libs" first:
 cd ~/kent/src ; make libs

Important: Be sure to check that the genbank tables are updating properly on the Monday after starting genbank updates.

'''Request dump and autodump of database'''
Ask the pushers to dump the mysql tables from the RR to .txt.gz and .sql files on hgdownload:/usr/local/apache/htdocs/goldenPath/$db/database, and to start the weekly autodump for this database.
 
Be aware that genome-mysql syncs with hgdownload every night, so once the autodump has run, genome-mysql will pick up anything new during that night's sync. If you want genome-mysql to be synced right away, without waiting for the nightly sync, you can make a push request right then. If you request the autodump and then wait a day, genome-mysql will already be up to date with your new assembly, and the push request to "make the $db database available on genome-mysql" is not needed.
 
'''Adjust the release log in main pushQ'''
Compile a list of the tracks being released on this assembly and paste it into the release log box of the main pushQ entry for the initial release of the assembly. You can fetch the list from the assembly pushQ, but note that for the genbank tracks you will need to get the names manually. Alternatively, copy and paste the shortLabels of tracks from hgTracks. Look at previous [http://genome.ucsc.edu/goldenPath/releaseLog.html release log] entries for formatting.  (Most older entries include "Downloads" in this list, but QAers agreed it is unnecessary to keep track of them going forward.)
 
'''Ensure the active column is set to 0 in the lines you are going to load into hgcentral.dbDb'''
The active column dictates whether the assembly appears in the drop-down menu on the gateway page. When it equals 0, it doesn't show in the pull down, when it equals 1, it does show. Change this to be 0 so that you can test the assembly on the RR without it being directly available to the public.


'''Update hgcentral'''
Note that the new assembly will start appearing in drop-downs as soon as hgcentral is updated, so be ready to test things (such as BLAT and PCR) soon after this step.  Do NOT set active=1 in dbDb just yet.  Also note that a line for this assembly may already exist in hgcentral.dbDb if it was required for another track, such as a multi-species alignment. You can check with copyHgcentral.


Use copyHgcentral to copy the appropriate tables from hgcentralbeta to hgcentral:

* blatServers
* dbDb
* genomeClade (if needed)
* liftOverChain (if needed)

Do NOT update defaultDb yet, as the assembly is not active.
 
You can log into hgcentral on the RR like so:
 hgwdev > hgsql -h genome-centdb hgcentral

See [[Releasing_an_assembly#Update_hgcentralbeta|Updating hgcentralbeta]] for how to load the files into hgcentral.


It is only necessary to edit genomeClade if this is the first assembly for this species or if the order of species was changed. Also, note that it is OK for hgNearOk in dbDb to equal 1 for an older assembly of the same organism. If you updated the default position, you will also need to update defaultPos in dbDb.


Update the liftOverChain table so that it has all the liftOvers FROM this assembly to other assemblies. You will add the other liftOver lines when you have finished pushing the chain/nets for the other organisms.


'''Test the assembly tracks plus BLAT, PCR, liftOver on the RR'''
It is possible to test the tracks on the new assembly with active=0 by forcing db=$db in the hgTracks URL. First view an older assembly, then edit the URL so that you are actually viewing your new assembly.

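Editing the URL by hand is easy to get wrong when it already carries other parameters. This is a sketch of the substitution with a hypothetical helper name. It replaces an existing db= value or appends one, and leaves the rest of the URL alone.

```shell
# Hypothetical helper: print URL with its db= parameter set to NEWDB,
# replacing an existing db= value or appending one if there is none.
with_db() {
  url=$1 newdb=$2
  case $url in
    *'?db='*|*'&db='*)
      # replace the existing value, up to the next "&" or end of URL
      printf '%s\n' "$url" | sed "s/\([?&]db=\)[^&]*/\1$newdb/" ;;
    *'?'*) printf '%s&db=%s\n' "$url" "$newdb" ;;
    *)     printf '%s?db=%s\n' "$url" "$newdb" ;;
  esac
}
```

For example, "with_db 'https://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&position=chr1' $db" swaps in your new assembly while keeping the position.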

'''Enable Assembly on RR and post-release follow-up'''


'''Set active=1 in hgcentral'''


When everything is working as expected, set the assembly to active:
Note that this means that everyone can now see the assembly on the RR.


'''Update defaultDb in hgcentral'''
Set your assembly as the default assembly for this organism. If this was a human or mouse assembly, go back and update hgcentraltest and hgcentralbeta too. Note that at times it is worth asking whether the data available on the previous assembly is better before changing the default.


'''Verify that everything is working as expected on the RR'''
Look briefly at the tracks, default position, gateway page, etc.


'''Verify that everything is working on genome-euro'''
 
Check to be sure that all of the tables, files (including images), and hgcentral changes have appeared on genome-euro, and that the new assembly is working as expected.
 
There is a cron job (owned by the admins) that runs once an hour on genome-euro. It rsyncs all of the hgcentral tables from the RR that (1) have newer times on the RR than on genome-euro, and (2) are not on an exclude list (which currently consists of gbMembers, hubStatus, namedSessionDb, sessionDb, and userDb). It then runs a MySQL CONCAT command that appends ".soe.ucsc.edu" to all of the hostnames in the blatServers table that don't already have it.
 
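The "don't already have it" condition is what makes the hostname step safe to rerun. Below is a sketch of that rule for a single host name. It is written in shell rather than the MySQL command the cron job runs, the helper name is hypothetical, and it is not the pullHgcentral script.

```shell
# Hypothetical illustration of the rewrite rule only: append the
# domain to a host name unless it is already there, so applying the
# rule twice leaves the name unchanged.
add_soe_suffix() {
  case $1 in
    *.soe.ucsc.edu) printf '%s\n' "$1" ;;
    *)              printf '%s.soe.ucsc.edu\n' "$1" ;;
  esac
}
```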
However, if you want to see your hgcentral changes on genome-euro right away, you can ssh to genome-euro as qateam and run the script yourself:
 
  ssh qateam@genome-euro.ucsc.edu
  sudo /root/pullHgcentral
 
After the script runs (either via cron or by logging in and running it as qateam), genome-euro should behave the same as the RR.
Note: If needed, ssh to qateam@hgwdev and then from that qateam account ssh to qateam@genome-euro.
 
'''Push Downloads from hgwdev to hgdownload'''
These files are pushed directly from hgwdev: /usr/local/apache/'''htdocs-hgdownload'''/goldenPath/$db/ to hgdownload: /usr/local/apache/'''htdocs'''/goldenPath/$db/. Be sure to specify this in your push request. Ask the pushers to be sure to keep the permissions group 'genecats' writable. Make sure to only push the directories that are applicable to the tracks that are in the pushQ. After they are pushed, check that everything is there.


'''Update or add symlink in /usr/local/apache/htdocs-hgdownload/goldenPath/currentGenomes'''
Update or add a symlink to hgwdev > /usr/local/apache/htdocs-hgdownload/goldenPath/currentGenomes so that it points to the most recent assembly. In the currentGenomes directory:
   rm Name_of_symlink
   ln -s ../$db Name_of_symlink

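These are the same two steps, wrapped so you can immediately see where the link points. This is a minimal sketch with a hypothetical helper name. It takes the currentGenomes directory as an argument, so you do not have to cd into it first.

```shell
# Hypothetical helper wrapping the rm/ln steps: point
# CURRENT_DIR/NAME at ../DB and print the new link target.
point_current() {
  # $1: currentGenomes directory; $2: symlink name; $3: assembly db
  rm -f "$1/$2"
  ln -s "../$3" "$1/$2"
  readlink "$1/$2"
}
```

For example: point_current /usr/local/apache/htdocs-hgdownload/goldenPath/currentGenomes Name_of_symlink $db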

Request a push '''of the symlink''' from hgwdev to hgdownload. This is for ftp users who only want to go to the most recent assembly for an organism. After it is pushed, check that it is functioning correctly on the [ftp://hgdownload.soe.ucsc.edu/goldenPath/currentGenomes/ current genomes ftp page].
 
'''Ask push-request to make this assembly available on genome-mysql'''
genome-mysql syncs with hgdownload every night, so if you have already requested the autodump, genome-mysql will automatically pick up anything new that night. If the autodump was completed at least one day ago, your new assembly should already be available on genome-mysql, and a push request is not needed. If you do not want to wait for the nightly sync, you can ask the admins to "make the $db database available on genome-mysql."


In the past, we would also request that links and permissions be made for the users "genome" and "genomep" (Jorge said to follow the instructions in the wiki page for "Mirror_Server"). This is no longer needed in the push request.


'''Push Static Content from hgwbeta and Round Robin'''


First make sure that either you or Donna have edited the pages below:
Then push them from hgwbeta to the RR and hgwbeta-public. For more information go to: [[Static_content_for_new_assemblies]]


'''Announce the Release on genome-announce@soe.ucsc.edu'''
Whoever edited the static docs should send the announcement to the genome-announce mailing list (genome-announce@soe.ucsc.edu). It is best to take the section from the news page and edit it as needed for an email.


'''(skip/optional) Update hgcentral.sql for mirrors'''
Requesting this update will provide the most up-to-date version of hgcentral for mirrors.  If an immediate update is important, ask the buildmeister to update the file at http://hgdownload.soe.ucsc.edu/admin/.  In a recent discussion, it was decided that this step is not really necessary when we release a new assembly; we can just let the update go out with the 3-week release cycle.


Also Optional: Some mirrors like to get hgcentral tables via ftp or rsync from ftp://hgdownload.soe.ucsc.edu/mysql/hgcentral/.  You can request a push from hgnfs1 --> hgdownload if you want to make the hgcentral info available there immediately (otherwise it will be copied there by a weekly rsync).


'''You may need to update all.joiner for the RR'''
Relationships among tables are defined in all.joiner, which is updated on the RR when we push CGIs. If all.joiner was edited during QA of the tracks, you may want to ask for a push of all.joiner from hgwbeta (/usr/local/apache/cgi-bin/all.joiner) to the RR so that the table relationships show up in the table browser.


'''Chain/Nets/LiftOvers from other organisms'''
The timing of this step is not critical.  It can be done anytime after the new assembly is active on the RR. These steps can be done in either order:


* Add the appropriate lines to the hgcentral.liftOverChain table (using hgsql -h genome-centdb) so that hgLiftOver and hgConvert work from other organisms to the new assembly.  Test hgLiftOver and hgConvert.  (Do not delete any old lines from liftOverChain; liftOver should still work for older assemblies.)


'''Next day follow-up'''


'''Check that Genbank is running on the RR'''
Make sure Genbank weekly updates are running on Round Robin. You can do this by viewing the dates on the download files in htdocs/goldenPath/$db/bigZips/ (they should be more recent than the ones you pushed with your release). Also check that the table gbLoaded is getting updated on the RR with:
   realTime.csh $db gbLoaded
The RR updates every Sunday, so check this the Monday after you release.


'''Check the dump of the database in the downloads'''
Look in the database download directory and verify that the dump occurred.
   
   
'''Check that the downloads files generated by the genbank process are on hgdownload'''
Make sure that the files generated by genbank that are mentioned in the bigZips/README file, such as est.fa.gz, mrna.fa.gz, etc., and the upstream* files, are actually present on hgdownload.
   
   
'''Check that genome-mysql is working'''
From hgwdev:
   mysql -h genome-mysql -A -u genome $db


'''Retire the assembly sub-pushQ'''
Make sure there are release log entries for the net and chain tracks in other databases.  






'''Press "done!" in the main push queue'''
This will update the release log.


'''Check the release log'''
The day after you press “done!” in the main push queue for your assembly, the Release Log on the website will be updated with the information about the new release (from whatever you entered into the Release Log field of the main push queue). Verify that this happened on the [http://genome.ucsc.edu/goldenPath/releaseLog.html release log] page. To edit an entry, find it in the Log section of the push queue and edit there.  Changes will appear in the release log the next day.



Latest revision as of 23:57, 4 June 2019

WARNING: Some parts of this "Releasing an assembly" section are out of date.

Use newer version of these assembly QA & Release steps:


Link to new wiki instructions: Assembly Release QA Steps (new 2017 steps)

  • The new instructions include a Google spreadsheet checklist that is sync'd with the new wiki.

* BELOW ARE OLD AND OUTDATED ASSEMBLY RELEASE INSTRUCTIONS.
* PLEASE ADD UPDATES TO THE NEW ASSEMBLY RELEASE INSTRUCTIONS INSTEAD.

Pre-staging of assembly on hgwbeta


Put your name as the reviewer in the main pushQ, and claim the Redmine ticket

That way people know you're working on it.

Check if chromosome sizes have changed significantly (batches*)

  • If you are releasing an update to an assembly, check to see if chromosome sizes have changed significantly. Report any significant changes to the developer.
  • Output chromosome sizes from the old and new assemblies into two files and compare them
hgwdev > hgsql -Ne "select chrom, size from chromInfo" $oldDb > oldChromSizes 
hgwdev > hgsql -Ne "select chrom, size from chromInfo" $newDb > newChromSizes 
hgwdev > sdiff -s oldChromSizes newChromSizes
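To judge "significantly" without eyeballing the whole sdiff, the two files can be joined on chromosome name. This is a minimal sketch: compare_chrom_sizes is a hypothetical helper (not an existing kent script), it assumes the tab-separated chrom/size output of the hgsql commands above, and the 1% default threshold is an arbitrary assumption rather than a QA rule.

```shell
# Compare two "chrom<TAB>size" files (old assembly first, new second).
# Prints CHANGED for chromosomes whose size moved by more than PCT percent
# (default 1), NEW for chromosomes only in the new file, and GONE for
# chromosomes only in the old file (GONE lines come last, in no fixed order).
# Assumes the old file is not empty.
compare_chrom_sizes() {
  local old="$1" new="$2" pct="${3:-1}"
  awk -v pct="$pct" '
    FNR == NR { oldSize[$1] = $2; next }      # first file: remember old sizes
    {
      seen[$1] = 1
      if (!($1 in oldSize)) { print "NEW\t" $1 "\t" $2; next }
      diff = $2 - oldSize[$1]
      if (diff < 0) diff = -diff
      if (oldSize[$1] > 0 && 100 * diff / oldSize[$1] > pct)
        print "CHANGED\t" $1 "\t" oldSize[$1] "\t" $2
    }
    END {
      for (c in oldSize) if (!(c in seen)) print "GONE\t" c "\t" oldSize[c]
    }
  ' "$old" "$new"
}
```

For example, compare_chrom_sizes oldChromSizes newChromSizes 1 prints one line per chromosome that grew or shrank by more than 1%, was added, or was removed; report anything surprising to the developer.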

* Steps marked with (batches) should be done for all assemblies in the pushQ at once. Add a note to the Redmine issues of the other assemblies saying that you are working on their pre-staging and staging steps, and add another note once you're finished. This will hopefully speed up the release of the many assemblies on the horizon.

Check that any chain/net/liftOvers listed in the pushQ are to valid assemblies on the RR (batches)

If your assembly has a chain/net/liftOver to/from an assembly that is *not* on the RR (and not in the pushQ as another new assembly), you do not need to QA them or push them to the RR. Drop the relevant row(s) from your sub-pushQ by going to the track entry, clicking lock and then clicking the delete button. (If it would be helpful to see all of the other-organism liftOver files at once, cd to /gbdb on hgwdev and run: ls -d */liftOver/*$db*)

Check that there are enough tracks to be considered at least a "minimal" browser (batches)

See Minimal browser.

Stage assembly on hgwbeta (batches)

Push tables to hgwbeta

Push the database and tables from hgwdev to hgwbeta

hgwdev> sudo mypush $db '*' hgwbeta
  • Remove the hgFindSpec*, trackDb* tables from hgwbeta:
hgwdev> hgsql -h hgwbeta $db

Then for each table listed above:
mysql> drop table $tablename;

Note: there may be several hgFindSpec and trackDb tables, named like trackDb_someonesname. The * after those two table names means you should drop every table whose name begins with that prefix.
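If there are many of these tables, the DROP statements can be generated from the table list instead of typed by hand. This is a minimal sketch with a hypothetical function name; it only prints statements, so you can review them before pasting them at the mysql> prompt opened above.

```shell
# Read table names (one per line) on stdin and print a DROP TABLE statement
# for every table whose name begins with trackDb or hgFindSpec
# (for example trackDb_someonesname). Nothing is executed.
drop_trackdb_statements() {
  grep -E '^(trackDb|hgFindSpec)' | while IFS= read -r table; do
    printf 'DROP TABLE `%s`;\n' "$table"
  done
}
```

For example: hgsql -h hgwbeta -Ne 'show tables' $db | drop_trackdb_statements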

Alternatively, you can create a file of the tables you intend to push and then use a simple loop to push them to beta. Be sure to remove the "trackDb_*" and "hgFindSpec_*" tables from your file before pushing them.

For example, your tables file would contain things like:

all_mrna
author
cds
cell
chromInfo
cpgIslandExt
...

Then, you can use the following loop to push the tables to beta:

hgwdev> for table in $(cat $tableList); do sudo mypush $db $table hgwbeta; done

Alternative method to get db and tables on beta: create database on hgwbeta and push tables for $db

  • Create the database on hgwbeta.
hgwdev >  hgsql -h hgwbeta 
mysql > CREATE DATABASE $db;
  • Create a list of tables to import from hgwdev from the tables listed in the push queue. There should be a table called '$db' in the qapushq database on hgwbeta which can be used to get all of the tables at once:
hgwdev >  hgsql -h hgwbeta -Ne "SELECT tbls FROM $db WHERE dbs='$db'" qapushq > tables

To convert spaces to newlines for the tableList:

awk '{ for (i=1;i<=NF;i++) print $i }' tables > tableList 
  • Push the tables to hgwbeta.
hgwdev > bigPush.csh $db tableList 

bigPush.csh reports the size of the push at the end, which you can use to confirm that it is "similar" to the original size on hgwdev. You can also compare sizes in the main pushQ by putting a "*" in the tables field, selecting hgwdev from the "Current Location", and then clicking the "show sizes" button.

  • If you create a tableList, you can also use this list to find out which of your tables should be searchable with a command like the following when you get to the step of QAing individual tracks:
for i in $(cat tableList); do hgsql -Ne "select searchName,searchTable,searchMethod,termRegex from hgFindSpec where searchTable like '%$i%';" $db; done

Push chain/net tables in other organisms

In the sub-pushQ for the assembly there may be chain/net tracks listed. Push these tracks to hgwbeta, but only push tables for databases that exist on hgwbeta/RR. (These tables are not captured by the earlier SELECT tbls FROM $db WHERE dbs='$db' query used to build tableList, because their pushQ rows have a different dbs value, such as hg19, even though the tables you will be pushing into that database are named after your $db.)

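The tables to push for each listed assembly are as follows, where $Db is $db with its first letter capitalized:

  • chain$Db (for example, chainMm10)
  • chain${Db}Link (for example, chainMm10Link)
  • net$Db (for example, netMm10)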

Since you may be pushing the same tables into several other databases (e.g., hg19), it may save time to create a file containing the table names and use bigPush.csh:

hgwdev > bigPush.csh $DBs tableList 
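A hypothetical helper for writing that table-name file, assuming the usual UCSC convention that the chain and net tables in the other database capitalize the first letter of your $db (for example, chainMm10 in hg19):

```shell
# Print the three chain/net table names that an other-organism database
# (such as hg19) holds for the new assembly $1, following the convention of
# capitalizing its first letter: mm10 -> chainMm10, chainMm10Link, netMm10.
chain_net_tables() {
  local db="$1" Db
  Db="$(printf '%s' "${db:0:1}" | tr '[:lower:]' '[:upper:]')${db:1}"
  printf 'chain%s\nchain%sLink\nnet%s\n' "$Db" "$Db" "$Db"
}
```

For example, chain_net_tables $db > tableList, then run bigPush.csh with that file for each other database that exists on hgwbeta/RR.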

After pushing the tables, you will need to run "make beta" in trackDb for each of the other organisms.

Update hgcentralbeta

You can copy items from hgcentraltest to hgcentralbeta with the copyHgcentral script. For the usage statement, run:

hgwdev > copyHgcentral -h

The copyHgcentral script must be run in test mode first. Test mode will show you the state of hgcentraltest, hgcentralbeta and hgcentral. Once test mode has been run and everything looks good, you can run execute mode to copy from hgcentraltest to hgcentralbeta. Note that test mode generates output files which must be manually deleted afterward. Be sure to run copyHgcentral in your home directory and not in a directory where we don't want temp files to end up.

As an example, if you wanted to copy the contents of blatServers from hgcentraltest to hgcentralbeta for hg19, you would first run test mode with the following command:

hgwdev > copyHgcentral test hg19 blatServers dev beta

This would generate the following output:

--------------------------------------------------
--------------------------------------------------
<<< blatServers >>>

hgcentraltest
-------------
hg19    blat4a  17778   1       0
hg19    blat4a  17779   0       1

hgcentralbeta
-------------
hg19    blat4a  17778   1       0
hg19    blat4a  17779   0       1

hgcentral
-------------
hg19    blat4a  17778   1       0
hg19    blat4a  17779   0       1

*** The blatServers data on dev and beta is identical ***

*** The blatServers data on beta and rr is identical ***

When ready to run execute mode, just replace "test" with "execute" in the command line.

Some important things to note:

  • You can run copyHgcentral on all tables at once using "all" as the table name
  • There is no way to overwrite anything in test mode
  • When running execute mode:
    • If the data is identical between the origin server and the destination server, nothing will be copied
    • If the data differs between the origin server and the destination server, you will be forced to respond before anything is copied. It is impossible to overwrite something here by accident.
    • If copying a dbDb entry from hgcentralbeta to hgcentral, active is automatically set to 0. There is no way to accidentally put a new assembly on the RR with active=1.

dbDb

hgwdev > copyHgcentral test $db dbDb dev beta

Adding a dbDb entry will add the new assembly to the hgGateway page. Examine the copyHgcentral output and make sure that the assembly date is correct under the description column. It should be later than the previous assembly. If it is not, contact the developer.

blatServers

hgwdev > copyHgcentral test $db blatServers dev beta

The developer has often already requested that the blat servers be set up for the new assembly. If not, and/or if entries for your assembly are missing from hgcentraltest.blatServers, please make a note in the Redmine ticket and ask the assembly builder to 1) request the setup of the blat servers and to 2) manually add the entries to hgcentraltest.blatServers.

Make sure that this assembly is not hosted on "blatx" BLAT server. That server is not as stable and therefore is for assemblies that are not destined for the RR. For more information about where the blat servers for different machines should be hosted, go to Updating blat servers.

defaultDb

hgwdev > copyHgcentral test $db defaultDb dev beta

defaultDb controls which assembly is the default in the "assembly" drop-down menu when an organism is chosen from the "genome" drop-down menu in hgGateway (e.g., when "Mouse" is selected from the "genome" menu, defaultDb controls whether mm9 or mm10 is the default in the "assembly" menu). If this is the first assembly for an organism, you will need the defaultDb entry in order for the assembly to appear on hgwbeta.

Do not change the value for defaultDb for human or mouse on hgwbeta. Leave them set to the previous assembly because many people use these assemblies and will be confused if it changes on hgwdev and hgwbeta.

For existing organisms other than human and mouse, you should usually change defaultDb so that you don't accidentally test the previous assembly. When deciding whether to change the default assembly, ask whether the data available on the previous assembly is better; if it is, that is a reason not to update the default.

genomeClade

hgwdev > copyHgcentral test $db genomeClade dev beta

genomeClade is used to populate the "genome" drop-down menu in hgGateway and set the order in which organisms are listed in that menu. If this is not the first assembly for an organism, genomeClade will not need updating.

liftOverChain

liftOverChain is not copied with the copyHgcentral script. In the future, there will be a copyLiftOverChain script, but for the time being, liftOverChain needs to be copied manually.

Only copy lines from liftOverChain on hgcentraltest to hgcentralbeta if there are liftOver files listed in the pushQ and if the assemblies they go to/from exist on the RR. Check for lines in liftOverChain that should be in the pushQ, but aren't (e.g., the liftOver from a previous assembly). Email the developer and ask them to add them to the pushQ if necessary.

hgsql -Ne "SELECT * FROM liftOverChain WHERE fromDb = '$db' OR toDb = '$db'" hgcentraltest > chain.dev 

Check beta, load if not present and recheck:

hgsql -h hgwbeta -Ne "SELECT * FROM liftOverChain WHERE fromDb = '$db' OR toDb = '$db'" hgcentralbeta 
hgsql -h hgwbeta -e "LOAD DATA LOCAL INFILE 'chain.dev' INTO TABLE liftOverChain" hgcentralbeta
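Loading chain.dev as-is can duplicate rows that are already on beta. A minimal sketch, assuming you first save the beta query output to a file (chain.beta and chain.new below are hypothetical names), prints only the rows missing from beta so that you load just those:

```shell
# Print rows of the hgcentraltest dump ($1) that are missing from the
# hgcentralbeta dump ($2). Both are tab-separated hgsql -Ne output.
missing_liftover_rows() {
  local dev="$1" beta="$2"
  # comm needs both inputs sorted the same way; LC_ALL=C gives byte order.
  LC_ALL=C comm -23 <(LC_ALL=C sort "$dev") <(LC_ALL=C sort "$beta")
}
```

For example, save the beta query output with > chain.beta, run missing_liftover_rows chain.dev chain.beta > chain.new, review chain.new, and load that file instead of chain.dev.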

Check Meta Data

After you have completed the steps above, use the script checkMetaData.csh to make sure that all of the metadata is the same on hgwdev and on hgwbeta. Run this script in a temporary folder because it creates several files.

Push /gbdb/$db and html/description.html

Extract all of the gbdb files from the pushQ for your org and those for the other orgs as well:

hgsql -h hgwbeta -Ne "SELECT files FROM $db" qapushq > fileList 

Ask for a push of the list of /gbdb files above from hgwdev to hgnfs1 (don't worry about the downloads yet). Remind the pushers that items that are symlinked on hgwdev should become real files on hgnfs1. To see how big these files are:

hgwdev > cd /gbdb/$db
hgwdev > du -hscL `ls -d */liftOver/*$db*` .

Push image file to RR

The image file that appears on the gateway page should reside in the kent source tree in:

~/kent/src/hg/htdocs/images/

and a copy should exist at:

hgwdev > /usr/local/apache/htdocs/images/

Confirm that the width of the box around the image in description.html is 15 pixels wider than the image.

Confirm that the gateway text is ok.

If there is a previous assembly, it is possible that it is using the same image on the gateway page. Check on hgwbeta to see if the image is missing. If it isn't, you don't need to ask for the image to be pushed.

To get the image to appear on hgwbeta, do a "make beta" in src/hg/htdocs/. Ask for a push of the file from hgwbeta (at the /usr/local/apache... location) to the RR. It's a good idea to ask for the push of the image to the RR during the staging process, as you will inevitably forget to push it when it's time to release the assembly. If there are any other images for this assembly (for instance, the phylo image that goes with the Conservation track), you can push them too.

Make trackDb on hgwbeta

Remake the trackDb on hgwbeta. This will likely need to be done again as track descriptions are updated.

hgwdev> cd kent/src/hg/makeDb/trackDb 
hgwdev> make beta DBS=$db

Turn on GenBank updates

The new assembly should already be listed in the files align.dbs and hgwdev.dbs in the source tree at ~/kent/src/hg/makeDb/genbank/etc/. If it is not, check with Brian Raney. If it is, turn on GenBank updates on hgwbeta before 4:30 p.m., when the daily updates start, by adding the new assembly to ~/kent/src/hg/makeDb/genbank/etc/hgwbeta.dbs in alphabetical order. Do not edit rr.dbs yet; that comes later, when the assembly is on the RR. After committing the change, make sure your libs are up to date:

cd ~/kent/src ; make libs

then go ahead and run the make:

cd ~/kent/src/hg/makeDb/genbank/ 
git pull 
make install-rr install-server
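Adding the assembly to hgwbeta.dbs (and later rr.dbs) in alphabetical order can be scripted. This is a sketch with a hypothetical function name that assumes the file holds one database name per line with no comments or blank lines; check that assumption and review git diff before committing, since sort -u also reorders any existing entries that were out of order.

```shell
# Add a database name to a .dbs file and rewrite it in byte-wise alphabetical
# order without duplicates. Assumes one database name per line, with no
# comments or blank lines. Writing through "cat >" keeps the file's permissions.
add_db_sorted() {
  local db="$1" file="$2" tmp
  tmp="$(mktemp)" || return 1
  { cat "$file"; printf '%s\n' "$db"; } | LC_ALL=C sort -u > "$tmp"
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

For example: add_db_sorted $db ~/kent/src/hg/makeDb/genbank/etc/hgwbeta.dbs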

To see whether updates have run (at least the Monday after the *.dbs files were updated), check the update times of the table 'gbLoaded':

hgwdev > updateTimes.csh $db gbLoaded 

The update times will be out of sync between machines, but not by more than 24 hours or so if updates are running. The gbLoaded table will be updated regardless of whether changes to other GenBank tables were picked up. More genbank update instructions are available at Genbank updates.

The etc-update-server part of the make will cause the downloads mentioned below in the "Verify downloads" section to be created.

Review the alphabetic location in pull-down menus on hgGateway

Organisms are supposed to be listed in alphabetic order in the pull-down menus on hgGateway. Check that your new genome is in the right place alphabetically, starting with the first letter. Only Human and Mouse are out of order, because they are supposed to appear at the top of their "group" drop-down. Each letter is allotted 1000 numbers; here is the current arrangement:

1 - 100: Human          9001 - 10000: I           19001 - 20000: S
101 - 200: Mouse        10001 - 11000: J          20001 - 21000: T
1001 - 2000: A          11001 - 12000: K          21001 - 22000: U
2001 - 3000: B          12001 - 13000: L          22001 - 23000: V
3001 - 4000: C          13001 - 14000: M          23001 - 24000: W
4001 - 5000: D          14001 - 15000: N          24001 - 25000: X
5001 - 6000: E          15001 - 16000: O          25001 - 26000: Y 
6001 - 7000: F          16001 - 17000: P          26001 - 27000: Z
7001 - 8000: G          17001 - 18000: Q
8001 - 9000: H          18001 - 19000: R 

You can check your organism's orderKey on beta using the following commands:

hgwdev> hgsql -h hgwbeta hgcentralbeta
mysql > select name, orderKey from dbDb order by orderKey;
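The range an organism should fall in follows from its first letter: letter number n (A = 1, ..., Z = 26) owns n*1000+1 through (n+1)*1000, so Z gets 26001-27000. A small hypothetical helper that applies the table above:

```shell
# Print the orderKey range reserved for an organism name, following the
# table above: Human 1-100, Mouse 101-200, otherwise by first letter.
order_key_range() {
  local name="$1" letter idx
  case "$name" in
    Human) echo "1-100"; return ;;
    Mouse) echo "101-200"; return ;;
  esac
  letter="$(printf '%s' "${name:0:1}" | tr '[:lower:]' '[:upper:]')"
  # Position of the letter in the alphabet (A=1 ... Z=26) from its ASCII code.
  idx=$(( $(printf '%d' "'$letter") - 64 ))
  if (( idx < 1 || idx > 26 )); then
    echo "not a letter: $letter" >&2
    return 1
  fi
  echo "$(( idx * 1000 + 1 ))-$(( (idx + 1) * 1000 ))"
}
```

For example, order_key_range Zebrafish prints 26001-27000; compare that with the orderKey the query above shows for your assembly.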

Test on hgwbeta

Check the .2bit files

The .2bit files contain the new assembly sequence in a compact, binary format. The .2bit files are located at:

  • /scratch/$db (on the blat server)
  • /usr/local/apache/htdocs-hgdownload/goldenPath/$db/bigZips/ (on hgwdev)
  • /gbdb/$db/ (on hgwdev)
  • /gbdb/$db/ (on hgwbeta)

Check that the .2bit files are identical by running the 2bitCompare script. This is particularly important if the assembly has been part of a multiz track without a Browser: the file may already exist on beta and the RR and may not have been masked.

Below is some sample output:

hgwdev> 2bitCompare allMis1

  Checking md5sums.  This could take a few minutes.  Please be patient...

        blat4a md5sum: 134e740c05eedadc24de3a96775a25d6 /scratch/allMis1/allMis1.2bit
      download md5sum: 134e740c05eedadc24de3a96775a25d6 /usr/local/apache/htdocs-hgdownload/goldenPath/allMis1/bigZips/allMis1.2bit
   hgwdev gbdb md5sum: 134e740c05eedadc24de3a96775a25d6 /gbdb/allMis1/allMis1.2bit
  hgwbeta gbdb md5sum: 134e740c05eedadc24de3a96775a25d6 /gbdb/allMis1/allMis1.2bit

        blat4a date,size: Jun 19 11:03 569794406
      download date,size: Jul 3 10:55 53
   hgwdev gbdb date,size: Jun 7 13:34 39
  hgwbeta gbdb date,size: Jun 7 13:33 569794406

The first part of the script output lists the md5sums of all four .2bit files. These should be identical.

The second part of the script output lists the timestamps and filesizes.

  • The download and hgwdev gbdb files should be symlinks, as evidenced by a small filesize.
  • The blat and hgwbeta gbdb files should be the actual files, as evidenced by a large filesize.
  • The two symlink filesizes will likely be different, but the filesize of the two actual files should be identical.

If the blat .2bit is not the same as the other .2bit files, ask the pushers to restart the assembly and to pull the newest .2bit file from /gbdb.
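2bitCompare is the tool to use here. When you only want to spot-check copies that are visible from one machine (for example the hgwdev download and /gbdb paths), a minimal sketch of the same md5 comparison follows; same_md5 is a hypothetical helper, and because md5sum reads through symlinks, a symlinked copy is compared by its target's content.

```shell
# Print the md5sum of every file given, then IDENTICAL if all sums match or
# DIFFERENT otherwise. md5sum reads through symlinks, so a symlink is
# compared by the content of the file it points to.
same_md5() {
  local sums
  sums="$(md5sum "$@")" || return 2      # unreadable or missing file
  printf '%s\n' "$sums"
  if [ "$(printf '%s\n' "$sums" | awk '{ print $1 }' | sort -u | wc -l)" -eq 1 ]; then
    echo IDENTICAL
  else
    echo DIFFERENT
  fi
}
```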

joinerCheck

Check that common keys between tables are in sync

hgwdev > cd ~/kent/src/hg/makeDb/schema 
hgwdev > joinerCheck -database=$db -keys all.joiner

If there are errors related to genbank identifiers, it is likely because of the genbank load process, and not an issue with your database. Run joinerCheck once the tables are on beta to confirm:

hgwdev > HGDB_CONF=~/.hg.conf.beta joinerCheck -keys -identifier=$identifier all.joiner

Check that table update times are copacetic

hgwdev > joinerCheck -database=$db -times all.joiner

Check that all tables in this database are mentioned in all.joiner

hgwdev > joinerCheck -database=$db -tableCoverage all.joiner 

If not all of the tables are listed, email the developer and ask them to add those tables to the tablesIgnored $db section. According to Hiram, it is probably OK for us to edit all.joiner ourselves.

Verify makedoc for all the tracks listed in the pushQ

The makedoc file should be here: /src/hg/makeDb/doc/$db.txt. Check that all the tracks listed in the pushQ are included.

Tables that probably won't be mentioned explicitly in the makedoc (usually because they are generated automatically with the initial assembly) are:

  • assembly
  • chromInfo
  • gc5BaseBw
  • grp
  • hgFindSpec
  • history
  • nestedRepeats
  • rmsk
  • simpleRepeat
  • tableDescriptions
  • genbank tables (xenoRefGene, etc)
  • supporting tables


If everything is there, be sure to click on “Y” in pushQ for both the main pushQ and all the tracks in the sub-pushQ. Note that you can quickly change the values for all the tracks in the sub-pushQ by accessing the database (qapushq) directly from hgwbeta:

hgsql -h hgwbeta qapushq
mysql> UPDATE $db SET makeDocYN="Y";

Run featureBits to verify that the gold and gap tables together cover the entire genome

Run:

 featureBits -countGaps -or $db gold gap

to make sure that the gold and gap tables together cover the entire genome (the result should be 100%).
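If you want a pass/fail answer rather than reading the percentage, you can check that the covered bases equal the total. This sketch assumes featureBits prints a summary line of the form "N bases of M (P%) in intersection"; check_full_coverage is a hypothetical helper.

```shell
# Read featureBits output on stdin. For a summary line assumed to look like
#   3095693983 bases of 3095693983 (100.000%) in intersection
# print OK if the covered bases equal the total, otherwise how many are missing.
check_full_coverage() {
  awk '$2 == "bases" && $3 == "of" {
         found = 1
         if ($1 == $4) print "OK"
         else print "MISSING " ($4 - $1) " bases"
       }
       END { if (!found) print "UNRECOGNIZED" }'
}
```

For example: featureBits -countGaps -or $db gold gap 2>&1 | check_full_coverage (the 2>&1 is there in case the summary is written to stderr).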

Check to make sure that none of the table names have underscores (_)

There are some older tables that have underscores (all_est and all_mrna); these are OK. What is definitely *not* OK is for split tables (tables that start with chr) to have more than one underscore in their name. Run the two queries below and verify that the only returned results follow these rules:

mysql > show tables like "%\_%";
mysql > show tables like "%\_%\_%";
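The same rules can be applied to a plain table list. This is a minimal sketch with a hypothetical function name; any user trackDb_* or hgFindSpec_* tables are also flagged, so run it against hgwbeta after those have been dropped.

```shell
# Read table names (one per line) on stdin and print those that break the
# underscore rules: all_est and all_mrna are allowed, split tables starting
# with "chr" may contain one underscore (chr1_rmsk), and any other name
# containing an underscore is printed.
bad_table_names() {
  awk '{
    n = gsub(/_/, "_")                      # gsub returns the number of underscores
    if (n == 0) next                        # no underscore: fine
    if ($0 == "all_est" || $0 == "all_mrna") next
    if ($0 ~ /^chr/ && n == 1) next         # split table with a single underscore
    print
  }'
}
```

For example: hgsql -h hgwbeta -Ne 'show tables' $db | bad_table_names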

Make sure that there is a liftOver file from the previous assembly to this assembly

This is the number one request after a new release. These files are located here:

/gbdb/[from database]/liftOver/[from database]To[to database].over.chain.gz
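A hypothetical helper that builds the expected path, assuming the usual naming in which the first letter of the target database is capitalized (for example hg19ToHg38.over.chain.gz):

```shell
# Print the /gbdb path where the liftOver chain from assembly $1 (the older
# one) to assembly $2 (the new one) is expected, capitalizing the first letter
# of the target name: hg19 + hg38 -> /gbdb/hg19/liftOver/hg19ToHg38.over.chain.gz
liftover_path() {
  local from="$1" to="$2" To
  To="$(printf '%s' "${to:0:1}" | tr '[:lower:]' '[:upper:]')${to:1}"
  printf '/gbdb/%s/liftOver/%sTo%s.over.chain.gz\n' "$from" "$from" "$To"
}
```

For example, ls -lL "$(liftover_path $prevDb $db)" should show a non-empty file; $prevDb is a placeholder for the previous assembly's database name.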

Verify blastTabs are updated if needed

If the new assembly is an update to the human, mouse, rat, zebrafish, D. melanogaster, C. elegans, or S. cerevisiae genomes, make sure that the appropriate blastTab tables to this assembly are built. More information about blastTabs can be found here.

Review all tracks in the sub-pushQ

Follow the New_track_checklist. Note that many items in the checklist are run for all tracks at once as part of these instructions: joinerCheck, makedoc check, verification of downloads, etc. If there are Ensembl Genes also run qaEnsGenes.csh. Note that the Short Match and Restriction Enzymes tracks have no tables affiliated with them and do not need to be QAed (they are automatically generated when the Browser is created).

At this point you can also do the chain/nets (if any) from other assemblies to this one, or you can elect to do them after the assembly is released.

Check that all of the MySQL tables are in good repair

To do this run:

hgwdev > sudo dbCheck.sh $db

This will do a myisamchk on all tables in that $db and repair any that need repairing (noted in the output by the words "REPAIR needed").

Check link to NCBI's Assembly database on hgGateway page

As of January 2012, NCBI has a new database for assemblies. From this point forward, we should link to that database from the gateway page (our assembly description.html page). Please make sure that there is a link to the exact assembly.version that was used to make this browser. http://www.ncbi.nlm.nih.gov/assembly/organism/

Check default position and default tracks are scientifically interesting and aesthetically pleasing

From the gateway page, press 'Click here to reset'. Go back to your assembly, then press 'submit'. You will be taken to the default position for your assembly. Make sure that the resulting area is scientifically interesting and aesthetically pleasing! You can edit the default location here: hgcentralbeta.dbDb.defaultPos and the default tracks here: /kent/src/hg/makeDb/trackDb/$db/trackDb.ra.

Check Blat and PCR

Check that you can do DNA and protein blat as well as PCR on the assembly.

Verify downloads

The downloads are located at:

hgwdev > /usr/local/apache/htdocs-hgdownload/goldenPath/$db/

Note that you should only push the downloads needed for the tracks in your pushQ. LiftOver files and vs* directories are for the chain/net tracks, and the multiz*way, phastCons*way and phyloP*way directories are for conservation tracks.

Note that $db/database will be empty except for README.txt. This directory will contain a dump of the database on the RR, but will always remain empty on hgwdev.

Note also that these files:

est.fa.gz      mrna.fa.gz      refMrna.fa.gz      xenoMrna.fa.gz
est.fa.gz.md5  mrna.fa.gz.md5  refMrna.fa.gz.md5  xenoMrna.fa.gz.md5

will not be present on hgwdev. They are generated automatically and rsync'ed to hgdownload after an assembly is added to hgwbeta.dbs and "make etc-update-server" is run in the kent/src/hg/makeDb/genbank/ directory on hgwbeta.

Check that the permissions are group 'genecats' writable (permissions should be at least 664)

The developer who created this assembly will probably be the owner of the directory and the files in it; you may need to ask them to change the permissions.
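To find the offending files quickly, here is a minimal sketch using find; bad_permissions is a hypothetical name. It lists regular files that are not in the given group (genecats by default) or that are missing any of the 664 bits. Directories are not checked.

```shell
# List regular files under a directory that are not owned by the given group
# (genecats by default) or that lack any of the rw-rw-r-- (664) bits.
# "-perm -664" matches files that have ALL of those bits, so "! -perm -664"
# selects files missing at least one of them.
bad_permissions() {
  local dir="${1:-.}" group="${2:-genecats}"
  find "$dir" -type f \( ! -group "$group" -o ! -perm -664 \) -print
}
```

For example: bad_permissions /usr/local/apache/htdocs-hgdownload/goldenPath/$db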

Check the md5sum

Check the md5sum against the md5sum.txt file for each directory you are planning to push. Note that the md5sum.txt in the liftOver directory may need to be edited (at least temporarily) to include only the liftOver files contained in the pushQ.

An easy way to compare the md5sum with md5sum.txt is to do a diff. This can be easily automated by running the following command in each directory:

hgwdev > md5sum * | sort > ~/[filename]; diff md5sum.txt ~/[filename]

The sort is done with the assumption that the md5sum.txt file is sorted (it typically is). If the md5sum.txt file is not sorted, the sort is unnecessary. The temp file is created in your home directory to avoid creating temp files in htdocs-hgdownload. Note that the md5sum.txt file obviously does not contain md5sum.txt and it was created before there was a README.txt file, so your diff will show md5sum.txt and README.txt in the results. If everything is ok, those should be the only results. In the vs* directories, the XXXX.net.axt.gz file will show up as axtNet/XXXX.net.axt.gz. This is ok as long as the md5sums match.
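GNU md5sum can also do the comparison itself with -c, which avoids the sort and the temporary file. This is a minimal sketch (check_md5_dir is a hypothetical helper) that reports failed checksums and then lists directory entries that md5sum.txt does not mention; expect md5sum.txt and README.txt there, plus subdirectory names such as axtNet whose files are listed with a path prefix.

```shell
# Check a download directory against its md5sum.txt (GNU coreutils md5sum).
# Prints "<file>: FAILED" lines for mismatches, then "unlisted: <name>" for
# directory entries that md5sum.txt does not mention. Returns md5sum's status.
check_md5_dir() {
  local dir="${1:-.}"
  (
    cd "$dir" || exit 2
    md5sum -c --quiet md5sum.txt
    status=$?
    # md5sum.txt lines look like "<32-hex-digit hash>  <name>" (or " *<name>").
    LC_ALL=C comm -13 \
      <(sed -E 's/^[0-9a-f]{32} [ *]//' md5sum.txt | LC_ALL=C sort) \
      <(ls -1 | LC_ALL=C sort) | sed 's/^/unlisted: /'
    exit "$status"
  )
}
```

For example, run check_md5_dir . inside each directory you plan to push; a non-zero exit status means at least one checksum failed.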

Check the files for corruption

Check that the files in each bigZips, liftOver and vsXXX directory don't contain weird characters. Run the following in each directory and check the output:

for file in *.gz; do zcat "$file" | head; zcat "$file" | tail; done

Scroll the output and make sure all the text is ASCII. If there are any issues alert the developer.
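Scrolling works, but the check for non-ASCII bytes can be automated. This is a minimal sketch, assuming bash, gzip and GNU grep; non_ascii_gz is a hypothetical helper that samples the same first and last lines as the loop above and prints the name of any .gz file containing bytes that are neither printable ASCII nor whitespace.

```shell
# Print the name of each .gz file in a directory whose first or last 10
# uncompressed lines contain a byte that is neither printable ASCII nor
# whitespace. Requires bash, gzip and GNU grep (-a treats binary as text).
non_ascii_gz() {
  local dir="${1:-.}" f
  for f in "$dir"/*.gz; do
    [ -e "$f" ] || continue              # glob matched nothing
    # Same sample as the manual loop: head and tail of the decompressed file.
    if { gzip -dc "$f" | head; gzip -dc "$f" | tail; } 2>/dev/null |
         LC_ALL=C grep -a -q '[^[:print:][:space:]]'; then
      echo "$f"
    fi
  done
}
```

An empty result means the sampled lines were clean; alert the developer about any file that is printed.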

Read and verify READMEs

Check that we have READMEs at top level, and for bigZips, chromosomes, liftOvers and comparatives (multiz, phastCons, vsXXX). Verify that the information in the READMEs is correct. Note that some of the files mentioned in the README are generated by the Genbank process, so they won't be present yet.

The genbank process will build the upstream* files on hgdownload if they don't exist, or are more than seven days old, with whatever genePred table is defined in etc/genbank.conf (e.g. hg16.upstreamGeneTbl = refGene ). Make sure that the README for the upstream* files reflects the genePred table listed for this assembly.

Stage assembly on Round Robin

Figure out if either you or Donna is going to edit the static docs

Email Donna and let her know that you're going to be releasing the assembly soon. Sometimes Donna does the docs and sometimes the QA person in charge of the assembly will do them. If you are going to edit them, see the static content for new assemblies page.

Make sure no tables need to be repushed from hgwdev to hgwbeta

You can use

hgwdev > updateTimesDb.sh -d $db

to compare table update times between hgwdev and hgwbeta. Everything but hgFindSpec, history, tableDescriptions, trackDb and the genbank tables should have the same update times.

If there are tables that need repushing, make a list like so:

 updateTimesDb.sh -d $db | awk '{if ($6) print $1;}'> tablesToRepush

To see all of the tables in the assembly that are related to genbank do this:

hgwdev > hgsql -Ne 'show tables' $db | egrep -f /cluster/data/genbank/etc/genbank.tbls

Double-check /gbdb files

Double-check that hgnfs1 (which is /gbdb on hgwbeta) has the files listed in the push queue. Remove any unnecessary files. If any files were updated on hgwdev in the course of QAing tracks, make sure the correct version is on hgnfs1. To see the timestamps of files with symlinks on hgwdev, use the options "-lL" with ls; the capital "L" shows information for the file that a link references, rather than for the link itself. You may notice that hgwdev has TRIX .ix and .ixx files; that is OK. These live on beta in a different location, /data/trix (more information is on the TrackDb page). Also, files on hgwdev are double the size of those on beta, so don't worry about that.

Make public

Go to kent/src/hg/makeDb/trackDb and run "make public" for your organism, then check on hgwbeta-public to make sure everything is working as expected.

Request rsync of entire database from push-request

Request an rsync of the entire database from hgwbeta to mysqlrr and genome-euro. After the push is complete, ask for a drop of trackDb_public and hgFindSpec_public from mysqlrr and genome-euro, and then for a push of trackDb and friends. This will get the correct trackDb and trix files to the RR, after which tracks should appear and Track Search should work.

Note that if at any point you need to re-push some genbank tables, you must push ALL genbank tables together.

Request push of hgFixed tables, if needed

Some tracks (e.g., Ensembl Genes) have tables in the hgFixed database (e.g., trackVersion). Check whether this table needs to be pushed from hgwbeta -> mysqlrr, and if so, request a push.

Mark main pushQ entry as "push requested"

Turn on genbank updates on the RR

Follow the same instructions here, except edit rr.dbs instead. After checking in rr.dbs, and before you run "make install-rr", make sure your libs are up to date by running "make libs" first:

cd ~/kent/src && make libs

Important: Be sure to check that the genbank tables are properly updating the Monday after starting genbank updates.

Request dump and autodump of database

Ask the pushers to dump the mysql tables from the RR to .txt.gz and .sql files on hgdownload:/usr/local/apache/htdocs/goldenPath/$db/database, and to start the weekly autodump for this database.

Be aware that genome-mysql syncs with hgdownload every night, so after you request the autodump, genome-mysql will automatically pick up anything new that night. If you want genome-mysql to be synced right away, without waiting for the nightly sync, you can submit a push request at that point. If you can wait a day after the autodump request, genome-mysql will already be up to date with your new assembly, and the push request to "make the $db database available on genome-mysql" is not needed.

Adjust the release log in main pushQ

Compile a list of the tracks being released on this assembly and paste it into the release log box of the main pushQ entry for the initial release of the assembly. You can fetch the list from the assembly pushQ, but note that for the genbank tracks you will need to get the names manually. Alternatively, copy and paste the shortLabels of tracks from hgTracks. Look at previous release log entries for formatting. (Most older entries include "Downloads" in this list, but QAers agreed it is unnecessary to keep track of them going forward.)

Ensure the active column is set to 0 in the lines you are going to load into hgcentral.dbDb

The active column dictates whether the assembly appears in the drop-down menu on the gateway page: when it equals 0, the assembly doesn't show in the pull-down; when it equals 1, it does. Set it to 0 so that you can test the assembly on the RR without it being directly available to the public.

Update hgcentral

Note that the new assembly will start appearing in drop-downs as soon as hgcentral is updated, so be ready to test things (such as BLAT and PCR) soon after this step. Do NOT set active=1 in dbDb just yet. Also note that hgcentral dbDb may already have a line for this assembly if one was required for another track, such as a multi-species alignment. You can check with copyHgcentral.

Use copyHgcentral to copy the appropriate tables from hgcentralbeta to hgcentral:

  • blatServers
  • dbDb
  • defaultDb (ONLY if this is the first assembly for an organism, otherwise this will be done later)
  • genomeClade (if needed)
  • liftOverChain (if needed)

See the Updating hgcentralbeta page for how to update hgcentral.

It is only necessary to edit genomeClade if this is the first assembly for this species or if the order of species was changed. Also, note that it is OK for hgNearOk in dbDb to equal 1 for an older assembly of the same organism. Also, if you updated the defaultPosition, you'll need to update that in dbDb.

Update the liftOverChain table so that it has all the liftOvers FROM this assembly to other assemblies. You will add the other liftOver lines when you have finished pushing the chain/nets for the other organisms.

Test the assembly tracks plus BLAT, PCR, liftOver on the RR

It is possible to test the tracks on the new assembly with active=0 by forcing db=$db in the hgTracks URL. First view an older assembly, then edit the URL so that you are actually viewing your new assembly.
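
Editing the URL comes down to swapping the value of the db parameter. This is a minimal sketch using sed. force_db is a hypothetical helper, and the example URL and assembly names are placeholders.

```shell
# Sketch: replace the value of the db= parameter in an hgTracks URL.
# $1: URL copied from the browser; $2: the new assembly name.
# Only a parameter that starts right after "?" or "&" is rewritten, so
# names that merely end in "db" are left alone.
force_db() {
  printf '%s\n' "$1" | sed -E "s/([?&])db=[^&]*/\1db=$2/"
}
```

If the copied URL has no db= parameter, it comes back unchanged and you will need to add db=$db by hand.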

Enable Assembly on RR and post-release follow-up

Set active=1 in hgcentral

When everything is working as expected, set the assembly to active:

hgwdev > hgsql -h genome-centdb hgcentral
mysql > UPDATE dbDb SET active = 1 WHERE name = "$db";

Note that this means that everyone can now see the assembly on the RR.

Update defaultDb in hgcentral

Set your assembly as the default assembly for this organism. If this is a human or mouse assembly, go back and update hgcentraltest and hgcentralbeta too. Note that at times it is worth asking whether the data available on the previous assembly is better.
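
You can write out the defaultDb change as SQL and review it before running it. This sketch assumes hgcentral.defaultDb has a genome column (e.g., 'Human') and a name column (the assembly, e.g., 'hg38'), so check the table definition before relying on it. default_db_sql is a hypothetical helper.

```shell
# Sketch: print the SQL to make $2 the default assembly for genome $1,
# followed by a query that shows the result.
# Assumes defaultDb(genome, name); inputs with quotes/backslashes are refused.
default_db_sql() {
  genome=$1 db=$2
  case "$genome$db" in
    *\'* | *\\*) echo "refusing: quote or backslash in input" >&2; return 1 ;;
  esac
  printf "UPDATE defaultDb SET name = '%s' WHERE genome = '%s';\n" "$db" "$genome"
  printf "SELECT genome, name FROM defaultDb WHERE genome = '%s';\n" "$genome"
}
```

After reading the printed SQL, you could pipe it in, e.g. `default_db_sql Human hg38 | hgsql -h genome-centdb hgcentral`.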

Verify that everything is working as expected on the RR

Look briefly at the tracks, default position, gateway page, etc.

Verify that everything is working on genome-euro

Check to be sure that all of the tables, files (including images), and hgcentral changes have appeared on genome-euro, and that the new assembly is working as expected.

There is a cron job (owned by the admins) that runs once an hour on genome-euro and rsyncs all of the hgcentral tables from the RR that (1) have newer times on the RR than on genome-euro, and (2) are not on an exclude list (which currently consists of gbMembers, hubStatus, namedSessionDb, sessionDb, and userDb). It then runs a MySQL CONCAT command that appends the ".soe.ucsc.edu" domain to every hostname in the blatServers table that doesn't already have it.
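
The two rules the cron job applies can be restated as a pair of shell functions. This illustrates the behavior described above and is not the admins' script. should_sync and euro_blat_host are hypothetical names, update times are assumed to be epoch seconds, and blat1a is a made-up hostname.

```shell
# Illustration only, not the actual cron script.
# Rule 1: copy a table only if it is not excluded and is newer on the RR.
EXCLUDE="gbMembers hubStatus namedSessionDb sessionDb userDb"
should_sync() {
  # $1: table name; $2: RR update time; $3: genome-euro update time
  for t in $EXCLUDE; do
    [ "$1" = "$t" ] && return 1
  done
  [ "$2" -gt "$3" ]
}

# Rule 2: append ".soe.ucsc.edu" to a blatServers hostname unless present.
# In SQL this is roughly (assuming a "host" column):
#   UPDATE blatServers SET host = CONCAT(host, '.soe.ucsc.edu')
#   WHERE host NOT LIKE '%.soe.ucsc.edu';
euro_blat_host() {
  case "$1" in
    *.soe.ucsc.edu) printf '%s\n' "$1" ;;
    *) printf '%s.soe.ucsc.edu\n' "$1" ;;
  esac
}
```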

However, if you want to see your hgcentral changes on genome-euro right away, you can ssh to genome-euro as qateam and run the script yourself:

 ssh qateam@genome-euro.ucsc.edu
 sudo /root/pullHgcentral

After the script runs (either via cron or by logging in and running it as qateam), genome-euro should behave the same as the RR. Note: If needed, ssh to qateam@hgwdev and then from that qateam account ssh to qateam@genome-euro.

Push Downloads from hgwdev to hgdownload

These files are pushed directly from hgwdev:/usr/local/apache/htdocs-hgdownload/goldenPath/$db/ to hgdownload:/usr/local/apache/htdocs/goldenPath/$db/. Be sure to specify this in your push request, and ask the pushers to keep the files in group 'genecats' and group-writable. Make sure to push only the directories that apply to the tracks in the pushQ. After they are pushed, check that everything is there.
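
One way to check that everything arrived is to make a sorted listing of the tree on each host and compare them. This sketch assumes you can run a shell on both hgwdev and hgdownload and copy the two listings to one place. list_files and only_in_first are hypothetical helpers. Some differences may be intentional if directories were deliberately left out of the push.

```shell
# Sketch: relative paths of every non-directory entry (files and symlinks),
# sorted in a fixed collation so that comm can compare two listings.
list_files() {
  (cd "$1" && find . ! -type d) | LC_ALL=C sort
}

# Lines that appear only in the first listing, i.e. not yet on the target.
only_in_first() {
  LC_ALL=C comm -23 "$1" "$2"
}
```

For example, run `list_files /usr/local/apache/htdocs-hgdownload/goldenPath/$db > dev.txt` on hgwdev and `list_files /usr/local/apache/htdocs/goldenPath/$db > dl.txt` on hgdownload, then run `only_in_first dev.txt dl.txt`.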

Update or add symlink in /usr/local/apache/htdocs-hgdownload/goldenPath/currentGenomes

Update or add a symlink in hgwdev:/usr/local/apache/htdocs-hgdownload/goldenPath/currentGenomes so that it points to the most recent assembly. In the currentGenomes directory:

 rm Name_of_symlink
 ln -s ../$db Name_of_symlink
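
As an alternative to rm followed by ln, GNU ln can replace the link in one command. This is a minimal sketch. repoint_link is a hypothetical helper, and the names in the usage are placeholders.

```shell
# Sketch: point an existing (or new) currentGenomes symlink at ../$2.
# -f replaces an existing link. -n treats an existing link to a directory
# as a plain file; without it, ln would create the new link *inside* the
# old assembly's directory instead of replacing the link.
repoint_link() {
  # $1: symlink name; $2: assembly db name
  ln -sfn "../$2" "$1"
}
```

Usage, from inside currentGenomes: `repoint_link Name_of_symlink $db`.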

Request a push of the symlink from hgwdev to hgdownload. This is for ftp users who only want to go to the most recent assembly for an organism. After it is pushed, check that it is functioning correctly on the current genomes ftp page.

Ask push-request to make this assembly available on genome-mysql

genome-mysql syncs with hgdownload every night, so if you have already requested the autodump, genome-mysql will automatically pick up anything new that night. If the autodump was completed at least 1 day ago, your new assembly should already be available on genome-mysql, and a push request is not needed. If you do not want to wait for the nightly sync, you can ask the admins to "make the $db database available on genome-mysql."

In the past, we would also request: "Links and permissions should be made for users "genome" and "genomep"." (Jorge said to follow the instructions on the "Mirror_Server" wiki page.) This is no longer needed in the push request.

Push Static Content from hgwbeta to Round Robin

First make sure that you or Donna has edited the pages below:

  • /usr/local/apache/htdocs/indexNews.html
  • /usr/local/apache/htdocs/goldenPath/newsarch.html
  • /usr/local/apache/htdocs/goldenPath/credits.html
  • /usr/local/apache/htdocs/FAQ/FAQreleases.html
  • /usr/local/apache/htdocs-hgdownload/downloads.html
  • /gbdb/$db/html/description.html
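
Before you request the push, a quick grep can show which of these pages do not mention the new assembly yet. This is a rough sketch. It assumes each edited page contains the database name, which may not hold for every page (some may use only the organism or assembly name). pages_missing_db is a hypothetical helper.

```shell
# Sketch: print each page that does not contain the string $1 (the db name)
# or that does not exist.
pages_missing_db() {
  db=$1
  shift
  for page in "$@"; do
    grep -qF -- "$db" "$page" 2>/dev/null || echo "$page"
  done
}
```

Usage: `pages_missing_db $db /usr/local/apache/htdocs/indexNews.html /usr/local/apache/htdocs/FAQ/FAQreleases.html`, adding the other pages from the list as needed.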

Then push them from hgwbeta to the RR and hgwbeta-public. For more information go to: Static_content_for_new_assemblies

Announce the Release on genome-announce@soe.ucsc.edu

Whoever edited the static docs should send the announcement to the genome-announce mailing list (genome-announce@soe.ucsc.edu). It is best to take the section from the news page and edit it as needed for email.

(skip/optional) Update hgcentral.sql for mirrors

Requesting this update provides the most up-to-date version of hgcentral for mirrors. If an immediate update is important, ask the buildmeister to update the file at http://hgdownload.soe.ucsc.edu/admin/. In a recent discussion, it was decided that this step is not necessary when we release a new assembly; we can let the update go out with the 3-week release cycle.

Also optional: Some mirrors like to get hgcentral tables via ftp or rsync from ftp://hgdownload.soe.ucsc.edu/mysql/hgcentral/. You can request a push from hgnfs1 --> hgdownload if you want to make the hgcentral info available there immediately (otherwise it will be copied there by a weekly rsync).

You may need to update all.joiner for the RR

Relationships among tables are defined in all.joiner, which is updated on the RR when we push CGIs. If all.joiner was edited during QA of the tracks, you may want to ask for a push of all.joiner from hgwbeta (/usr/local/apache/cgi-bin/all.joiner) to the RR so that the table relationships show up in the table browser.

Chain/Nets/LiftOvers from other organisms

The timing of this step is not critical. It can be done anytime after the new assembly is active on the RR. These steps can be done in either order:

  • Push the nets and chains from other orgs to the new org (more information here)
  • Add the appropriate lines to the hgcentral.liftOverChain table (using hgsql -h genome-centdb) so that hgLiftOver and hgConvert work from other organisms to the new assembly. Test hgLiftOver and hgConvert. (Do not delete any old lines from liftOverChain; liftOver should still work for older assemblies.)

Next day follow-up

Check that Genbank is running on the RR

Make sure Genbank weekly updates are running on the Round Robin. You can do this by viewing the dates on the download files in htdocs/goldenPath/$db/bigZips/ (they should be more recent than the ones you pushed with your release). Also check that the table gbLoaded is getting updated on the RR with:

 realTime.csh $db gbLoaded

The RR updates every Sunday, so check this the Monday after you release.
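
To check the download dates, you can ask find for files modified after your release date instead of reading a full listing. This is a minimal sketch. updated_since is a hypothetical helper, and it relies on find's -newermt test, which GNU find supports.

```shell
# Sketch: list files under $1 modified after the date/time string $2
# (e.g. the day you pushed the release). -L follows symlinks, so linked
# download files are judged by their targets' timestamps.
updated_since() {
  find -L "$1" -type f -newermt "$2" | sort
}
```

Usage: `updated_since /usr/local/apache/htdocs/goldenPath/$db/bigZips YYYY-MM-DD`, where YYYY-MM-DD is your release date. If genbank updates are running, the genbank-generated files should appear in the output.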

Check the dump of the database in the downloads

Look in the database download directory and verify that the dump occurred.

Check that the downloads files generated by the genbank process are on hgdownload

Make sure that the files generated by genbank that are mentioned in the bigZips/README file, such as est.fa.gz, mrna.fa.gz, etc., and the upstream* files, are actually present on hgdownload.
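
A presence check for the files named in this step can be written as a small loop. This is a sketch. The file names are only the examples given above, so take the real list from this assembly's bigZips/README (some assemblies may have no EST file). missing_genbank_files is a hypothetical helper.

```shell
# Sketch: print genbank-generated download files that are not present in $1.
# The names checked are examples; extend the list from bigZips/README.
missing_genbank_files() {
  dir=$1
  for f in est.fa.gz mrna.fa.gz; do
    [ -e "$dir/$f" ] || echo "$f"
  done
  # At least one upstream* file should exist; an unmatched glob stays literal.
  set -- "$dir"/upstream*
  [ -e "$1" ] || echo "upstream*"
}
```

Usage: `missing_genbank_files /usr/local/apache/htdocs/goldenPath/$db/bigZips`. No output means every checked file is present.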

Check that genome-mysql is working

From hgwdev:

 mysql -h genome-mysql -A -u genome $db

Retire the assembly sub-pushQ

Make sure there are release log entries for the net and chain tracks in other databases.

 hgwdev> retirePushQ.csh $db

(The script will remove release log notes for all push queue entries where dbs=$db.)


Press "done!" in the main push queue

This will update the release log.

Check the release log

The day after you press "done!" in the main push queue for your assembly, the Release Log on the website will be updated with the information about the new release (from whatever you entered into the Release Log field of the main push queue). Verify that this happened on the release log page. To edit an entry, find it in the Log section of the push queue and edit it there. Changes will appear in the release log the next day.