Automation
Revision as of 23:38, 21 August 2006
Why Automate?
You've seen one genome assembly, you've seen 'em all -- hardly! But there are some very predictable, repetitive things that developers need to do every time we build a genome annotation database on a new genome assembly. It is in our best interest to automate these steps when possible for these reasons:
- it saves time
- it reduces copy-paste and didn't-see-that-error-message errors
- it helps to enforce naming conventions, which helps us use each other's data
- it can produce detailed and accurate documentation of the data
- it keeps our eyes from glazing over
Of course, nothing is free. When something goes wrong in an automated process, we must work our way back from a usually cryptic error message, through an additional layer of code, to the source of the problem. (Or, if it's GenBank automation, bug MarkD. ;) But the hope is that developers will spend more of their time on tasks that require critical thinking and less on boring, repetitive ones.
The May 30, 2006 genecats meeting was devoted to discussion and planning of build automation; Hiram transcribed the whiteboard notes from the meeting in High Throughput Genome Builds.
Automation Scripting Infrastructure
Use of Perl: interpreted, with nice support for regexes, hashes, etc.
- HgAutomate.pm
- HgRemoteScript.pm
- HgStepManager.pm
doTemplate.pl
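To illustrate why a step-management layer is worth having, here is a minimal sketch of the general idea: a build is split into named steps, so that a run that failed partway can be resumed at the failing step instead of starting over. This is a hypothetical illustration in Python, not the interface of HgStepManager.pm (which is Perl and is not described on this page); the function name `run_steps`, its `start`/`stop` parameters, and the step names in the demo are all assumptions made for the example.

```python
from typing import Callable, List, Optional, Tuple

# A step is a (name, function) pair; the functions here take no arguments.
Step = Tuple[str, Callable[[], None]]


def run_steps(steps: List[Step],
              start: Optional[str] = None,
              stop: Optional[str] = None) -> List[str]:
    """Run steps in order, from `start` through `stop` (inclusive).

    Returns the names of the steps that were run. An unknown step name
    raises ValueError (from list.index) before anything is executed.
    """
    names = [name for name, _ in steps]
    first = names.index(start) if start is not None else 0
    last = names.index(stop) if stop is not None else len(names) - 1
    if first > last:
        raise ValueError(f"step '{start}' comes after step '{stop}'")

    done = []
    for name, func in steps[first:last + 1]:
        print(f"running step: {name}")
        func()  # an exception here stops the run at this step
        done.append(name)
    return done


if __name__ == "__main__":
    # Hypothetical step names, for demonstration only.
    demo = [
        ("prepare", lambda: None),
        ("load", lambda: None),
        ("cleanup", lambda: None),
    ]
    # Resume a previously failed run at the "load" step.
    run_steps(demo, start="load")
```

Resolving the step names before running anything means a typo in a step name fails immediately rather than after hours of work.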
Existing Automation Scripts
- makeGenomeDb.pl
- doRepeatMasker.pl
- makeDownloads.pl
- doSameSpeciesLiftOver.pl
- doBlastzChainNet.pl
- doHgNearBlastp.pl
- makePushQSql.pl
MarkD's GenBank scripts...
Automation Wish List
- Repeat library generation (WindowMasker?)
- Brian's chained protein alignments
- CpG islands
- genscan
- multiz
- phastCons
- meta-automation of all blastz's, multiz, phastCons?
- meta-automation of all scripts that we always run?
Automation Troubleshooting
- fileserver/machines out of sync
- cluster job dies
- cluster job hangs
- ssh hangs
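For the "hangs" cases in the list above, one common defensive measure is to run each external command with a time limit, so that a hang turns into an explicit, reportable failure. The following is a minimal sketch in Python under that assumption; it is not taken from the existing automation scripts, and the helper name `run_with_timeout` and the host name in the usage comment are hypothetical.

```python
import subprocess
from typing import List


def run_with_timeout(cmd: List[str], timeout_sec: float) -> str:
    """Run cmd and report 'ok', 'failed (exit N): ...', or 'timed out'.

    subprocess.run kills the child process when the timeout expires,
    so a hung command cannot block the caller indefinitely.
    """
    try:
        result = subprocess.run(cmd, timeout=timeout_sec,
                                capture_output=True, text=True)
    except subprocess.TimeoutExpired:
        return "timed out"
    if result.returncode != 0:
        return f"failed (exit {result.returncode}): {result.stderr.strip()}"
    return "ok"


# Example usage (hypothetical host name). BatchMode and ConnectTimeout are
# standard OpenSSH options that keep ssh from waiting on a password prompt
# or on an unreachable host:
#
#   status = run_with_timeout(
#       ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=10",
#        "somehost", "true"],
#       timeout_sec=30,
#   )
```

The timeout only covers the process started directly; work that the remote side has already launched (for example, cluster jobs) would still need to be checked separately.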