QA scripts: Difference between revisions
This is a list of the most frequently-used programs and scripts that QAers use. It does not include track-specific programs, such as chainNetTrio.csh, or [[New_track_checklist#Tables_are_sorted_and_internally_consistent | these]]. Some are devised by QA and live in the source tree at kent/src/utils/qa (these mostly end in ".csh"). Some are programs that the engineers or system administrators have written.
After editing, saving, and committing any changes to scripts in utils/qa, you will need to issue the following make from the utils dir:
<pre>
make SCRIPTS=/cluster/bin/scripts
</pre>
All scripts running from the qateam identity crontab should be in kent/src/utils/qa. Note that if you are adding a new script, you will need to add it to the makefile in the kent/src/utils/qa directory as well.
=Running Scripts on Multiple Tables at Once=
For the scripts that do not accept a list of tables in a file, here's a little bash script that you can use to run any of these scripts on multiple tables. Save the list of tables in a file (in your hive directory) called 'tableList_temp', and then type (note: you can press return after each line):
 for i in $(cat tableList_temp)
 do
 [script_name] [database/assembly] $i
 echo $i
 done
It can also be run as one line for easy copy/paste. Example:
 for i in $(cat tableList); do runJoiner.csh danRer5 $i; echo $i; done
The following scripts do not accept a file, so are good candidates for the above:
*runJoiner.csh
*countPerChrom.csh
*runBits.csh
*(more coming...)
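The loop above can also be wrapped in a small bash function. This is only a sketch: <code>run_on_tables</code> is a hypothetical helper name, not something in kent/src/utils/qa, and the demonstration uses <code>echo</code> as a stand-in for a real QA script. It reads the list line by line, skips blank and comment lines, and reports tables for which the command returned a non-zero status.

```bash
#!/usr/bin/env bash
# run_on_tables CMD DB FILE
# Runs "CMD DB table" once for every table named in FILE (one table per line).
# CMD, DB and FILE are placeholders, e.g. runJoiner.csh, danRer5, tableList_temp.
run_on_tables() {
    local cmd=$1 db=$2 list=$3 table
    while IFS= read -r table || [ -n "$table" ]; do
        # Skip blank lines and lines starting with '#'.
        case $table in ''|'#'*) continue ;; esac
        echo "== $table"
        # </dev/null keeps the command from swallowing the rest of the list.
        "$cmd" "$db" "$table" < /dev/null || echo "FAILED: $table" >&2
    done < "$list"
}

# Demonstration with echo standing in for a real script.
demo_list=$(mktemp)
printf 'refGene\n\nknownGene\n' > "$demo_list"
run_on_tables echo danRer5 "$demo_list"
rm -f "$demo_list"
```

With a real script, the call would look like <code>run_on_tables runJoiner.csh danRer5 tableList_temp</code>.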
=Must know about!=
==bigPush.sh==
(also see: mypush)
<pre>
Usage: bigPush.sh [-hv] $database(s) $table(s)

    Required arguments:
    $database(s)       A single database or list of databases for which
                       table(s) will be pushed. This list can be a file.
    $table(s)          A single table or list of tables to be pushed.
                       List of tables can be a file.

    Optional arguments:
    -h                 Display this help and exit.
    -v VERBOSE MODE    Output details of push; 1 for results to stdout,
                       2 for results to stdout and file.

Push a table or list of tables from Dev to Beta for a single database or list
of databases.

If you have multiple databases (and/or tables), they can be input in a few
different ways. They can be either in a file or in a space-separated list
enclosed in quotes. For example:

    bigPush.sh "ce11 hg19 gorGor3" refGene

A list of tables can be pushed in a similar way.

If verbose mode is set, then the output will be sent to stdout.
</pre>
==commTrio.csh==
<pre>
Sorts and compares two files.
Counts unique and common records.

usage: leftFileName rightFileName [rm]
  optional [rm]: remove the three output files when finished
</pre>
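The same idea can be sketched with the standard <code>sort</code> and <code>comm</code> utilities. This is not the implementation of commTrio.csh; <code>comm_counts</code> is a hypothetical name, and the sketch deduplicates lines with <code>sort -u</code>, which the real script may or may not do.

```bash
#!/usr/bin/env bash
# comm_counts LEFT RIGHT
# Prints how many lines are unique to LEFT, unique to RIGHT, and common to both.
comm_counts() {
    local left=$1 right=$2 l r
    l=$(mktemp)
    r=$(mktemp)
    # comm needs both inputs sorted in the same collation order.
    LC_ALL=C sort -u "$left"  > "$l"
    LC_ALL=C sort -u "$right" > "$r"
    printf 'left only: %s\n'  "$(LC_ALL=C comm -23 "$l" "$r" | wc -l | tr -d ' ')"
    printf 'right only: %s\n' "$(LC_ALL=C comm -13 "$l" "$r" | wc -l | tr -d ' ')"
    printf 'common: %s\n'     "$(LC_ALL=C comm -12 "$l" "$r" | wc -l | tr -d ' ')"
    rm -f "$l" "$r"
}
```

Unlike commTrio.csh, this sketch leaves no output files behind, so there is nothing for an <code>rm</code> option to clean up.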
==compareWholeColumn.csh==
<pre>
gets a column from a table on dev and beta and checks diffs.
reports numbers of rows unique to each and common.
can compare to older database.
writes files of everything.

usage: database table column [db2]
</pre>
==compareWholeTable.csh==
<pre>
gets an entire table from two machines and checks diffs.
reports numbers of rows unique to each and common.
writes files of everything.
not real-time on RR -- uses genome-mysql.

usage: database table [machine1] [machine2]
  (defaults to dev and beta)
</pre>
==countPerChrom.csh==
<pre>
check to see if there are annotations on all chroms.
will check to see if chrom field is named tName or genoName.

usage: database1 table [database2] [RR] [histogram]
  checks database1 on dev
  database2 will be checked on beta by default
  if RR is specified, will use genome-mysql
  histogram option prints bar graph, not values
</pre>
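For a table that has already been dumped to a file, the same kind of check can be sketched with awk. This is an illustration only, not what countPerChrom.csh does internally (the script queries the databases directly). <code>count_per_chrom</code> is a hypothetical name, and both input files are assumptions: a tab-separated file whose first column is the chrom, and a list of chrom names, one per line.

```bash
#!/usr/bin/env bash
# count_per_chrom ROWS CHROMS
# ROWS:   tab-separated file, chrom name in column 1 (e.g. a BED dump of a table)
# CHROMS: file with one chrom name per line (column 1 is used)
# Prints "chrom<TAB>count" for every chrom in CHROMS, including chroms with 0 rows.
count_per_chrom() {
    awk -F'\t' '
        FILENAME == ARGV[1] { n[$1]++; next }     # first file: tally rows per chrom
        { print $1 "\t" (n[$1] + 0) }             # second file: report, 0 if absent
    ' "$1" "$2"
}
```

Chroms that print a count of 0 are the ones without annotations.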
==featureBits==
(also see: getYield.csh)
<pre>
featureBits - Correlate tables via bitmap projections.
usage:
   featureBits database table(s)
</pre>
(truncated for brevity)
==findLevel==
<pre>
searches trackDb hierarchy for your table and corresponding .html file
also returns the value of the priority and visibility entries
and the .ra file location for each

usage: database tableName
</pre>
==findOrg.csh==
<pre>
Finds the organism name given the assembly name

usage: assemblyName [date]
  will accept name with or without digit
  use 'date' to also retrieve assembly date
  (e.g. 'ornAna2' or 'ornAna')
</pre>
==gbdbPush==
<pre>
Usage: gbdbPush [-chiqvy]
  -c | --check    : Don't do the push, just say what would be pushed.
  -h | --help     : Print this message.
  -i              : Interactive mode. Ask me before doing any command.
  -q | --quiet    : Quiet mode. Don't print anything unless necessary.
  -v | --verbose  : Increase verbosity.
  -y | --noprompt : Just do it, don't ask me.
</pre>
==getAssemblies.csh==
<pre>
gets the names of all databases that contain a given table.
will accept the MySQL wildcard, %, but not on RR machines
note: not real-time on RR. uses nightly TABLE STATUS dump.

usage: tablename [machine] [verbose] - defaults to beta
  "verbose" prints list of assemblies checked
</pre>
==getTrackName.csh==
<pre>
Returns the short label and group of the track for this table.
In the case of a composite track, it returns the short label
for both the sub track and the parent track.

usage: database tableName
</pre>
==htdocsPush==
<pre>
Usage: htdocsPush [-chiqvy]
  -c | --check    : Don't do the push, just say what would be pushed.
  -h | --help     : Print this message.
  -i              : Interactive mode. Ask me before doing any command.
  -q | --quiet    : Quiet mode. Don't print anything unless necessary.
  -v | --verbose  : Increase verbosity.
  -y | --noprompt : Just do it, don't ask me.
</pre>
==joinerCheck==
(also see: runJoiner.csh)
<pre>
joinerCheck - Parse and check joiner file
usage:
...
   -verbose=N - use verbose to diagnose difficulties. N = 2, 3 or 4 to
              - show increasing level of detail for some functions.
</pre>
==mypush==
Used with sudo, generally like so, from hgwdev:
* sudo mypush $db $table hgwbeta
<pre>
Usage: mypush database table-pattern [hostlist]
  NOTE: use single quotes around table-pattern
        if it contains shell special chars like * or ?
</pre>
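The quoting note matters because the shell expands an unquoted pattern against the files in the current directory before the program ever sees it. The sketch below shows the difference using <code>show_args</code>, a hypothetical stand-in that only prints the arguments it receives; it does not call mypush, and the file and table names are made up.

```bash
#!/usr/bin/env bash
# show_args: hypothetical stand-in for a program such as mypush.
# Prints each argument it receives on its own line, in brackets.
show_args() { printf '[%s]\n' "$@"; }

# Work in a scratch directory that happens to contain matching file names.
demo_dir=$(mktemp -d)
touch "$demo_dir/chainHg19" "$demo_dir/chainMm10"
(
    cd "$demo_dir" || exit 1
    echo "unquoted:"
    show_args hg38 chain*      # shell expands the glob to the two file names
    echo "quoted:"
    show_args hg38 'chain*'    # the pattern is passed through unchanged
)
rm -rf "$demo_dir"
```

In the unquoted call the program receives three arguments (the database plus two file names); in the quoted call it receives the database and the literal pattern.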
==realTime.csh==
(also see: updateTimes.csh)
<pre>
gets update times from all machines in real time for tables in list.

usage: database tablelist (will accept single table)
</pre>
==runBits.csh==
<pre>
runs featureBits and checks for overlap with gaps.

usage: database trackname [checkUnbridged]
  where overlap with unbridged gaps can be turned on
</pre>
==updateTimes.csh==
(also see: realTime.csh)
<pre>
gets update times for three machines for tables in list.
if table is trackDb, trackDb_public will also be checked.
...
  reports on dev, beta and RR
  tablelist will accept single table
</pre>
=Might also like!=
==checkBOT.csh==
This is useful for answering mailing list questions. Also see the "bottleneck" command.
<pre>
wrapper around bottleneck check.
gives delay stats for IP address(es).

usage: ipAddress [terse]
  (terse gives only data)
  (use ipAddress = "all" to get all IPs having delays)
</pre>
==compareTrackDbAll.csh==
<pre>
checks all fields in trackDb
...
  - verbose is for html field - defaults to terse
  - fast = (genome-mysql) - defaults to realTime (WGET)
</pre>
==checkPushedFiles.csh==
<pre>
checks to see if files are in place, after a push
...
  website should include the path of the directory where
  the files reside, such as:
  http://hgdownload.soe.ucsc.edu/goldenPath/hg19/liftOver/
  file(s) is either a single name or a list of names, and can
...
  any output other than '200 OK' indicates an error.
</pre>
==compareTableToFile.csh==
<pre>
Ensures that a table correlates with its associated file.
Only prints results if there is a diff between table and file.
...
  use verbose for more details
</pre>
==copyExtSeqRows.csh==
<pre>
Automatically copies appropriate rows from the extFile and seq tables
from hgwdev to hgwbeta.
</pre>
(truncated for brevity)
==countRows.csh==
<pre>
gets the rowcount for a list of tables from dev, beta and RR.
...
  RR results not in real time, but from dumps
  genome-mysql option adds results from public mysql server
</pre>
==encodeEmail.pl==
<pre>
rewrites an email address using character codes so it's safer to put on a web page

usage: /cluster/bin/scripts/encodeEmail.pl <email addresses>
The email addresses will be encoded and sent to stdout
</pre>
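To illustrate what "character codes" can mean here, the sketch below turns each byte of an address into an HTML numeric character reference. The exact encoding that encodeEmail.pl emits is not shown on this page, so treat this as one plausible scheme rather than a description of the script; <code>encode_email</code> is a hypothetical name, and the sketch is only meant for plain ASCII addresses.

```bash
#!/usr/bin/env bash
# encode_email ADDRESS
# Prints ADDRESS with every byte replaced by an HTML numeric character
# reference, e.g. "@" becomes "&#64;". Intended for ASCII input only.
encode_email() {
    printf '%s' "$1" \
        | od -An -v -tu1 \
        | tr -s ' \n' '\n' \
        | awk 'NF { printf "&#%d;", $1 } END { print "" }'
}

encode_email 'a@b'
```

A browser renders the encoded string as the original address, while simple address harvesters that scan the page source for a literal "@" do not match it.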
==findBlatServer.csh==
<pre>
gets info about which blat server hosts which genome(s)
...
  third parameter optional: specify machine
  defaults to RR
</pre>
==findColumn.csh==
<pre>
searches database for all tables containing a specified column name.

usage: database, field, [machine = hgwdev|hgwbeta]
  (defaults to beta)
</pre>
==findPushQLocks.csh==
<pre>
find all locks in the pushQ on hgwbeta
...
  run with 'go' to see a list of locks
  run with 'real' to unlock all the locks
</pre>
==getChainLines.csh==
(also see: getMatrixLines.csh)
<pre>
Searches the README.txt files to find the correct parameters for the
$chainMinScore and $chainLinearGap variables.

usage: fromDb toDb (these can be in either order)
</pre>
==getChromlist.csh==
<pre>
prints the chrom names for an assembly.

usage: database [norandom]
</pre>
==getMatrixLines.csh==
(also see: getChainLines.csh)
<pre>
Searches the README.txt files to find the correct parameters for the
$matrix variable. This is the q-parameter from the blastz run.

usage: fromDb toDb (these can be in either order)
</pre>
==getYield.csh==
(also see: featureBits)
<pre>
uses featureBits to get yield and enrichment.

usage: database trackname [reference track]
  refTrack defaults to refGene
</pre>
==pairLastzWrapper.py==
<pre>
A program that determines the target, query, clades, and full GC assembly hub name to run the pairLastz script.

usage: python3 pairLastzWrapper.py [-h] -a1 ASSEMBLY_ONE -a2 ASSEMBLY_TWO

optional arguments:
  -h, --help            show this help message and exit
  -a1 ASSEMBLY_ONE, --assembly_one ASSEMBLY_ONE
                        Specify assembly one. Ex. hg38 or GCF_001704415.1
  -a2 ASSEMBLY_TWO, --assembly_two ASSEMBLY_TWO
                        Specify assembly two. Ex. hg38 or GCF_001704415.1
</pre>
==runJoiner.csh==
<pre>
runs joinerCheck -keys, finding all identifiers for a table.
runs joinerCheck -times (use "noTimes" to disable).
...
usage: database table [all.joiner file to use] [noTimes]
</pre>
==tdbQuery==
<pre>
tdbQuery - Query the trackDb system using SQL syntax.
Usage:
    tdbQuery sqlStatement
Where the SQL statement is enclosed in quotations to avoid the shell interpreting it.
Only a very restricted subset of a single SQL statement (select) is supported. Examples:
    tdbQuery "select count(*) from hg18"
counts all of the tracks in hg18 and prints the results to stdout
    tdbQuery "select count(*) from *"
counts all tracks in all databases.
    tdbQuery "select track,shortLabel from hg18 where type like 'bigWig%'"
prints to stdout a two field .ra file containing just the track and shortLabels of bigWig
type tracks in the hg18 version of trackDb.
    tdbQuery "select * from hg18 where track='knownGene' or track='ensGene'"
prints the hg18 knownGene and ensGene track's information to stdout.
    tdbQuery "select *Label from mm9"
prints all fields that end in 'Label' from the mm9 trackDb.
OPTIONS: ...
</pre>
(truncated for brevity)
=Scripts for file-based (big*, bam, etc) tracks=
==bigBedInfo==
<pre>
bigBedInfo - Show information about a bigBed file.
usage:
   bigBedInfo file.bb
options:
   -udcDir=/dir/to/cache - place to put cache for remote bigBed/bigWigs
   -chroms - list all chromosomes and their sizes
   -zooms - list all zoom levels and their sizes
   -as - get autoSql spec
</pre>
==bigBedSummary==
<pre>
bigBedSummary - Extract summary information from a bigBed file.
usage:
   bigBedSummary file.bb chrom start end dataPoints
Get summary data from bigBed for indicated region, broken into
dataPoints equal parts. (Use dataPoints=1 for simple summary.)
options:
   -type=X where X is one of:
         coverage - % of region that is covered (default)
         mean - average depth of covered regions
         min - minimum depth of covered regions
         max - maximum depth of covered regions
   -fields - print out information on fields in file.
      If fields option is used, the chrom, start, end, dataPoints
      parameters may be omitted
   -udcDir=/dir/to/cache - place to put cache for remote bigBed/bigWigs
</pre>
==bigWigInfo==
<pre>
bigWigInfo - Print out information about bigWig file.
usage:
   bigWigInfo file.bw
options:
   -udcDir=/dir/to/cache - place to put cache for remote bigBed/bigWigs
   -chroms - list all chromosomes and their sizes
   -zooms - list all zoom levels and their sizes
   -minMax - list the min and max on a single line
</pre>
==bigWigSummary==
<pre>
bigWigSummary - Extract summary information from a bigWig file.
usage:
   bigWigSummary file.bigWig chrom start end dataPoints
Get summary data from bigWig for indicated region, broken into
dataPoints equal parts. (Use dataPoints=1 for simple summary.)
NOTE: start and end coordinates are in BED format (0-based)
options:
   -type=X where X is one of:
         mean - average value in region (default)
         min - minimum value in region
         max - maximum value in region
         std - standard deviation in region
         coverage - % of region that is covered
   -udcDir=/dir/to/cache - place to put cache for remote bigBed/bigWigs
</pre>
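Because start and end are in BED format, a position copied from the browser (1-based, with both ends included) needs its start reduced by one before it is passed to bigWigSummary or bigBedSummary. A minimal sketch, in which <code>pos_to_bed</code> is a hypothetical helper name and the position is just an example:

```bash
#!/usr/bin/env bash
# pos_to_bed chrN:start-end
# Converts a 1-based browser position into "chrom start end" with a 0-based start,
# which is the form the summary tools expect. Commas in the numbers are ignored.
pos_to_bed() {
    local pos=${1//,/}                  # drop thousands separators
    local chrom=${pos%%:*}              # text before the colon
    local range=${pos#*:}               # text after the colon
    local start=${range%-*} end=${range#*-}
    echo "$chrom $((start - 1)) $end"
}

pos_to_bed chr21:33,031,597-33,041,570   # prints: chr21 33031596 33041570
```

The result can be spliced into a command line, for example <code>bigWigSummary file.bigWig $(pos_to_bed chr21:33,031,597-33,041,570) 1</code>.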
==hgWiggle==
<pre>
hgWiggle - fetch wiggle data from data base or file
usage:
   hgWiggle [options] <track names ...>
options:
   -db=<database> - use specified database
   -chr=chrN - examine data only on chrN
   -chrom=chrN - same as -chr option above
   -position=[chrN:]start-end - examine data in window start-end (1-relative)
             (the chrN: is optional)
   -chromLst=<file> - file with list of chroms to examine
   -doAscii - perform the default ascii output, in addition to other outputs
            - Any of the other -do outputs turn off the default ascii output
   -rawDataOut - output just the data values, nothing else
   -htmlOut - output stats or histogram in HTML instead of plain text
   -doStats - perform stats measurement, default output text, see -htmlOut
   -doBed - output bed format
   -lift=<D> - lift ascii output positions by D (0 default)
   -bedFile=<file> - constrain output to ranges specified in bed <file>
   -dataConstraint='DC' - where DC is one of < = >= <= == != 'in range'
   -ll=<F> - lowerLimit compare data values to F (float) (all but 'in range')
   -ul=<F> - upperLimit compare data values to F (float)
             (need both ll and ul when 'in range')
   -help - display more examples and extra options (to stderr)

When no database is specified, track names will refer to .wig files

example using the file chrM.wig:
   hgWiggle chrM
example using the database table hg17.gc5Base:
   hgWiggle -chr=chrM -db=hg17 gc5Base
</pre>
==samtools view==
For use with bam and sam. If it isn't there already, add /hive/data/outside/samtools/svn_${MACHTYPE}/samtools to $PATH in your .bashrc / .tcshrc. For more info, see the samtools [http://samtools.sourceforge.net/samtools.shtml man page]. For additional info on interpreting the output, see the [http://samtools.sourceforge.net/SAM1.pdf SAM/BAM format spec].
<pre>
samtools view

Usage:   samtools view [options] <in.bam>|<in.sam> [region1 [...]]

Options: -b       output BAM
         -h       print header for the SAM output
         -H       print header only (no alignments)
         -S       input is SAM
         -u       uncompressed BAM output (force -b)
         -1       fast compression (force -b)
         -x       output FLAG in HEX (samtools-C specific)
         -X       output FLAG in string (samtools-C specific)
         -c       print only the count of matching records
         -L FILE  output alignments overlapping the input BED FILE [null]
         -t FILE  list of reference names and lengths (force -S) [null]
         -T FILE  reference sequence file (force -S) [null]
         -o FILE  output file name [stdout]
         -R FILE  list of read groups to be outputted [null]
         -f INT   required flag, 0 for unset [0]
         -F INT   filtering flag, 0 for unset [0]
         -q INT   minimum mapping quality [0]
         -l STR   only output reads in library STR [null]
         -r STR   only output reads in read group STR [null]
         -s FLOAT fraction of templates to subsample; integer part as seed [-1]
         -?       longer help
</pre>
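The <code>-f</code> and <code>-F</code> options work on the bits of the FLAG column: a read is kept when every bit in the <code>-f</code> value is set and no bit in the <code>-F</code> value is set. The sketch below reproduces that test with shell arithmetic so the effect of a flag combination can be checked without a BAM file. <code>passes_filter</code> is a hypothetical name and is not part of samtools.

```bash
#!/usr/bin/env bash
# passes_filter FLAG REQUIRED EXCLUDED
# Succeeds (exit status 0) if a read with this FLAG would be kept by
# "samtools view -f REQUIRED -F EXCLUDED": all REQUIRED bits set, no EXCLUDED bit set.
passes_filter() {
    local flag=$1 required=$2 excluded=$3
    (( (flag & required) == required && (flag & excluded) == 0 ))
}

# FLAG 99 = 1 (paired) + 2 (proper pair) + 32 (mate reverse) + 64 (first in pair).
if passes_filter 99 2 4; then echo "99: kept by -f 2 -F 4"; fi
# FLAG 4 = read unmapped, so -F 4 drops it.
if ! passes_filter 4 0 4; then echo "4: dropped by -F 4"; fi
```

For example, <code>samtools view -c -F 4 in.bam</code> counts the reads that are not flagged as unmapped.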
=ENCODE QA scripts=
These are scripts that the ENCODE QA team frequently uses.
==encodeEmail.pl==
<pre>
usage: /cluster/bin/scripts/encodeEmail.pl <email addresses>
The email addresses will be encoded and sent to stdout
</pre>
==encodeQaCheckHgdownloadFiles==
<pre>
usage: encodeQaCheckRRFiles [-h] [-d] [-s SERVER] database composite files

Compares files on the RR against a list of files

positional arguments:
  database              The database, typically hg19 or mm9
  composite             The composite name, wgEncodeCshlLongRnaSeq for
                        instance
  files                 The list of files

optional arguments:
  -h, --help            show this help message and exit
  -d, --dev             Check for files missing on dev that are present on the
                        RR
  -s SERVER, --server SERVER
                        The server to use, like hgdownload.

Example:
    encodeQaCheckHgdownloadFiles hg19 wgEncodeSydhTfbs files.list
    encodeQaCheckHgdownloadFiles hg18 wgEncodeHudsonalphaChipSeq checkPushFilesList
</pre>
==encodeQaInit==
For more specific info about the other scripts this script runs and the files it creates, see the [[ENCODE_QA#Run_encodeQaInit | ENCODE QA]] wiki.
<pre>
usage: encodeQaInit [-h] [-t] [-m MDB] database composite release redmine

Initializes QA directory for claiming a release

positional arguments:
  database           The database, typically hg19 or mm9
  composite          The composite name, wgEncodeCshlLongRnaSeq for instance
  release            The new release to be released
  redmine            The Redmine issue number

optional arguments:
  -h, --help         show this help message and exit
  -t, --test         Test mode:doesn't change status to reviewing, outputs to
                     test qa Directory
  -m MDB, --mdb MDB  use a different mdb composite name

Example:
    encodeQaInit hg19 wgEncodeSydhTfbs 1 69
    encodeQaInit hg18 wgEncodeHudsonalphaChipSeq 3 504
</pre>
==encodeQaPrepareRelease==
<pre>
usage: encodeQaPrepareRelease [-h] database composite stage

Stages a track either to beta or to public

positional arguments:
  database    The database you're using
  composite   The composite you're using
  stage       The stage you are staging to

optional arguments:
  -h, --help  show this help message and exit

Examples:
    encodeQaPrepareRelease hg19 wgEncodeSydhTfbs beta
    encodeQaPrepareRelease hg19 wgEncodeHaibTfbs public
</pre>
==encodeQaSqlRelease==
Creates a pushQ entry directly in the L queue of the Main pushQ so the ENCODE track will have an entry in the release log.
<pre>
usage: encodeQaSqlRelease <release.sql> <sponsor>
example: encodeQaSqlRelease release.sql wong
</pre>
==encodeStatus.pl==
Sets the ENCODE status of the given subIds. QA usually uses it only after releasing a track, to set the subIds' ENCODE status to 'released'.
<pre>
usage: encodeStatus [-instance=instanceName] [-force] project-id|project-name [status]
valid statuses: loaded, displayed, approved, reviewing, released
  -instance  Default instance is 'prod'
  -force     Use if you want to set a status that is not normally allowed (e.g. to reset
             to an earlier status).
</pre>
==getTrackReferences==
From a list of PubMed IDs, provides HTML output of references in [[CBSE_citation_format | CBSE citation format]].
<pre>
usage getTrackReferences <pubmed_id1> <pubmed_id2> ... <pubmed_idn>
</pre>
==mdbPrint==
Useful for checking that the experiments (expIds) have been done correctly; see the [[ENCODE_QA#MetaData | ENCODE QA]] wiki.
<pre>
mdbPrint - Prints metadata objects, variables and values from 'metaDb' table.
usage:
   mdbPrint {db} [-table=] [-byVar] [-line/-count]
                 [-all]
                 [-vars="var1=val1 var2=val2..."]
                 [-obj= [-var= [-val=]]]
                 [-var= [-val=]]
                 [-specialHelp]
Options: ...
</pre>
(truncated for brevity)
==qaEncodeTracks2==
Runs the test suite for ENCODE tracks. The library functions used by this script are also run by encodeQaInit, which puts the output in the script.out file, but this script can still be run on its own.
<pre>
usage: qaEncodeTracks2 [-h] database tableList [trackDb]

A series of checks for QA

positional arguments:
  database    The database, typically hg19 or mm9
  tableList   The file containing a list of tables
  trackDb     The trackDb file to check

optional arguments:
  -h, --help  show this help message and exit

Examples:
    qaEncodeTracks2 hg19 tableList
    qaEncodeTracks2 hg19 tableList /path/to/trackDb.ra
    qaEncodeTracks2 hg19 tableList ~/kent/src/hg/makeDb/trackDb/human/hg19/wgEncodeSydhTfbs.new.ra
</pre>
==raDiff==
Used on metaDb .ra files and cv.ra files; may be expanded to trackDb .ra files.
<pre>
usage: qaRaDiff [-h] RaFileOne RaFileTwo

Describes the differences between the two .ra files

positional arguments:
  RaFileOne   The .ra file
  RaFileTwo   The .ra file to compare to

optional arguments:
  -h, --help  show this help message and exit

example: qaRaDiff alpha/wgEncodeUwTfbs.ra beta/wgEncodeUwTfbs.ra
</pre>
==raMerge==
Used on metaDb .ra files and cv.ra files; may be expanded to trackDb .ra files.
<pre>
usage: raMerge [-h] [-t] RaFileOne RaFileTwo

Merges two .ra files in a way that you would expect

positional arguments:
  RaFileOne      The .ra file
  RaFileTwo      The .ra file to merge with

optional arguments:
  -h, --help     show this help message and exit
  -t, --trackDb  Print as trackDb

example: raMerge alpha/wgEncodeUwTfbs.ra beta/wgEncodeUwTfbs.ra
</pre>
[[Category:Browser QA]]
[[Category:Browser QA Training]]
[[Category:Browser QA ENCODE]]
Latest revision as of 20:51, 23 March 2022
This is a list of the most frequently-used programs and scripts that QAers use. It does not include track-specific programs, such as chainNetTrio.csh, or these. Some are devised by QA and live in the source tree at kent/src/utils/qa (these mostly end in ".csh"). Some are programs that the engineers or system administrators have written.
After editing, saving, and committing any changes to scripts in utils/qa, you will need to issue the following make from the utils dir:
make SCRIPTS=/cluster/bin/scripts
All scripts running from the qateam identity crontab should be in kent/src/utils/qa. Note that if you are adding a new script, you will need to add it to the makefile in the kent/src/utils/qa directory as well.
Running Scripts on Multiple Tables at Once
For the scripts that do not accept a list of tables in a file, here's a little bash script that you can use to run any of these scripts on multiple tables. Save the list of tables in a file (in your hive directory) called 'tableList_temp', and then type (note: you can press return after each line):
for i in $(cat tableList_temp)
do
[script_name] [database/assembly] $i
echo $i
done
Can also be run as one line for easy copy/paste. Example:
for i in $(cat tableList); do runJoiner.csh danRer5 $i; echo $i; done
The following scripts do not accept a file, so are good candidates for the above:
- runJoiner.csh
- countPerChrom.csh
- runBits.csh
- (more coming...)
Must know about!
bigPush.sh
(also see: mypush)
Usage: bigPush.sh [-hv] $database(s) $table(s) Required arguments: $database(s) A single database or list of databases for which table(s) will be pushed. This list can be a file. $table(s) A single table or list of tables to be pushed. List of tables can be a file. Optional arguments: -h Display this help and exit. -v VERBOSE MODE Output details of push; 1 for results to stdout, 2 for results to stdout and file. Push a table or list of tables from Dev to Beta for a single database or list of databases. If you have multiple databases (and/or tables), they can be input in a few different ways. They can be either in a file or in a space-separated list enclosed in quotes. For example: bigPush.sh "ce11 hg19 gorGor3" refGene A list of tables can be pushed in a similar way. If verbose mode is set, then the output will be sent to stdout.
commTrio.csh
Sorts and compares two files. Counts unique and common records. usage: leftFileName rightFileName [rm] optional [rm]: remove the three output files when finished
compareWholeColumn.csh
gets a column from a table on dev and beta and checks diffs. reports numbers of rows unique to each and common. can compare to older database. writes files of everything. usage: database table column [db2]
compareWholeTable.csh
gets an entire table from two machines and checks diffs.
reports numbers of rows unique to each and common.
writes files of everything.
not real-time on RR -- uses genome-mysql.
usage: database table [machine1] [machine2]
    (defaults to dev and beta)
countPerChrom.csh
check to see if there are annotations on all chroms.
will check to see if chrom field is named tName or genoName.
usage: database1 table [database2] [RR] [histogram]
    checks database1 on dev
    database2 will be checked on beta by default
    if RR is specified, will use genome-mysql
    histogram option prints bar graph, not values
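The idea behind the per-chrom check can be illustrated on flat files. The sketch below is not how countPerChrom.csh works (the script queries the database); it assumes the table has been dumped to a tab-separated file and that a second file lists one chrom name per line. `count_per_chrom` and its arguments are hypothetical.

```bash
# Hypothetical sketch: print every chrom from a list with the number of
# rows annotated on it, so chroms with no annotations show up as 0.
#   $1  file with one chrom name per line
#   $2  tab-separated dump of the table
#   $3  1-based column holding the chrom name (default: 1)
count_per_chrom() (
    chroms=$1 rows=$2 col=${3:-1}
    awk -F'\t' -v col="$col" '
        NR == FNR { count[$1] = 0; order[++n] = $1; next }  # chrom list
        ($col in count) { count[$col]++ }                   # table rows
        END { for (i = 1; i <= n; i++) print order[i] "\t" count[order[i]] }
    ' "$chroms" "$rows"
)
```

For a dump whose first column is `bin`, the chrom column would be passed explicitly, for example `count_per_chrom chroms.txt table.tab 2`.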
featureBits
(also see: getYield.csh)
featureBits - Correlate tables via bitmap projections.
usage:
    featureBits database table(s)
(truncated for brevity)
findLevel
searches trackDb hierarchy for your table and corresponding .html file
also returns the value of the priority and visibility entries and the .ra file location for each
usage: database tableName
findOrg.csh
Finds the organism name given the assembly name
usage: assemblyName [date]
    will accept name with or without digit
    use 'date' to also retrieve assembly date
    (e.g. 'ornAna2' or 'ornAna')
gbdbPush
Usage: gbdbPush [-chiqvy]
    -c | --check    : Don't do the push, just say what would be pushed.
    -h | --help     : Print this message.
    -i              : Interactive mode. Ask me before doing any command.
    -q | --quiet    : Quiet mode. Don't print anything unless necessary.
    -v | --verbose  : Increase verbosity.
    -y | --noprompt : Just do it, don't ask me.
getAssemblies.csh
gets the names of all databases that contain a given table.
will accept the MySQL wildcard, %, but not on RR machines
note: not real-time on RR. uses nightly TABLE STATUS dump.
usage: tablename [machine] [verbose]
    - defaults to beta
    "verbose" prints list of assemblies checked
getTrackName.csh
Returns the short label and group of the track for this table. In the case of a composite track, it returns the short label for both the sub track and the parent track. usage: database tableName
htdocsPush
Usage: htdocsPush [-chiqvy]
    -c | --check    : Don't do the push, just say what would be pushed.
    -h | --help     : Print this message.
    -i              : Interactive mode. Ask me before doing any command.
    -q | --quiet    : Quiet mode. Don't print anything unless necessary.
    -v | --verbose  : Increase verbosity.
    -y | --noprompt : Just do it, don't ask me.
joinerCheck
(also see: runJoiner.csh)
joinerCheck - Parse and check joiner file
usage:
    joinerCheck file.joiner
options:
    -fields - Check fields in joiner file exist, faster with -fieldListIn
    -fieldListOut=file - List all fields in all databases to file.
    -fieldListIn=file - Get list of fields from file rather than mysql.
    -keys - Validate (foreign) keys. Takes about an hour.
    -tableCoverage - Check that all tables are mentioned in joiner file
    -dbCoverage - Check that all databases are mentioned in joiner file
    -times - Check update times of tables are after tables they depend on
    -all - Do all tests: -fields -keys -tableCoverage -dbCoverage -times
    -identifier=name - Just validate given identifier. Note only applies to keys and fields checks.
    -database=name - Just validate given database. Note only applies to keys and times checks.
    -verbose=N - use verbose to diagnose difficulties. N = 2, 3 or 4 to show increasing level of detail for some functions.
mypush
used with sudo, generally like so, from hgwdev:
- sudo mypush $db $table hgwbeta
Usage: mypush database table-pattern [hostlist]
NOTE: use single quotes around table-pattern if it contains shell special chars like * or ?
realTime.csh
(also see: updateTimes.csh)
gets update times from all machines in real time for tables in list. usage: database tablelist (will accept single table)
runBits.csh
runs featureBits and checks for overlap with gaps.
usage: database trackname [checkUnbridged]
    where overlap with unbridged gaps can be turned on
updateTimes.csh
(also see: realTime.csh)
gets update times for three machines for tables in list.
if table is trackDb, trackDb_public will also be checked.
warning: not in real time for RR. uses overnight dump.
usage: database tablelist
    reports on dev, beta and RR
    tablelist will accept single table
Might also like!
checkBOT.csh
This is useful for answering mailing list questions. Also see the "bottleneck" command.
wrapper around bottleneck check. gives delay stats for IP address(es).
usage: ipAddress [terse]
    (terse gives only data)
    (use ipAddress = "all" to get all IPs having delays)
compareTrackDbAll.csh
checks all fields in trackDb
usage: database [machine1] [machine2] [mode]
    (defaults to hgw1 and hgwbeta)
    mode = (fast | verbose | fastVerbose)
    - verbose is for html field - defaults to terse
    - fast = (genome-mysql) - defaults to realTime (WGET)
checkPushedFiles.csh
checks to see if files are in place, after a push
usage: website file(s)
website should include the path of the directory where the files reside, such as:
    http://hgdownload.soe.ucsc.edu/goldenPath/hg19/liftOver/
file(s) is either a single name or a list of names, and can include items with additional directory structure, like so:
    filename
    dir/filename
    dir/dir/dir/filename
any output other than '200 OK' indicates an error.
compareTableToFile.csh
Ensures that a table correlates with its associated file.
Only prints results if there is a diff between table and file.
Works for these file types: narrowPeak, broadPeak, gappedPeak, bedGraph, NRE, BiP, gcf
For wiggle files, you must specify [wig] parameter.
usage: database tableName fileName [wig] [verbose]
    fileName includes path of download file
    e.g. /goldenPath/<db>/fileName.gz
    use verbose for more details
copyExtSeqRows.csh
Automatically copies appropriate rows from the extFile and seq tables from hgwdev to hgwbeta.
(truncated for brevity)
countRows.csh
gets the rowcount for a list of tables from dev, beta and RR.
usage: database tablelist [genome-mysql]
    tablelist can be just name of single table
    RR results not in real time, but from dumps
    genome-mysql option adds results from public mysql server
encodeEmail.pl
rewrites an email address using character codes so it's safer to put on a web page
usage: /cluster/bin/scripts/encodeEmail.pl <email addresses>
The email addresses will be encoded and sent to stdout
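To illustrate what "using character codes" can look like, the sketch below rewrites each character of an address as a decimal HTML numeric entity. This is an assumption made for illustration: the encoding that encodeEmail.pl actually emits may differ, and `encode_email` is a hypothetical function that only handles plain ASCII addresses.

```bash
# Hypothetical sketch: rewrite each ASCII character of an address as a
# decimal HTML entity (&#NN;). Not the actual encodeEmail.pl logic.
encode_email() (
    rest=$1 out=
    while [ -n "$rest" ]; do
        remainder=${rest#?}          # everything after the first character
        ch=${rest%"$remainder"}      # the first character itself
        # A leading quote makes printf emit the character's numeric code.
        out="$out$(printf '&#%d;' "'$ch")"
        rest=$remainder
    done
    printf '%s\n' "$out"
)
```

A browser renders the entities as the original address, while the page source no longer contains the address as plain text.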
findBlatServer.csh
gets info about which blat server hosts which genome(s)
usage: db|host|all [db|host] [machine]
    first parameter required: one specific db or host or all dbs
    second parameter optional: order by db or by host (blatServer)
        defaults to order by db
    third parameter optional: specify machine
        defaults to RR
findColumn.csh
searches database for all tables containing a specified column name.
usage: database, field, [machine = hgwdev|hgwbeta]
    (defaults to beta)
findPushQLocks.csh
find all locks in the pushQ on hgwbeta
usage: go|real
    run with 'go' to see a list of locks
    run with 'real' to unlock all the locks
getChainLines.csh
(also see: getMatrixLines.csh)
Searches the README.txt files to find the correct parameters for the $chainMinScore and $chainLinearGap variables.
usage: fromDb toDb
    (these can be in either order)
getChromlist.csh
prints the chrom names for an assembly. usage: database [norandom]
getMatrixLines.csh
(also see: getChainLines.csh)
Searches the README.txt files to find the correct parameters for the $matrix variable. This is the q-parameter from the blastz run.
usage: fromDb toDb
    (these can be in either order)
getYield.csh
(also see: featureBits)
uses featureBits to get yield and enrichment.
usage: database trackname [reference track]
    refTrack defaults to refGene
pairLastzWrapper.py
A program that determines the target, query, clades, and full GC assembly hub name to run the pairLastz script.
usage: python3 pairLastzWrapper.py [-h] -a1 ASSEMBLY_ONE -a2 ASSEMBLY_TWO
optional arguments:
    -h, --help
        show this help message and exit
    -a1 ASSEMBLY_ONE, --assembly_one ASSEMBLY_ONE
        Specify assembly one. Ex. hg38 or GCF_001704415.1
    -a2 ASSEMBLY_TWO, --assembly_two ASSEMBLY_TWO
        Specify assembly two. Ex. hg38 or GCF_001704415.1
runJoiner.csh
runs joinerCheck -keys, finding all identifiers for a table.
runs joinerCheck -times (use "noTimes" to disable).
set database to "all" for global.
for chains/nets, use tablename format: chainDb.
usage: database table [all.joiner file to use] [noTimes]
tdbQuery
tdbQuery - Query the trackDb system using SQL syntax.
Usage:
    tdbQuery sqlStatement
Where the SQL statement is enclosed in quotations to avoid the shell interpreting it. Only a very restricted subset of a single SQL statement (select) is supported.
Examples:
    tdbQuery "select count(*) from hg18"
counts all of the tracks in hg18 and prints the results to stdout
    tdbQuery "select count(*) from *"
counts all tracks in all databases.
    tdbQuery "select track,shortLabel from hg18 where type like 'bigWig%'"
prints to stdout a two field .ra file containing just the track and shortLabels of bigWig type tracks in the hg18 version of trackDb.
    tdbQuery "select * from hg18 where track='knownGene' or track='ensGene'"
prints the hg18 knownGene and ensGene track's information to stdout.
    tdbQuery "select *Label from mm9"
prints all fields that end in 'Label' from the mm9 trackDb.
OPTIONS: ...
(truncated for brevity)
Scripts for file-based (big*, bam, etc) tracks
bigBedInfo
bigBedInfo - Show information about a bigBed file.
usage:
    bigBedInfo file.bb
options:
    -udcDir=/dir/to/cache - place to put cache for remote bigBed/bigWigs
    -chroms - list all chromosomes and their sizes
    -zooms - list all zoom levels and their sizes
    -as - get autoSql spec
bigBedSummary
bigBedSummary - Extract summary information from a bigBed file.
usage:
    bigBedSummary file.bb chrom start end dataPoints
Get summary data from bigBed for indicated region, broken into dataPoints equal parts. (Use dataPoints=1 for simple summary.)
options:
    -type=X where X is one of:
        coverage - % of region that is covered (default)
        mean - average depth of covered regions
        min - minimum depth of covered regions
        max - maximum depth of covered regions
    -fields - print out information on fields in file.
        If fields option is used, the chrom, start, end, dataPoints parameters may be omitted
    -udcDir=/dir/to/cache - place to put cache for remote bigBed/bigWigs
bigWigInfo
bigWigInfo - Print out information about bigWig file.
usage:
    bigWigInfo file.bw
options:
    -udcDir=/dir/to/cache - place to put cache for remote bigBed/bigWigs
    -chroms - list all chromosomes and their sizes
    -zooms - list all zoom levels and their sizes
    -minMax - list the min and max on a single line
bigWigSummary
bigWigSummary - Extract summary information from a bigWig file.
usage:
    bigWigSummary file.bigWig chrom start end dataPoints
Get summary data from bigWig for indicated region, broken into dataPoints equal parts. (Use dataPoints=1 for simple summary.)
NOTE: start and end coordinates are in BED format (0-based)
options:
    -type=X where X is one of:
        mean - average value in region (default)
        min - minimum value in region
        max - maximum value in region
        std - standard deviation in region
        coverage - % of region that is covered
    -udcDir=/dir/to/cache - place to put cache for remote bigBed/bigWigs
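Because bigWigSummary expects BED-style (0-based start) coordinates, a position copied from the Genome Browser position box, which is 1-based, needs its start lowered by one. A minimal sketch of that conversion follows; `pos_to_bed` is a hypothetical helper, not part of the kent tools, and it assumes input of the form chrom:start-end.

```bash
# Hypothetical helper: turn a browser position such as
# chr21:33,031,597-33,041,570 (1-based start) into "chrom start end"
# with a 0-based start, as bigWigSummary expects.
pos_to_bed() (
    pos=$(printf '%s' "$1" | tr -d ',')   # drop thousands separators
    chrom=${pos%%:*}                      # text before the colon
    range=${pos#*:}                       # start-end
    start=${range%-*}
    end=${range#*-}
    echo "$chrom $(( start - 1 )) $end"
)
```

It could then be used as `bigWigSummary file.bigWig $(pos_to_bed chr21:33,031,597-33,041,570) 1`, leaving the command substitution unquoted so the three fields become separate arguments.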
hgWiggle
hgWiggle - fetch wiggle data from data base or file
usage:
    hgWiggle [options] <track names ...>
options:
    -db=<database> - use specified database
    -chr=chrN - examine data only on chrN
    -chrom=chrN - same as -chr option above
    -position=[chrN:]start-end - examine data in window start-end (1-relative)
        (the chrN: is optional)
    -chromLst=<file> - file with list of chroms to examine
    -doAscii - perform the default ascii output, in addition to other outputs
        - Any of the other -do outputs turn off the default ascii output
    -rawDataOut - output just the data values, nothing else
    -htmlOut - output stats or histogram in HTML instead of plain text
    -doStats - perform stats measurement, default output text, see -htmlOut
    -doBed - output bed format
    -lift=<D> - lift ascii output positions by D (0 default)
    -bedFile=<file> - constrain output to ranges specified in bed <file>
    -dataConstraint='DC' - where DC is one of < = >= <= == != 'in range'
    -ll=<F> - lowerLimit compare data values to F (float) (all but 'in range')
    -ul=<F> - upperLimit compare data values to F (float)
        (need both ll and ul when 'in range')
    -help - display more examples and extra options (to stderr)
When no database is specified, track names will refer to .wig files
example using the file chrM.wig:
    hgWiggle chrM
example using the database table hg17.gc5Base:
    hgWiggle -chr=chrM -db=hg17 gc5Base
samtools view
For use with bam and sam. If it isn't there already, add /hive/data/outside/samtools/svn_${MACHTYPE}/samtools to $PATH in your .bashrc / .tcshrc. For more info, see the samtools man page. For additional info on interpreting the output, see the SAM/BAM format spec.
samtools view
Usage: samtools view [options] <in.bam>|<in.sam> [region1 [...]]
Options:
    -b       output BAM
    -h       print header for the SAM output
    -H       print header only (no alignments)
    -S       input is SAM
    -u       uncompressed BAM output (force -b)
    -1       fast compression (force -b)
    -x       output FLAG in HEX (samtools-C specific)
    -X       output FLAG in string (samtools-C specific)
    -c       print only the count of matching records
    -L FILE  output alignments overlapping the input BED FILE [null]
    -t FILE  list of reference names and lengths (force -S) [null]
    -T FILE  reference sequence file (force -S) [null]
    -o FILE  output file name [stdout]
    -R FILE  list of read groups to be outputted [null]
    -f INT   required flag, 0 for unset [0]
    -F INT   filtering flag, 0 for unset [0]
    -q INT   minimum mapping quality [0]
    -l STR   only output reads in library STR [null]
    -r STR   only output reads in read group STR [null]
    -s FLOAT fraction of templates to subsample; integer part as seed [-1]
    -?       longer help
ENCODE QA scripts
These are scripts that the ENCODE QA team frequently uses.
encodeEmail.pl
usage: /cluster/bin/scripts/encodeEmail.pl <email addresses>
The email addresses will be encoded and sent to stdout
encodeQaCheckHgdownloadFiles
usage: encodeQaCheckRRFiles [-h] [-d] [-s SERVER] database composite files
Compares files on the RR against a list of files
positional arguments:
    database    The database, typically hg19 or mm9
    composite   The composite name, wgEncodeCshlLongRnaSeq for instance
    files       The list of files
optional arguments:
    -h, --help  show this help message and exit
    -d, --dev   Check for files missing on dev that are present on the RR
    -s SERVER, --server SERVER
                The server to use, like hgdownload.
Example:
    encodeQaCheckHgdownloadFiles hg19 wgEncodeSydhTfbs files.list
    encodeQaCheckHgdownloadFiles hg18 wgEncodeHudsonalphaChipSeq checkPushFilesList
encodeQaInit
For more specific info about the other scripts this script runs and the files it creates see the ENCODE QA wiki.
usage: encodeQaInit [-h] [-t] [-m MDB] database composite release redmine
Initializes QA directory for claiming a release
positional arguments:
    database    The database, typically hg19 or mm9
    composite   The composite name, wgEncodeCshlLongRnaSeq for instance
    release     The new release to be released
    redmine     The Redmine issue number
optional arguments:
    -h, --help  show this help message and exit
    -t, --test  Test mode:doesn't change status to reviewing, outputs to test qa Directory
    -m MDB, --mdb MDB
                use a different mdb composite name
Example:
    encodeQaInit hg19 wgEncodeSydhTfbs 1 69
    encodeQaInit hg18 wgEncodeHudsonalphaChipSeq 3 504
encodeQaPrepareRelease
usage: encodeQaPrepareRelease [-h] database composite stage
Stages a track either to beta or to public
positional arguments:
    database    The database you're using
    composite   The composite you're using
    stage       The stage you are staging to
optional arguments:
    -h, --help  show this help message and exit
Examples:
    encodeQaPrepareRelease hg19 wgEncodeSydhTfbs beta
    encodeQaPrepareRelease hg19 wgEncodeHaibTfbs public
encodeQaSqlRelease
Creates a pushQ entry directly in the L queue of the Main pushQ so the ENCODE track will have an entry in the release log.
usage: encodeQaSqlRelease <release.sql> <sponsor>
example: encodeQaSqlRelease release.sql wong
encodeStatus.pl
Sets the ENCODE status of the given subIds (QA usually uses this only after releasing a track, to set the subIds' ENCODE status to 'released').
usage: encodeStatus [-instance=instanceName] [-force] project-id|project-name [status]
valid statuses: loaded, displayed, approved, reviewing, released
    -instance  Default instance is 'prod'
    -force     Use if you want to set a status that is not normally allowed (e.g. to reset to an earlier status).
getTrackReferences
From a list of PubMed IDs, provides HTML output of the references in CBSE citation format.
usage: getTrackReferences <pubmed_id1> <pubmed_id2> ... <pubmed_idn>
mdbPrint
Useful for checking that the experiments (expIds) have been done correctly; see the ENCODE QA wiki.
mdbPrint - Prints metadata objects, variables and values from 'metaDb' table.
usage:
    mdbPrint {db} [-table=] [-byVar] [-line/-count] [-all] [-vars="var1=val1 var2=val2..."] [-obj= [-var= [-val=]]] [-var= [-val=]] [-specialHelp]
Options: ...
(truncated for brevity)
qaEncodeTracks2
Runs test suite for ENCODE tracks (the library functions used by script are also run by encodeQaInit, which puts the output in the script.out file, but can still be run on its own).
usage: qaEncodeTracks2 [-h] database tableList [trackDb]
A series of checks for QA
positional arguments:
    database    The database, typically hg19 or mm9
    tableList   The file containing a list of tables
    trackDb     The trackDb file to check
optional arguments:
    -h, --help  show this help message and exit
Examples:
    qaEncodeTracks2 hg19 tableList
    qaEncodeTracks2 hg19 tableList /path/to/trackDb.ra
    qaEncodeTracks2 hg19 tableList ~/kent/src/hg/makeDb/trackDb/human/hg19/wgEncodeSydhTfbs.new.ra
raDiff
Used on metaDb .ra files and cv.ra files; may be expanded to trackDb .ra files.
usage: qaRaDiff [-h] RaFileOne RaFileTwo
Describes the differences between the two .ra files
positional arguments:
    RaFileOne   The .ra file
    RaFileTwo   The .ra file to compare to
optional arguments:
    -h, --help  show this help message and exit
example:
    qaRaDiff alpha/wgEncodeUwTfbs.ra beta/wgEncodeUwTfbs.ra
raMerge
Used on metaDb .ra files and cv.ra files; may be expanded to trackDb .ra files.
usage: raMerge [-h] [-t] RaFileOne RaFileTwo
Merges two .ra files in a way that you would expect
positional arguments:
    RaFileOne      The .ra file
    RaFileTwo      The .ra file to merge with
optional arguments:
    -h, --help     show this help message and exit
    -t, --trackDb  Print as trackDb
example:
    raMerge alpha/wgEncodeUwTfbs.ra beta/wgEncodeUwTfbs.ra