==Getting Started==
This page is no longer maintained.
* Choose a track from the encodePushQ (sub pushQ accessed from pushQ Gateway)
| |
| **email Kate when you claim a track (cc Katrina)
| |
| * Check out the developer's notes file to get a feel for what the track consists of.
| |
| ** the path to the notes file will be in the "Notes" section of the pushQ entry
| |
| ** trust the notes file over the pushQ entry table/files information
| |
| * If this is a subsequent release, see [[#Subsequent Release of Data (e.g. Release 2)]] first.
| |
| | |
==run qaEncodeTracks.csh==
| |
You will need to dump the list of tables (just the new tables if this is a releaseN) from the pushQ (or from the developer's notes file if this is a releaseN) to a file (i.e. tableList in the usage statement). Then run qaEncodeTracks.csh, which does the following:
| |
| * countPerChrom
| |
| * check for entry in tableDescriptions table
| |
| * check that shortLabel does not exceed 17 characters
| |
| * check that longLabel does not exceed 80 characters
| |
| * check that there are no underscores in the table names
| |
| * check for indices on the tables
| |
| * check that positional tables are sorted
| |
| * checkTableCoords (checks for any illegal coordinates)
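
The label and table-name checks in the list above can be reproduced by hand when looking at a single track. The following is a minimal Python sketch of those three checks only; it is not the implementation used by qaEncodeTracks.csh, and the function name and arguments are hypothetical.

```python
def check_track_labels(short_label, long_label, table_name):
    """Return a list of problems for one track entry (empty list means OK).

    Limits come from the checklist: shortLabel <= 17 characters,
    longLabel <= 80 characters, no underscores in the table name.
    """
    problems = []
    if len(short_label) > 17:
        problems.append(
            f"shortLabel is {len(short_label)} characters (limit 17): {short_label!r}"
        )
    if len(long_label) > 80:
        problems.append(
            f"longLabel is {len(long_label)} characters (limit 80): {long_label!r}"
        )
    if "_" in table_name:
        problems.append(f"table name contains an underscore: {table_name!r}")
    return problems
```

An empty return value means the entry passes these three checks; the remaining checks (indices, sorting, coordinates) still need the script itself.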
| |
| | |
==Staging on hgwbeta==
| |
| #Make a list of all tables (new & updated that need to be pushed to beta)
| |
| #In trackDb, change 'release alpha' lines to 'release alpha,beta' lines and 'release beta,public' to 'release public' and then check in these changes.
| |
#*A quick way to replace these lines in vi is ":#,## s/release alpha/release alpha,beta/" where # = the starting line and ## = the ending line
| |
| #Do bigPush.csh using list created above
| |
| #Push any new /gbdb files (e.g. .wib or .bb files) from hgwdev to hgnfs1 if applicable
| |
| #On hgwbeta in trackDb: make beta DBS=dbName
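
The release-tag change described in the steps above can also be sketched in Python. This is only an illustration under the assumption that each release tag sits alone on its own line of the .ra file; the function name is hypothetical, and the result should still be reviewed with a diff before checking in.

```python
def promote_release_tags(lines):
    """Rewrite release tags for staging on beta.

    'release alpha'       -> 'release alpha,beta'
    'release beta,public' -> 'release public'
    Only lines whose entire (stripped) content is the tag are changed,
    so an existing 'release alpha,beta' line is left alone.
    """
    out = []
    for line in lines:
        stripped = line.strip()
        if stripped == "release alpha":
            out.append(line.replace("release alpha", "release alpha,beta"))
        elif stripped == "release beta,public":
            out.append(line.replace("release beta,public", "release public"))
        else:
            out.append(line)
    return out
```

Matching the whole line, rather than substituting anywhere in the text, avoids turning an already promoted line into "release alpha,beta,beta" if the edit is run twice.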
| |
| | |
==Other things to check by hand==
| |
| # Run genePredCheck/pslCheck if applicable. (i.e. if your track is a gene prediction track)
| |
| # make sure there is a link to the help doc (in the config section: "Select views (help)")
| |
| # check that metadata is present by clicking on "..." link in tables list on details page
| |
| # read description page
| |
| #* is it detailed enough, especially Methods
| |
| #* are the citations in correct format
| |
| #* does the "Display Conventions and Configuration" section cover all track types
| |
| #* test all hyper-links
| |
| #* releaseN tracks should contain a section called "Release Notes" which should state the release# and provide a description of the changes in that particular release. See [http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg18&g=wgEncodeUwDnaseSeq this page] for an example.
| |
| #* Check for lab contact (sanitize email addresses using encodeEmail.pl script)
| |
# Release log (look in PushQ): must start with "ENCODE"; usually it is just the shortLabel, and if it is a subsequent release it should have "(release #)". If there is something weird about the data that needs to be noted, make sure it fits in nicely with the current release log entries. (url should be of the format: ../../cgi-bin/hgTrackUi?db=hg18&g=wgEncodeAffyRnaChip )
| |
| # configuration section (does it work?)
| |
| #* Check the views are working & that the settings work
| |
| #* Here are some additional specific guidelines when checking the Signal track default settings:
| |
| #** auto-scale shouldn't be used unless a lab insists; should be a fixed range
| |
| #** check signal in dense view for the whole chrom to make sure the fixed range allows for nice pattern of dark bands (we don't want to see all light gray across), wrangler should fix if need be
| |
| #** if there are multiple signal tracks in a track, their settings should be independent
| |
| # multi-view config: matrix etc.
| |
#* By default, matrix boxes should be fully checked or fully unchecked (not grayed); if not, this is a trackDb setting issue that the wrangler should fix.
| |
| # check "reset to defaults" button (does it work)
| |
| # Make sure there's a link to the ENCODE Data Release Policy (at the bottom of the description page).
| |
| # Make sure the tracks in the Tier1 and Tier2 Cell Lines are properly colored (no black, and all tracks from one Cell Line have the same color).
| |
| | |
===Testing in the Browser===
| |
| # test one point from table to view in GB (pick a point which can be obtained by clicking on "schema" from the track configuration page)
| |
| # zoom into base level (at different visibilities)
| |
# zoom way out to 1 million bp (at different visibilities)
| |
| # searching: should items be searchable
| |
| # default visibility: should this track be on by default?
| |
| | |
===Does the data make Sense?===
| |
| # Compare related subtracks of related Views to each other. For example:
| |
#* Does the All Signal Raw Signal really seem to consist of the data in the Plus Raw Signal & Minus Raw Signal?
| |
| #* Do Peaks really represent the high Signal areas of the Signal View subtracks?
| |
| # Do the data make sense Biologically? Turn on other tracks to compare. For example:
| |
| #* RNA-seq data should correlate with the exons in a genes track
| |
| #* TFBS tracks should correlate with the beginning of gene transcripts
| |
| | |
===Performance Tests===
| |
# Does the first 'Signal' subtrack pass the chr1 test (chr1 loads in less than one minute)?
| |
| # Do all views for one experiment pass chr1 test (e.g. Pol2 in GM12878 cells)?
| |
| # A user-oriented test would be to test the performance in a gene-size region of the track with just the default-on subtracks (for the Yale track, and many other ENCODE tracks, default-on subtracks will be all experiments in the GM12878 cell lines, Signal view only -- this should be the configuration you see after a cart reset, then turning the overall track vis to full).
| |
| # Note that ENCODE tracks can have any number of subtracks, and will continue to grow with time. We should definitely assure that useful subsets can be displayed in user-friendly time.
| |
| | |
==Files==
| |
| First, a note about finding the files. One of the most time-consuming things we do is track down items that should have been placed in the "Files" section of the pushQ entries but weren't. It takes us a long time to (a) figure out what's missing, and (b) find it. If developers can ensure that both the /gbdb and /goldenPath files are there, it would be a huge help!
| |
| | |
===Download File specifics===
| |
#Check the '''Index page''' (e.g. [http://hgdownload-test.cse.ucsc.edu/goldenPath/hg18/encodeDCC/wgEncodeYaleChIPseq this page]), which is created from two files (preamble.html & index.html) in the downloads directory:
| |
| #*''preamble.html'' - the top description part of the index page (not the list of files)
| |
| #**an introductory paragraph with a brief description, and link to the trackUi.
| |
| #**should include a link to the files.txt and md5sum.txt files
| |
| #**releaseN: should also include the release number at end of description (e.g. "This is Release 2 of this track. Release notes are included in the track description.")
| |
| #**If you need to edit the preamble.html (not the list of files), follow these steps:
| |
| #**#Edit the preamble.html file (note that it is not in CVS) in the downloads directory on hgwdev
| |
| #**#Regenerate index.html by running the script: encodeDownloadsPage.pl index.html (I think the preamble.html needs to be in the dir where you run the script)
| |
| #**#Look at results on genome-test. If necessary, go back to step 1.
| |
| #*''index.html'' - the list of files on the index page (not the description on top)
| |
| #**should contain the name of each file being released (and only the name of those files in this release). A good way to spot check this is to make sure the number of files at the bottom of the list is correct.
| |
#***new track: use the PushQ file list (there shouldn't be files with V2 or V3, etc.) and your best judgment to determine that all the appropriate files (and only those files) are listed.
| |
| #***releaseN: make sure the right # of files are there. Check some of the removed files to make sure they were in fact removed from the list. If the list doesn't seem right, run the encodeDownloadsPage.pl script (at the prompt type: encodeDownloadsPage.pl index.html) directly in the /releaseN directory (hgwdev: /kent/src/hg/htdocs/goldenPath/<db>/encodeDCC/<trackName>/releaseN) to generate a new index.html page that you know contains all the files in that directory. Then, copy the /releaseN/index.html to the main track directory (hgwdev: /kent/src/hg/htdocs/goldenPath/<db>/encodeDCC/<trackName>/) as that is the index.html used on the site. Don't forget to commit your changes.
| |
| #**make sure this file is executable (because of the way the links are created)
| |
| #Downloads directory also needs to contain the following:
| |
| #*files.txt - plaintext version of index.html; lists files with metadata
| |
| #*md5sum.txt - checksum of all files in download directory
| |
#When you are ready to release, make sure your track is listed on the [http://genome.ucsc.edu/ENCODE/downloads.html downloads page]. If it isn't listed, go to /kent/src/hg/htdocs/ENCODE/downloads.html to add a line for your track, and push the following from hgwdev -> hgwbeta, RR:
| |
| /usr/local/apache/htdocs-hgdownload/ENCODE/downloads.html
| |
| | |
===Pushing Files===
| |
| Pushing the three main types of files involved in ENCODE tracks.
| |
| | |
| *gbdb Files
| |
| Files of this form get pushed hgwdev -> hgnfs1
| |
| /gbdb/hg18/wib/wgEncode*.wib
| |
| | |
| *Other Files
| |
| Files of this form get pushed hgwdev -> hgwbeta & RR (they are quite often accidentally omitted from the pushQ entry -- you will need these types of protocol PDF files, if this is the first subtrack released for this cell line from this lab)
| |
| /usr/local/apache/htdocs/ENCODE/protocols/cell/*.pdf
| |
| | |
| *Download Files
| |
| Download files for an original release get pushed hgdownload-test -> hgdownload (list the entire file path as usual)
| |
| /usr/local/apache/htdocs-hgdownload/goldenPath/hg18/encodeDCC/wgEncode*/index.html
| |
| /usr/local/apache/htdocs-hgdownload/goldenPath/hg18/encodeDCC/wgEncode*/wgEncode*.[bed/wig].gz
| |
| | |
| When pushing download files for a subsequent release track (e.g. release 2), push files as follows (but in your request, list the from/to paths at the top followed by a list of the file names without the full path)
| |
| | |
| from '''hgwdev''': /usr/local/apache/htdocs-hgdownload/goldenPath/<db>/encodeDCC/<trackName>/releaseN/
| |
| to '''hgdownload''': /usr/local/apache/htdocs/goldenPath/<db>/encodeDCC/<trackName>/
| |
| (''Note'': no releaseN directory on '''hgdownload''')
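
The request layout described above for a subsequent release (from/to paths at the top, then bare file names) can be sketched as follows. This is a hypothetical helper, not an existing script; the two path templates are copied from the from/to lines above, and the function name and arguments are assumptions made for illustration.

```python
import posixpath


def format_push_request(db, track_name, release_dir, file_paths):
    """Build the text of a push request for a subsequent release.

    The from/to directories are listed once at the top, followed by the
    file names with their directory part stripped.
    """
    src = (
        "/usr/local/apache/htdocs-hgdownload/goldenPath/"
        f"{db}/encodeDCC/{track_name}/{release_dir}/"
    )
    # No releaseN directory on hgdownload, so the target is the track directory.
    dst = f"/usr/local/apache/htdocs/goldenPath/{db}/encodeDCC/{track_name}/"
    names = sorted(posixpath.basename(p) for p in file_paths)
    lines = [f"from hgwdev: {src}", f"to hgdownload: {dst}", ""]
    lines.extend(names)
    return "\n".join(lines)
```

For example, `format_push_request("hg18", "wgEncodeYaleChIPseq", "release2", paths)` produces two header lines, a blank line, and then one file name per line.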
| |
| | |
| * After Pushing Files:
| |
| Once the files have been pushed you can check to see if the push was successful using this script:
| |
| checkPushedFiles.csh
| |
| | |
===validateFiles===
| |
| *''No longer run, here are Tim's comments about QA running validateFiles:'' "To get these things through the pipeline, we run them through validateFiles, so I think your running them through again is one time too many. But if you are going to, then each lab and each file type may have negotiated limits (which may change between submissions). These limits are found in the relevant submission directory DAF files."
| |
| * Old validateFiles process:
| |
| Test a smattering of different file types using this tool: '''validateFiles''' (type the program name without arguments to see the usage statement). If there are no errors, there will be no output. For example, for files of type tagAlign, invoke the tool like this:
| |
| validateFiles -type=tagAlign -genome=/gbdb/hg18/hg18.2bit /usr/local/apache/htdocs/goldenPath/hg18/encodeDCC/wgEncodeHudsonalphaChipSeq/wgEncodeHudsonalphaChipSeqAlignmentsRep1Gm12878Control.tagAlign.gz
| |
| | |
| For tagAligns there are several relevant validateFiles options:
| |
| mismatches - frequently 2 but negotiated for each lab. Set this to 5 to be tolerant
| |
| matchFirst - negotiated. You should set this to 25 and even then you may need to adjust it
| |
| nMatch - negotiated, but you should always have this parameter set.
| |
| | |
| If you want to be exact, then the metadata as seen on the downloads page tells which submission directory the file belongs to, and the most recent *.DAF (or *.daf) will have a validationSettings line in it which will include the settings that belong to each file type. Example:
| |
| /hive/groups/encode/dcc/pipeline/encpipeline_prod/773/UtaChIPseqBOonlyV1.DAF
| |
| has the line:
| |
| validationSettings allowReloads;validateFiles.tagAlign:mmCheckOnInN=100,mismatches=3,nMatch,matchFirst=25
| |
| | |
| This means that the tag aligns were validated with -mismatches=3 -nMatch -matchFirst=25
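
To read the settings out of a DAF line without picking them apart by eye, a small parser can help. The sketch below assumes the line has exactly the shape shown in the example above (semicolon-separated entries, with `validateFiles.<type>:` followed by comma-separated options); the function names are hypothetical and nothing here is part of validateFiles itself.

```python
def parse_validation_settings(line):
    """Parse a DAF 'validationSettings' line.

    Returns {file_type: {option: value}} for each 'validateFiles.<type>:'
    entry. Options without '=' (such as nMatch) map to None.
    Entries that are not validateFiles settings (e.g. allowReloads) are skipped.
    """
    parts = line.split(None, 1)
    if len(parts) != 2 or parts[0] != "validationSettings":
        raise ValueError("not a validationSettings line")
    settings = {}
    for entry in parts[1].split(";"):
        if not entry.startswith("validateFiles.") or ":" not in entry:
            continue
        target, options = entry.split(":", 1)
        file_type = target[len("validateFiles."):]
        parsed = {}
        for opt in options.split(","):
            name, sep, value = opt.partition("=")
            parsed[name] = value if sep else None
        settings[file_type] = parsed
    return settings


def to_flags(options, names):
    """Format the chosen options as command-line flags, in the given order."""
    flags = []
    for name in names:
        if name in options:
            value = options[name]
            flags.append(f"-{name}" if value is None else f"-{name}={value}")
    return " ".join(flags)
```

With the example line, `to_flags(settings["tagAlign"], ["mismatches", "nMatch", "matchFirst"])` yields the three flags quoted above. Which parsed options to pass on is left to the caller.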
| |
| | |
==Subsequent Release of Data (e.g. Release 2)==
| |
| Periodically, the existing ENCODE tracks will be augmented with new data as labs complete experiments on new cell lines, etc. The new data will come in various formats: some will replace existing data, some will be brand new, some old data will be eliminated, etc. The data wrangler will create a text document, check it into CVS, and place it here: kent/src/hg/makeDb/doc/encodeDccHg18/*.txt
| |
| | |
| This document should contain complete lists of each table and file and what its disposition is. The tables and files will fall into categories similar to this:
| |
| *A) Untouched - are on public browser and should remain
| |
| *B) Deprecated - are currently on RR but will no longer be needed and should not be referenced by the public site.
| |
| **NOTE: NO FILES SHOULD BE REMOVED from the downloads directory on hgdownloads (RR).
| |
| **This list is provided for completeness. Any files marked here as in gbdb may be eliminated.
| |
| *C) New - are only currently on test but will need to be pushed to the RR.
| |
| *D) Additional items of note
| |
| | |
| This document may not match reality. It may be the case that some of the tables/files do not exist at all, the names are incorrect, they are not present on the machine as listed in the file, they do not match the list that is in the pushQ. The first challenge in QAing a subsequent ENCODE release is to determine if/how the file diverges from reality. '''To do this, compare the file to the "snapshot" of what is included in the release (and what you should QA), which can be found in the "release2" list in the downloads directory (hgwdev: /data/apache/htdocs/goldenPath/<db>/<trackName>/release<x>/). If the file differs from the downloads directory, then send that information to the data wrangler and pop the track into the B-queue while they sort it out'''. Otherwise, QA spends far too much time figuring out exactly what they are expected to QA.
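
The comparison between the wrangler's document and the release snapshot amounts to a two-way set difference. A minimal sketch, assuming both lists have already been reduced to plain file or table names (the function name and the result keys are hypothetical):

```python
def compare_file_lists(documented, snapshot):
    """Report names that appear in only one of the two lists.

    documented: names listed in the wrangler's release document
    snapshot:   names found in the release<x> downloads directory
    """
    documented = set(documented)
    snapshot = set(snapshot)
    return {
        "only_in_document": sorted(documented - snapshot),
        "only_in_snapshot": sorted(snapshot - documented),
    }
```

If both result lists are empty, the document matches the snapshot; anything else is the information to send to the data wrangler.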
| |
| | |
| Once the list is finalized, proceed with the QA work as outlined above. Note the additional steps in the [[#Files]] section for how to handle the /releaseN directory.
| |
| | |
==Releasing to RR==
| |
Note: Cc the data wrangler for this track on all your pushes, and Cc encode@soe.ucsc.edu on your final push.
| |
| # Check release log field in PushQ...needs to start with ENCODE
| |
| # If this is a first release, skip this step and go straight to Step#4. If this is a subsequent release, do the following:
| |
| #* Remove the 'release public' block (including sub-blocks) of your track from trackDb.wgEncode.ra.
| |
| #* Remove the 'release alpha,beta' lines from the release alpha blocks (including sub-blocks), and then on parent and view-in-the-middle blocks if applicable:
| |
| #**also note: removeAlphas script may not run if your table list file has lines that begin with a tab/space (remove these in vi with :%s/^ *//)
| |
| ##cvsup
| |
| ##run removeAlphaBetas script (> to a file)
| |
| ##diff between file & trackDb.wgEncode.ra (diff file1 file2)
| |
| ##*double check # of release alpha,betas matches # of tables (diff file1 file2 | grep release | wc --lines)
| |
| ##copy file over trackDb.wgEncode.ra
| |
| ##cvs diff to check new copy against repository copy
| |
| ##If necessary, remove release alphas from the parent block and "view-in-the-middle" sub-blocks and cvs diff again
| |
| ##make alpha on db to make sure everything looks good on dev
| |
| ##commit change
| |
| ##cvs diff again to make sure they are the same
| |
| ##remove file
| |
| ##from trackDb on hgwbeta: make beta DBS=<db>
| |
| # Do a make public (from trackDb on hgwbeta) and announce it to QA
| |
| # Check track on http://hgwbeta-public.cse.ucsc.edu
| |
| # Run comparePublic.csh to check differences between trackDb_public and RR and hgwbeta.
| |
| # Push track tables from mysqlbeta -> mysqlrr (not trackDb_public yet)
| |
| # Drop tables from hgwbeta that need to be removed (being replaced by V# tables)
| |
| # Drop tables being removed from the RR
| |
| # Push trackDb_public (have admins rename to trackDb), tableDescriptions and, if items are searchable (not usually for Encode), hgFindSpec
| |
| # Push download files, index.html, files.txt and md5sum.txt (from hgwdev to hgdownload)
| |
| #* If this is a releaseN, even though there is a releaseN directory on hgwdev, do not create one on hgdownload (see the Download Files section of [[#Pushing Files]] for specifics)
| |
| # Drop .wib files that need to be dropped (from hgnfs1)
| |
| # Check the [http://genome.ucsc.edu/ENCODE/downloads.html ENCODE/downloads.html] page to see if your track is listed. If not (mostly for first releases), edit and push ENCODE/downloads.html from hgwdev -> hgwbeta & RR (a special Encode download release log)
| |
| # Push cv.ra file (only if there is a matrix)
| |
# Click "push requested" in the pushQ record and then click "done" after verification on the RR. Then transfer the pushQ entry from the L queue to the main pushQ.
| |
| | |
| [[Category:Browser QA]]
| |