Crispr QA

From Genecats

Latest revision as of 19:36, 11 December 2018

In addition to the checks listed here, be sure to follow the regular new track checklist.

Background

The track consists of regions in the genome that can be targeted by the Cas9 enzyme from S. pyogenes. A targetable sequence is any 20 bp sequence with an NGG motif (the Protospacer Adjacent Motif, PAM) on its 3' end. Only sequences found within exons plus some length of flanking sequence are included (200 bp of flanking sequence for the regular track, 10 kbp for the crispr10K track, etc). Researchers construct RNA complementary to the 20 bp guide sequence, complex it with the Cas9 enzyme, and inject the complex into the cell to edit DNA near the guide location. Different microbes use different PAM sequences and different Cas enzymes, so the range of editable sequence can vary; this track shows just one example.
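
As a rough illustration of the "20 bp plus NGG" definition above, the sketch below lists candidate 23-mers on the forward strand of a short sequence. It is not how the track was built: find_forward_guides is a hypothetical helper, it ignores the reverse strand, and skipping windows that contain non-ACGT characters is an assumption.

```shell
# Sketch only: list forward-strand 23-mers (20 bp guide + NGG PAM).
# find_forward_guides is a hypothetical name, not part of the track build.
find_forward_guides() {
  awk -v seq="$1" 'BEGIN {
    seq = toupper(seq)
    for (i = 1; i + 22 <= length(seq); i++) {
      s = substr(seq, i, 23)
      # Positions 22-23 must be GG; position 21 (the N) can be any base.
      # Windows containing anything other than A, C, G, T are skipped
      # (an assumption made for this sketch).
      if (substr(s, 22, 2) == "GG" && s !~ /[^ACGT]/)
        printf "%d\t%s\n", i - 1, s   # 0-based start, then guide + PAM
    }
  }'
}
```

Calling find_forward_guides with a sequence string prints one line per candidate, giving the 0-based start and the 23 bp sequence.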

The following sites give good histories and explanations:
https://en.wikipedia.org/wiki/CRISPR
http://science.sciencemag.org/content/346/6213/1258096.full

Table/File setup

The Crispr track (and its variations Crispr 10K, CrisprKmers, etc) consists of 3 tables and 2 files.

MySQL tables:

  • crisprRanges (or crispr10KRanges, etc) - a simple bed3 table which contains the regions surveyed for guides
  • crisprTargets - a one-row table pointing to a bigBed file (/gbdb/$db/crispr/crispr.bb)
  • locusName - table describing each base of a given genome, and whether that base (or sequence of bases) is in an exon, intron, intergenic, etc

GBDB Files:

  • /gbdb/$db/crispr/crispr.bb - the meat of the track, this huge file contains all of the 23 bp Crispr/cas9 guide sequences
  • /gbdb/$db/crispr/crisprDetails.tab - an even huger file describing the off-target locations of each of the guides in the bigBed

Special notes about normal QA checks

  • crisprRanges and locusName follow the normal track QA checklist.
  • These tables should be in tablesIgnored for all.joiner
  • countPerChrom and runBits:
    • compare crisprRanges to the gene track it was made from. They should be somewhat similar.
    • The crisprTargets track is a bigBed, so I like to make a bed out of the bigBed, run it through bedSingleCover, and then call featureBits on it to compare it to the crisprRanges table. The two should be fairly similar:
      $ featureBits db crisprRanges targets.singleCover.bed
    • Neither track should overlap with the gap track
    • crisprTargets should not fall outside crisprRanges, since crisprTargets is all the guides in the range of crisprRanges. See below for how to check.
  • The all details check is slightly different; see below.
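
As a quick cross-check of the coverage comparison above, the number of bases covered by a bed file can also be counted in plain awk. This is only a sketch, not a replacement for bedSingleCover and featureBits: covered_bases is a hypothetical helper, and it assumes tab-separated bed files with the chromosome, start and end in the first three columns.

```shell
# Sketch only: count the bases covered by a bed file, merging overlaps.
# covered_bases is a hypothetical helper name.
# targets.bed is assumed to have been made from the bigBed first, e.g.:
#   bigBedToBed /gbdb/$db/crispr/crispr.bb targets.bed
covered_bases() {
  LC_ALL=C sort -k1,1 -k2,2n "$1" | awk -F'\t' '
    # A new chromosome, or a start past the current block: close the block.
    $1 != chrom || $2 + 0 > end + 0 {
      total += end - start
      chrom = $1; start = $2; end = $3
      next
    }
    # Otherwise the item overlaps or touches the block: extend it if needed.
    $3 + 0 > end + 0 { end = $3 }
    END { total += end - start; print total + 0 }'
}
```

Running covered_bases on the targets bed and on a bed dump of crisprRanges gives two base counts that should be of similar size if the tracks agree.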

Special QA notes

  1. To check the coordinates of the bigBed file, first check the coordinates of the crisprRanges table with checkTableCoords. When that comes back ok, make a bed file of crisprRanges and compare it to the crisprTargets bed file with bedtools intersect:
    /cluster/bin/bedtools/bedtools intersect -v -a mm10.crispr10K.bed -b mm10.crispr10KRanges.bed
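    This intersects the two files and outputs only the crisprTargets items that are not in crisprRanges (i.e. bad coords). Note that -v on its own reports only targets with no overlap at all; to also flag targets that extend partly outside a range, the minimum-overlap option of bedtools intersect (-f 1.0) can be added.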
  2. Look for unexpectedly colored items, such as items with a color other than those described on the description page:
    awk -F '\t' '{print $9}' mm10.crispr10K.bed | sort -u > crispr10KColors
  3. The coloring scheme is described in the description page. A few guides of each type should be checked. To get a list of all the color schemes present, and how often each occurs, the following command can be used:
    awk -F$'\t' '{print $9}' mm10.crispr10K.bed | sort | uniq -c > mm10.crispr10K.color.profile
  4. To get the number of items of each color, assuming the items of each color have been written to their own bed file (e.g. greenItems.bed):
    wc -l *Items.bed
  5. To mimic countPerChrom measurements:
    for chrom in {1..19} X Y M; do printf 'chr%s\t' "$chrom"; awk -v chr="chr$chrom" -F'\t' '$1 == chr {sum++} END {print sum+0}' mm10.crispr10K.bed; done > chromCounts
  6. Take a few random lines (shuf -n 3) from each of these color files (*Items.bed) and check every aspect of the bigBed file on hgTracks and hgc:
    • Check that the scores are correct.
    • Check that the mouseOver text in hgTracks is correct.
    • Check that the off-target counts match what's displayed in the full list. Sometimes the page may say something like 1 off-target with one mismatch, but there won't be any in the full list.
    • Make sure the list of off-targets shows up. The off-target information is stored in an external file and there have been problems with the indices into it.
    • Check a few off-targets from the table. Make sure the sequence displayed matches the sequence in the browser, and check that the locus is correct. Note that this can be confusing when a negative strand is involved on either a guide, an off-target, or both. Some guides have hundreds of off-targets; only 2-3 need to be checked.
  7. Check the speed of the track and details page. This is a large bigBed, so it's good to note any performance issues.
  8. This track can cause problems in the Table Browser (TB) and Data Integrator (DI). Profiling these errors further will be helpful for future use.
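
Steps 4 to 6 rely on per-color bed files (*Items.bed). One possible way to produce them, sample from them, and count items per chromosome in a single pass is sketched below. The function names and the rgb_..._Items.bed file naming are made up for illustration; the sketch assumes a tab-separated bed file with the color in column 9, as in the commands above.

```shell
# Sketch only: function and file names here are illustrative.

# Write one bed file per color (column 9) into directory $2.
split_by_color() {
  local bed=$1 outdir=$2
  mkdir -p "$outdir"
  awk -F'\t' -v dir="$outdir" '{
    name = $9
    gsub(/,/, "_", name)                      # "0,200,0" -> "0_200_0"
    print > (dir "/rgb_" name "_Items.bed")   # matches the *Items.bed glob
  }' "$bed"
}

# Print up to n (default 3) random lines from each per-color file.
sample_per_color() {
  local outdir=$1 n=${2:-3} f
  for f in "$outdir"/*Items.bed; do
    echo "## $f"
    shuf -n "$n" "$f"
  done
}

# Count items per chromosome in one pass over the file.
count_per_chrom() {
  cut -f1 "$1" | sort | uniq -c
}
```

For example, split_by_color mm10.crispr10K.bed colors followed by wc -l colors/*Items.bed gives the per-color counts, and sample_per_color colors picks the lines to inspect on hgTracks and hgc. Unlike a loop over a fixed list of chromosome names, count_per_chrom also reports any chromosomes or scaffolds that are not in the list.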

Push Notes

  • When pushing the track to hgwbeta, be sure to have the correct release tags on the trackDb stanzas so hgnfs1 files don't leak out.