Gene Set Summary Statistics: Difference between revisions

Revision as of 16:34, 18 September 2007

gene sets measured

hg17 - knownGenes version 2
hg18 - knownGenes version 3
mm8 - knownGenes version 2
mm9 - knownGenes version 3

In the tables below, the min, max, and mean exon counts are measured per gene.

summary of gene and exon counts

db	gene count	total exon count	min exon count	max exon count	mean exon count
hg17	39368	405720	1	149	10
hg18	56722	519308	1	2899	9
mm8	31863	314628	1	313	10
mm9	49220	417114	1	610	8

summary of exon size statistics

db	sum exon sizes	min exon size	max exon size	mean exon size
hg17	106839720	1	18172	263
hg18	146371091	1	36861	282
mm8	83159087	4	17497	264
mm9	117671086	1	29698	282
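
As a quick consistency check, the mean columns in the two tables above can be recomputed from the totals, using the same round-to-nearest step as the Methods section (add 0.5, then truncate). This is only a minimal sketch: the figures are copied by hand from the tables, and the variable name MEANS is illustrative.

```shell
# Columns: db, gene count, total exon count, sum of exon sizes
# (values copied by hand from the two summary tables above).
MEANS=$(printf '%s\n' \
    'hg17 39368 405720 106839720' \
    'hg18 56722 519308 146371091' \
    'mm8 31863 314628 83159087' \
    'mm9 49220 417114 117671086' \
  | awk '{
        # mean exons per gene, then mean exon size; %d truncates after +0.5
        printf "%s %d %d\n", $1, $3 / $2 + 0.5, $4 / $3 + 0.5
    }')
echo "$MEANS"
```

Each output line gives the db name, the mean exon count, and the mean exon size, which can be compared against the corresponding table columns.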

summary of intron size statistics

db	sum intron sizes	min intron size	max intron size	mean intron size
hg17	2223224397	6	1096450	6069
hg18	2784923600	1	1047320	6023
mm8	1476081990	9	1347550	5220
mm9	2055504784	1	1253430	5589

Top five exon count genes

db	gene name (exon count)
hg17	NM_004543 (149)	AF535142 (146)	AF535142 (146)	NM_033071 (146)	AF495910 (146)
hg18	uc001yrq.1 (2899)	uc002zvw.1 (322)	uc002umr.1 (313)	uc002stk.1 (217)	uc002umt.1 (194)
mm8	NM_011652 (313)	NM_028004 (192)	NM_007738 (118)	NM_134448 (99)	DQ067088 (99)
mm9	uc007pgj.1 (610)	uc008kfn.1 (313)	uc008kfo.1 (192)	uc008jqv.1 (157)	uc009rrh.1 (118)

Top five largest CDS extent genes

db	gene name (CDS extent size: thickEnd-thickStart)
hg17	NM_014141 (2298740)	NM_000109 (2217347)	CR749820 (2138880)	NM_004006 (2089394)	X14298 (2089394)
hg18	uc003weu.1 (2298740)	uc004ddb.1 (2217347)	uc001pak.1 (2138880)	uc004dda.1 (2089394)	uc003wqd.1 (2055833)
mm8	NM_007868 (2253366)	NM_001004357 (2238304)	NM_053011 (2055883)	AK134694 (1988713)	NM_053171 (1639258)
mm9	uc009tri.1 (2253366)	uc009bst.1 (2238325)	uc007zfr.1 (2189582)	uc008jon.1 (2055883)	uc008mpv.1 (1988713)

Top five smallest transcript genes

db	gene name (transcript size: txEnd-txStart)
hg17	AF241539 (168)	AF277175 (176)	AY459291 (240)	AY605064 (243)	AF503918 (258)
hg18	uc004buj.1 (20)	uc001dcm.1 (22)	uc001seo.1 (22)	uc001sqn.1 (22)	uc002wpa.1 (22)
mm8	AJ319753 (217)	BC107019 (231)	BC016221 (286)	NM_130876 (303)	NM_130873 (304)
mm9	uc007bma.1 (22)	uc007gmr.1 (22)	uc007khz.1 (22)	uc007pay.1 (22)	uc007qpn.1 (22)

Histogram graphs

File:Hg17-hg18.exonCount.png

Methods

From the table browser, request three different bed files for the knownGenes track:

whole gene
exons only
introns only

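The statistics are then extracted from those bed files.

gene count from: 'wc -l wholeGene.bed'
exon count stats from column 10 (the exon count) of wholeGene.bed, using the ave utility from the kent source tree: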

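 # ave -tableOut prints a "#" header line plus one row of values;
 # fields used here: 1 = min, 5 = max, 6 = mean, 8 = sum
 STATS=`ave -col=10 wholeGene.bed -tableOut | grep -v "^#"`
 MIN=`echo $STATS | cut -d' ' -f1`
 MAX=`echo $STATS | cut -d' ' -f5`
 # round the mean to the nearest integer
 MEAN=`echo $STATS | cut -d' ' -f6 | awk '{printf "%d", $1+0.5}'`
 # the sum of the per-gene exon counts is the total exon count
 COUNT=`echo $STATS | cut -d' ' -f8 | awk '{printf "%d", $1}'`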
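
Where the ave utility is not installed, the same four numbers can be approximated with awk alone. The following is a minimal sketch and not the method used for the tables on this page; the three-gene sample file and the variable name EXON_STATS are hypothetical.

```shell
# Hypothetical three-gene BED12 sample; column 10 is the exon (block) count.
BED=$(mktemp)
printf 'chr1\t100\t900\tgeneA\t0\t+\t150\t850\t0\t3\t100,100,100,\t0,300,700,\n' > "$BED"
printf 'chr1\t2000\t2500\tgeneB\t0\t-\t2000\t2500\t0\t1\t500,\t0,\n' >> "$BED"
printf 'chr2\t50\t1050\tgeneC\t0\t+\t60\t1000\t0\t5\t50,50,50,50,50,\t0,200,400,600,950,\n' >> "$BED"

# One pass over column 10: min, max, mean rounded to nearest, and total.
# Assumes a non-empty file (the mean divides by the line count NR).
EXON_STATS=$(awk -F'\t' '
    NR == 1 { min = max = $10 }
    {
        if ($10 < min) min = $10
        if ($10 > max) max = $10
        sum += $10
    }
    END { printf "%d %d %d %d", min, max, sum / NR + 0.5, sum }' "$BED")
echo "$EXON_STATS"
rm -f "$BED"
```

The four space-separated values correspond to MIN, MAX, MEAN and COUNT in the ave-based commands above.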

for exon or intron size stats:

 # run once with BED=exons.bed and once with BED=introns.bed
 BED=exons.bed
 STATS=`awk '{print $3-$2}' $BED \
      | ave -col=1 stdin -tableOut | grep -v "^#"`
 MIN=`echo $STATS | cut -d' ' -f1`
 MAX=`echo $STATS | cut -d' ' -f5 | awk '{printf "%d", $1}'`
 MEAN=`echo $STATS | cut -d' ' -f6 | awk '{printf "%d", $1+0.5}'`
 SUM_SIZE=`awk '{sum += $3-$2} END{printf "%d", sum}' $BED`
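
As an alternative to requesting separate exon and intron files, exon and intron sizes can in principle be derived from the block columns of the whole gene bed file: column 11 holds the block sizes and column 12 the block starts relative to the gene start. The sketch below assumes a 12-column bed file and treats every gap between consecutive blocks as an intron; the table browser output may count introns differently, so its numbers need not match the tables on this page. The sample data and the names SIZE_STATS and add are hypothetical.

```shell
# Hypothetical three-gene BED12 sample (same layout as wholeGene.bed).
BED=$(mktemp)
printf 'chr1\t100\t900\tgeneA\t0\t+\t150\t850\t0\t3\t100,100,100,\t0,300,700,\n' > "$BED"
printf 'chr1\t2000\t2500\tgeneB\t0\t-\t2000\t2500\t0\t1\t500,\t0,\n' >> "$BED"
printf 'chr2\t50\t1050\tgeneC\t0\t+\t60\t1000\t0\t5\t50,50,50,50,50,\t0,200,400,600,950,\n' >> "$BED"

SIZE_STATS=$(awk -F'\t' '
    # Accumulate count, min, max and sum separately for each kind of feature.
    function add(kind, v) {
        v += 0                                   # force numeric comparison
        if (!(kind in cnt) || v < min[kind]) min[kind] = v
        if (!(kind in cnt) || v > max[kind]) max[kind] = v
        cnt[kind]++
        sum[kind] += v
    }
    {
        split($11, size, ",")
        split($12, start, ",")
        for (i = 1; i <= $10; i++) {
            add("exon", size[i])
            # an intron is the gap between the end of block i and block i+1
            if (i < $10) add("intron", start[i + 1] - (start[i] + size[i]))
        }
    }
    END {
        # columns: kind, count, min size, max size, sum of sizes
        printf "exon %d %d %d %d\n", cnt["exon"], min["exon"], max["exon"], sum["exon"]
        printf "intron %d %d %d %d\n", cnt["intron"], min["intron"], max["intron"], sum["intron"]
    }' "$BED")
echo "$SIZE_STATS"
rm -f "$BED"
```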

top five exon count genes

 sort -k10nr wholeGene.bed | head -5

top five CDS size genes

 awk '{cdsSize=$8-$7
 if (cdsSize > 0) {printf "%s\t%s\t%s\t%s\t%d\n", $1,$2,$3,$4,cdsSize}
 }' wholeGene.bed | sort -k5nr | head -5

top five smallest transcript genes

 awk '{size=$3-$2
 if (size > 0) {printf "%s\t%s\t%s\t%s\t%d\n", $1,$2,$3,$4,size}
 }' wholeGene.bed | sort -k5n | head -5
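
The top five commands above print whole bed lines, while the tables list each gene as "name (value)". One possible way to produce such a row is sketched below for the exon count case; the sample data and the variable names TAB and TOP_ROW are hypothetical, and the same awk step could follow the CDS extent or transcript size pipelines by printing their fifth column instead of column 10.

```shell
# Hypothetical three-gene BED12 sample (same layout as wholeGene.bed).
BED=$(mktemp)
printf 'chr1\t100\t900\tgeneA\t0\t+\t150\t850\t0\t3\t100,100,100,\t0,300,700,\n' > "$BED"
printf 'chr1\t2000\t2500\tgeneB\t0\t-\t2000\t2500\t0\t1\t500,\t0,\n' >> "$BED"
printf 'chr2\t50\t1050\tgeneC\t0\t+\t60\t1000\t0\t5\t50,50,50,50,50,\t0,200,400,600,950,\n' >> "$BED"

TAB=$(printf '\t')
# Sort by exon count (column 10) descending, keep up to five genes,
# then join them on one tab-separated line as "name (exon count)".
TOP_ROW=$(sort -t "$TAB" -k10,10nr "$BED" | head -5 \
  | awk -F'\t' '{ printf "%s%s (%d)", (NR > 1 ? "\t" : ""), $4, $10 }
                END { print "" }')
echo "$TOP_ROW"
rm -f "$BED"
```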

@@ Line 138: / Line 138: @@
 ==Histogram graphs==
 [[Image:hg17-hg18.exonCount.png]]
+[[Image:Mm8_mm9_exonsTo300.png]]
 ==Methods==
