Table Browser URL
Revision as of 23:43, 15 November 2006
How to create a command line script to fetch data from the table browser.
Please take note of the following notice found on the home page of the UCSC Genome Browser web site:
Program-driven use of this software is limited to a maximum of one hit every 15 seconds and no more than 5,000 hits per day.
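One way to stay within these limits in a script is to send every fetch through a small wrapper that sleeps as needed. The sketch below assumes a POSIX shell and a date command that supports +%s (GNU and BSD date both do). throttle, MIN_INTERVAL and MAX_HITS are hypothetical names, and the hit counter only covers a single run of the script, not a calendar day.

```shell
#!/bin/sh
# Hypothetical rate limiter for Table Browser fetches.
# MIN_INTERVAL: minimum seconds between hits (15 per the notice).
# MAX_HITS: hits allowed in one run of this script (the notice allows 5,000 per day).
MIN_INTERVAL=${MIN_INTERVAL:-15}
MAX_HITS=${MAX_HITS:-5000}
HITS=0
LAST_HIT=0

# throttle CMD [ARGS...]: wait until MIN_INTERVAL seconds have passed
# since the previous hit, then run CMD. Refuses once MAX_HITS is reached.
throttle() {
    if [ "$HITS" -ge "$MAX_HITS" ]; then
        echo "throttle: limit of $MAX_HITS hits reached" >&2
        return 1
    fi
    now=$(date +%s)
    delay=$((LAST_HIT + MIN_INTERVAL - now))
    if [ "$delay" -gt 0 ]; then
        sleep "$delay"
    fi
    LAST_HIT=$(date +%s)
    HITS=$((HITS + 1))
    "$@"
}
```

Each wget call in a loop can then be written as throttle wget followed by its usual arguments.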
With that limitation in mind, consider the following procedure.
The trick is to use the Table Browser interactively until it produces an example of the desired output.
Then use cartDump to view the CGI variables the Table Browser used to produce that output. Copy those CGI variables into a command line and add the two special URL variables:
'submit=submit&hgta_doTopSubmit=1'
which make hgTables behave as if its submit button had just been pressed.
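As a rough illustration, the hypothetical function below joins name=value pairs copied from cartDump into an hgTables URL and appends the two submit variables. It assumes the values are already URL-safe (it does no percent-encoding) and uses the hgTables address from the examples on this page.

```shell
#!/bin/sh
# Hypothetical helper: build an hgTables URL from cart variables given
# as name=value arguments, then append the variables that simulate a
# press of the submit button. Values must already be URL-safe.
HGTABLES='http://genome.ucsc.edu/cgi-bin/hgTables'

build_hgtables_url() {
    query=''
    for pair in "$@"; do
        query="${query}${pair}&"
    done
    printf '%s?%ssubmit=submit&hgta_doTopSubmit=1\n' "$HGTABLES" "$query"
}
```

Its output can be passed straight to wget, for instance wget -O out.txt "$(build_hgtables_url db=hg18 hgta_track=genscan hgta_table=genscan)", although a real query needs the full set of variables reported by cartDump.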
With this process, you can get hgTables to produce any of its outputs from a single URL fetch, as in the examples below. It gets trickier when filters or intersections are involved, since each one brings its own set of cart variables into the query.
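The name filter in the intron example below is expressed through a family of hgta_fil.v.DB.TABLE variables. The hypothetical helper below regenerates that name-matching subset for any database, table and field. The naming pattern is inferred from that single example; other filter types and intersections use variables not shown here, which are best copied from cartDump.

```shell
#!/bin/sh
# Hypothetical helper: print the hgta_fil.* variables that restrict
# DB.TABLE to rows whose FIELD matches PATTERN, following the naming
# used in the intron example on this page.
name_filter_vars() {
    db=$1 table=$2 field=$3 pattern=$4
    prefix="hgta_fil.v.${db}.${table}"
    printf '%s' \
        "${prefix}..rawLogic=AND&" \
        "${prefix}..rawQuery=&" \
        "${prefix}.${field}.dd=does&" \
        "${prefix}.${field}.pat=${pattern}&" \
        "hgta_filterTable=${db}.${table}"
    printf '\n'
}
```

Its output can be spliced into a query in place of the filter lines of that example.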
For heavy use, however, it is usually far more convenient and efficient to download the MySQL table data from hgdownload and process it locally with the tools in the kent source tree.
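For example, the genscan query below could instead be answered from a downloaded table dump. The sketch assumes the dump sits at goldenPath/hg18/database/genscan.txt.gz on hgdownload and that, as in many UCSC genePred dumps, column 1 is bin; both should be checked (the table's .sql file lists the columns). genes_in_range is a hypothetical name, and plain awk stands in here for the kent tools.

```shell
#!/bin/sh
# Hypothetical helper: print rows of a genePred table dump (read on
# stdin) that overlap CHROM:START-END, given in 1-based browser
# coordinates. Assumes column 1 is bin, so chrom, txStart and txEnd
# are columns 3, 5 and 6; check the table's .sql file and adjust.
# UCSC tables store 0-based, half-open coordinates, hence START - 1.
genes_in_range() {
    chrom=$1 start=$2 end=$3
    awk -F '\t' -v c="$chrom" -v s="$((start - 1))" -v e="$end" \
        '$3 == c && $5 < e && $6 > s'
}

# Example (download path is an assumption; not run here):
#   wget http://hgdownload.soe.ucsc.edu/goldenPath/hg18/database/genscan.txt.gz
#   gzip -dc genscan.txt.gz | genes_in_range chrX 151073054 151383976
```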
Here is an example of fetching genscan gene predictions in a specified position range, as GTF:
```sh
#!/bin/sh
# Fetch hg18 genscan predictions for POSITION as GTF.
# The final submit=submit&hgta_doTopSubmit=1 simulates the submit button.
POSITION="chrX:151073054-151383976"
wget --progress=dot \
'http://genome.ucsc.edu/cgi-bin/hgTables?db=hg18&hgta_compressType=none&'\
'hgta_group=genes&hgta_outputType=gff&outGff=1&hgta_regionType=range&'\
'hgta_table=genscan&hgta_track=genscan&org=Human&position='"${POSITION}"\
'&submit=submit&hgta_doTopSubmit=1' \
-O "genscan.${POSITION}.gtf"
```
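If hgTables does not understand the request, what arrives may be an HTML page rather than GTF, and wget will still save it under the .gtf name. A cheap check, sketched below as a hypothetical helper, is to confirm that the first non-comment line has the nine tab-separated columns of a GTF record.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if FILE looks like GTF, i.e. its
# first line that is not a '#' comment has 9 tab-separated fields.
# An HTML error page, or an empty file, fails this check.
looks_like_gtf() {
    awk -F '\t' '
        /^#/ { next }
        { ok = (NF == 9); exit }
        END { exit !ok }
    ' "$1"
}
```

For example, looks_like_gtf "genscan.${POSITION}.gtf" || echo "unexpected output" >&2 could follow the wget call.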
Another example fetches the intron sequence for a named gene:
```sh
#!/bin/sh
# Fetch hg18 genomic sequence, with introns, for the knownGene entry whose
# name matches GENE_NAME; the hgta_fil.* variables set up that filter.
GENE_NAME=NM_003742
wget 'http://genome.ucsc.edu/cgi-bin/hgTables?clade=vertebrate&db=hg18&'\
'hgSeq.promoterSize=1000&hgSeq.cdsExon=on&hgSeq.intron=on&'\
'hgSeq.downstreamSize=1000&hgSeq.granularity=gene&hgSeq.padding5=0&'\
'hgSeq.padding3=0&hgSeq.casing=exon&hgSeq.repMasking=lower&'\
'hgta_doGenomicDna="get sequence"&'\
'hgta_fil.v.hg18.knownGene..rawLogic=AND&'\
'hgta_fil.v.hg18.knownGene..rawQuery=&'\
'hgta_fil.v.hg18.knownGene.name.dd=does&'\
'hgta_fil.v.hg18.knownGene.name.pat='"${GENE_NAME}"\
'&hgta_filterTable=hg18.knownGene&'\
'hgta_geneSeqType=genomic&hgta_group=genes&hgta_outputType=sequence&'\
'hgta_regionType=genome&hgta_table=knownGene&hgta_track=knownGene&'\
'org=Human' -O "${GENE_NAME}.introns.fa"
```
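This query sets hgSeq.casing=exon, which, as the Table Browser describes that option, writes exons in upper case and everything else in lower case. Assuming that holds, and that only CDS exons and introns were requested (only hgSeq.cdsExon and hgSeq.intron are turned on here), the introns can be pulled out of the FASTA file as the lower-case runs of each record. The helper below is a hypothetical sketch of that idea. If repeat masking to lower case were also in effect, repeats inside exons would be lower-cased too and the runs would no longer correspond to introns alone.

```shell
#!/bin/sh
# Hypothetical helper: split each FASTA record into its lower-case runs.
# Assumes hgSeq.casing=exon output (exons upper case, other bases lower
# case) containing only CDS exons and introns, so each run should be
# one intron. Records are read from the named files or from stdin.
extract_lowercase_runs() {
    awk '
        function emit_runs(   s, n) {
            if (name == "") return
            s = seq
            n = 0
            while (match(s, /[acgtn]+/)) {
                n++
                printf ">%s_run%d\n%s\n", name, n, substr(s, RSTART, RLENGTH)
                s = substr(s, RSTART + RLENGTH)
            }
        }
        /^>/ { emit_runs(); name = substr($1, 2); seq = ""; next }
        { seq = seq $0 }
        END { emit_runs() }
    ' "$@"
}
```

For example, extract_lowercase_runs NM_003742.introns.fa > NM_003742.runs.fa writes one FASTA record per lower-case run.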
Some of these variables are probably unnecessary in the query; this is a maximal set, not a minimal one.