GI2012TrackHubsPoster
Revision as of 21:36, 27 August 2012
This page contains links related to the UCSC Genome Browser poster presented by Brian Raney at Genome Informatics 2012 [1].
Poster: Remote Data Track Storage for Viewing on the UCSC Genome Browser
Parallel downloading, remote access, and caching make for good performance
The technology behind the data hubs, and its optimization, is what enables good performance in the browser. Hub tracks are downloaded in parallel threads, which avoids the large serial network latency that would otherwise grow with both the distance to the remote hub and the number of visible hub tracks. Once data has been fetched, the performance of the cache becomes very important. Optimizations include a read-ahead buffer for cached data that speeds up reading by a factor of 30. Browser users typically visit a few favorite genes, so only those small pieces of the large remote data file are fetched. The cache consists of a sparse data file plus a bitmap file that keeps track of which blocks have been cached. Unix file systems can store such files sparsely, so the unwritten parts of the file take up no disk space, which makes the cache very disk-space efficient. Temporarily inaccessible data hub resources are handled with reasonable timeouts and informative error messages.
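The sparse data file and block bitmap described above can be illustrated with a minimal Python sketch. This is not the Genome Browser's actual implementation: the class name `SparseBlockCache`, the 8192-byte `BLOCK_SIZE`, the `.data` and `.bitmap` file suffixes, and the `fetch_range` callback (standing in for a remote byte-range request) are all assumptions made for illustration. Read-ahead buffering, threading, and timeouts are left out.

```python
import os
import tempfile

BLOCK_SIZE = 8192  # hypothetical block size, chosen only for this sketch


class SparseBlockCache:
    """Local cache of a remote file: a sparse data file plus a block bitmap."""

    def __init__(self, cache_dir, name, remote_size, fetch_range):
        self.data_path = os.path.join(cache_dir, name + ".data")
        self.bitmap_path = os.path.join(cache_dir, name + ".bitmap")
        self.remote_size = remote_size
        self.fetch_range = fetch_range  # callable(start, length) -> bytes
        n_blocks = (remote_size + BLOCK_SIZE - 1) // BLOCK_SIZE
        if not os.path.exists(self.data_path):
            with open(self.data_path, "wb") as f:
                # Extends the file without writing data; file systems that
                # support sparse files allocate no blocks for the hole.
                f.truncate(remote_size)
            with open(self.bitmap_path, "wb") as f:
                f.write(bytes((n_blocks + 7) // 8))  # one bit per block
        with open(self.bitmap_path, "rb") as f:
            self.bitmap = bytearray(f.read())

    def _has(self, block):
        return bool(self.bitmap[block >> 3] & (1 << (block & 7)))

    def _mark(self, block):
        self.bitmap[block >> 3] |= 1 << (block & 7)

    def read(self, offset, size):
        """Return bytes [offset, offset+size), fetching only missing blocks."""
        end = min(offset + size, self.remote_size)
        if offset >= end:
            return b""
        first, last = offset // BLOCK_SIZE, (end - 1) // BLOCK_SIZE
        with open(self.data_path, "r+b") as data:
            for block in range(first, last + 1):
                if not self._has(block):
                    start = block * BLOCK_SIZE
                    length = min(BLOCK_SIZE, self.remote_size - start)
                    data.seek(start)
                    data.write(self.fetch_range(start, length))
                    self._mark(block)
            data.seek(offset)
            result = data.read(end - offset)
        with open(self.bitmap_path, "wb") as f:
            f.write(self.bitmap)  # persist which blocks are now cached
        return result


# Demo with an in-memory stand-in for the remote file.
remote = bytes(range(256)) * 1000
calls = []


def fake_fetch(start, length):
    calls.append((start, length))
    return remote[start:start + length]


cache = SparseBlockCache(tempfile.mkdtemp(), "track", len(remote), fake_fetch)
first = cache.read(100000, 50)   # triggers one block fetch
second = cache.read(100010, 20)  # served from the local cache
```

In the demo, the second read falls inside a block that is already marked in the bitmap, so `fake_fetch` is not called again. Only the touched block of the data file is ever written, which is what keeps the cache small on file systems that support sparse files.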
How to get help
- Search for answers in our mail list archives: http://genome.ucsc.edu/contacts.html
- Email a new question to our actively monitored list genome@soe.ucsc.edu
- OpenHelix's free training materials: http://www.openhelix.com/downloads/ucsc/ucsc_home.shtml
Other posters about the UCSC Genome Browser
- Visually integrating genomic data in the UCSC Genome Browser. Hinrichs AS et al. HGV 2011 genomewiki page .pptx, PDF
- UCSC Genome Browser Data Hubs. Zweig AS et al. Biology of Genomes, 2011 PDF
- Genome-wide ENCODE Data at UCSC. Rosenbloom KR et al. ASHG, 2010. PPT
- UCSC Genome Browser Tool Suite. Hinrichs AS et al. Genomics of Common Disease, 2008: .ppt, PDF