Genome completion status
Which metazoan species currently have genomic data available? Hard to say ... it is a difficult process to track:
Sequencing centers post raw trace reads on a day-by-day basis at NCBI's trace archives. NCBI performs some quality control and adds them to the accruing database that is blastn-accessible. Later the center may assemble them into contigs and post them to the "wgs" division of GenBank (more rarely to "gss" or "htgs"). Depending on the coverage and finishing effort, these contigs can be hosted as a genome by a browser center.
It may take 2-3 years for data to complete its migration from trace sequencing to contigs to genome. More rarely, traces are withheld and a genome assembly appears abruptly, as with elephantfish. There are no announcements, maintained lists, or publications; centers rarely update their websites or indicate specific future plans.
Further complications include two orangs, two gibbons, subspecies confusion with gorilla and gibbon, multiple individual human genomes, and so forth. NISC and the sequencing centers did not work from the same individual or even subspecies, so each trace compilation has to be checked separately.
Consequently few other researchers are aware of which species have available genomic data, and so undersample taxonomically when doing comparative genomics projects. Sampling more densely often overturns working hypotheses of feature evolution.
To annotate at kbp scales (adequate for exons and small genes), one can reliably use the traces or contigs and not wait (years) for a genome browser to appear.
If the exon or feature is 1000 bp (or in such sized pieces), the trace archives work quite well, especially for establishing presence. Absence is not as informative because data might simply be missing due to low coverage. No vertebrate species is truly complete yet, including human. It requires a couple million traces before a given incoming genome is worth checking for a given feature.
It is important to be aware that not every trace makes it into a contig or assembly -- singletons are often omitted, millions of them. Sequencing often continues after a release, as is happening now with elephant and guinea pig.
Thus if a feature is missing from assembled traces, it is best to go back to the trace archive because all original data is there. Be aware that the trace blast database can lag trace inputs from the centers by a week, which can amount to a million traces in an active project.
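The reasoning above about presence, absence and coverage can be sketched as a small decision helper. This is only an illustration: the function name, the return labels and the exact threshold (reading "a couple million" as two million) are assumptions, not part of any NCBI tool.

```python
# Assumption: "a couple million traces" is read here as two million.
MIN_TRACES = 2_000_000

def interpret_search(in_assembly, in_traces, trace_count):
    """Interpret a feature search for one species.

    in_assembly -- True if the feature was found in contigs or an assembly
    in_traces   -- True/False for a trace archive search, None if not yet run
    trace_count -- number of traces currently available for the species
    """
    if in_assembly or in_traces:
        return "present"          # a hit anywhere establishes presence
    if trace_count < MIN_TRACES:
        return "too_early"        # too little data to be worth checking
    if in_traces is None:
        return "check_traces"     # singletons may be missing from the assembly
    return "not_found_inconclusive"  # absence may just reflect low coverage

print(interpret_search(False, None, 5_000_000))
```

Note that no branch ever reports a feature as absent: with incomplete coverage, a miss stays inconclusive.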
There are also "cdna" species like the third marsupial, Trichosurus vulpecula, which have rather complete coverage of coding genes. These can also furnish critical close-in queries to improve the sensitivity of trace blast.
In a specific comparative genomics research project, it is important to document which species were considered. For that it is convenient to enter annotation data in a column next to its species, in a spreadsheet containing all species with genomic data (provided below and illustrated below that with a concrete coding indel example). This allows last-minute updating prior to paper submission.
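A minimal sketch of such a spreadsheet, assuming a tab-separated layout that any spreadsheet program can open. The function name, the column headers and the "SPC25_indel" annotation column are hypothetical; only a few species rows are shown.

```python
import csv
import io

# A few rows of the species list: code, status, binomial, common name.
SPECIES = [
    ("homSap", "Mar06", "Homo sapiens", "human"),
    ("panTro", "Mar06", "Pan troglodytes", "chimp"),
    ("cynVol", "???07", "Cynocephalus volans", "flying_lemur"),
    ("tupBel", "Dec06", "Tupaia belangeri", "tree_shrew"),
    ("musMus", "Feb06", "Mus musculus", "mouse"),
]

def annotation_sheet(species, annotations, column="annotation"):
    """Return tab-separated text with one row per species considered.

    Species without an annotation keep an empty cell, so the sheet records
    every species that was considered, not only those with data.
    """
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["code", "status", "species", "common", column])
    for code, status, binomial, common in species:
        writer.writerow([code, status, binomial, common,
                         annotations.get(code, "")])
    return out.getvalue()

sheet = annotation_sheet(SPECIES, {"homSap": "--", "tupBel": "VS"},
                         column="SPC25_indel")
print(sheet)
```

Because species without data keep an empty cell, a newly available genome can be filled in at the last minute without restructuring the sheet.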
Finally, PCR can be used on species currently lacking genomic or cdna projects when it is critical to augment sampling density. Flying lemur would be a good choice in primate-oriented projects because it appears to be the immediate outgroup (hence a great improvement over distant mouse).
homSap  Mar06  Homo sapiens                   human
panTro  Mar06  Pan troglodytes                chimp
gorGor  Dec07  Gorilla gorilla                gorilla
ponPyg  Htg06  Pongo pygmaeus                 orang_sumatran
nomLeu  Trc07  Nomascus leucogenys            gibbon
macMul  Jan06  Macaca mulatta                 rhesus
calJac  Wgs06  Callithrix jacchus             marmoset_nwm
tarSyr  ???07  Tarsius syrichta               tarsier
otoGar  Dec06  Otolemur garnettii             bushbaby
micMur  Trc07  Microcebus murinus             mouse_lemur
cynVol  ???07  Cynocephalus volans            flying_lemur
tupBel  Dec06  Tupaia belangeri               tree_shrew
musMus  Feb06  Mus musculus                   mouse
ratNor  Nov04  Rattus norvegicus              rat
speTri  Wgs06  Spermophilus tridecemlineatus  ground_squirrel
cavPor  Wgs06  Cavia porcellus                guinea_pig
oryCun  May05  Oryctolagus cuniculus          rabbit
canFam  May05  Canis familiaris               dog
felCat  Wgs06  Felis catus                    cat
equCab  Jan07  Equus caballus                 horse
myoLuc  Wgs06  Myotis lucifugus               microbat
pteVam  Trc06  Pteropus vampyrus              macrobat
bosTau  Mar05  Bos taurus                     cow
susScr  Trc06  Sus scrofa                     pig
sorAra  Wgs06  Sorex araneus                  shrew
eriEur  Wgs06  Erinaceus europaeus            hedgehog
dasNov  May05  Dasypus novemcinctus           armadillo
choHof  Trc06  Choloepus hoffmanni            sloth
loxAfr  May05  Loxodonta africana             elephant
proCap  Wgs06  Procavia capensis              hyrax
echTel  Jul05  Echinops telfairi              tenrec
monDom  Jan06  Monodelphis domestica          opossum
macEug  Trc06  Macropus eugenii               wallaby
ornAna  Mar07  Ornithorhynchus anatinus       platypus
galGal  May06  Gallus gallus                  chicken
taeGut  Trc06  Taeniopygia guttata            finch
anoCar  Feb07  Anolis carolinensis            lizard
xenTro  Jun06  Xenopus tropicalis             clawed_frog
danRer  Mar06  Danio rerio                    zebrafish
gasAcu  Feb06  Gasterosteus aculeatus         stickleback
oryLat  Apr06  Oryzias latipes                rice_fish
takRub  Oct04  Takifugu rubripes              fugu
tetNig  Feb04  Tetraodon nigroviridis         puffer
calMil  Wgs07  Callorhinchus milii            elephantfish
petMar  Trc06  Petromyzon marinus             lamprey

coding indel example: SPC25 chr2 exon

homSap MVEDELALFDKSINEFWNKFKST--DTSCQMAGLRDTYKDSIKAFA
panTro MVEDELALFDKSINEFWNKFKST--DTSCQMAGLRDTYKDSIKAFA
ponPyg MVEDELALFDKSINEFWNKFKST--DTSCQMAGLRDTYKDSIKAFA
macMul MVEDELALFDKSINEFWNKFKST--DTSCQMAGLRDTYKDSIKAFA
calJac MVEDELALFDKSLNEFWNKFKST--DTTFQMAGLRDTYKDSLKAFA
tarSyr MVEDELTLFDKSINEFWNKFKST--DTANQMMGLRDTYKDSVKAFA
otoGar MVEDQLALLDKNINEFWNKFKST--DTAGQMAGLRDTYKDSIKTFA
micMur MVEDELVLFDKTVNEFWNKFKST--DTSCHMVGLRDTYKDSLKAFA
cynVol .................NKFTST--DTSCQMMGLRGTNK.......
tupBel MVEDELALFDKGINEFWNKFRSTVSDTSCQMVGLRDAYKDSIKAFA
musMus MGEDELALLNQSINEFGDKFRNRLDDNHSQVLGLRDAFKDSMKAFS
ratNor MGEDELAAFEKSINEFGDKFRYRLSDNRSQVLGLKDAFKDSIRALS
cavPor MVEDELALFDKSINEFGNKFRNTLSDTPCQMLGLRDACKDSIKTLA
speTri MMEDELARFDKSINEFGNKFRNTFSDTRCQMVGLRDVFKDSIEALA
dipOrd MVEDELAHFDKSISEFGSKFRNTLSDTPSQTVGLRDAYKDSIKALS
oryCun MVEDELALFDKSINEFGSKFRSTLSDAPCQMVGLRDAYKDSVKSLT
ochPri MVEDELALFDKSINEFGSKFRSTLSDTPCQMVGLREACKDSVRLLT
canFam MIDDELAQFDKSISEFWSKFKGTVSDTSSQMVGLRETYKDSIKACA
felCat MIEDELALFDKSINEFWNKFKSTLSDTSCQMMGLRDTYKDSIKALT
equCab MVEDELALFDKSINEFWNKFKNTVSDTSCQMVGLRDAYKDSIKAFA
myoLuc MVEDELALLDKNINEFWNKFKSNVNDTSCQMVGLRDNYKDISKAFT
pteVam MVEDELALLDKSINEFWNKFKSSVSDTSCQMMALRDSYKDINKAFT
bosTau MVEDELALFDKSINEFWNKFKSTVSDTSCQMVGLRETYKDSIKAFA
turTru MVEDELALFDKSINEFWNKFRSTVSDTSCQMVGLRDTYKDSIKAFA
susScr MVEDELALFDKSINEFWNRFKSTVSDTSCQMVGLRENYKDSLKAFA
oviAri MVEDELALFDKSLNEFWNKFKSTVNDTSCQMVGLREAYKDSIKAFA
eriEur MVEDELALFDKSINEFWNKFKGTVSDTSFQMVGLRDTYKDSIKIFT
sorAra MVEDELVLFEKSINEFVNEFESTASDTTCQVVGPRDADKDSIKALA
dasNov MIEDELALFDKSINEFWNKFKGTVSDNSCQMVGLRDTYKDSIKAFA
choHof MIEDELALFDKSINEFWNKFKSAVSDTSCQMVGLRDTYKDSIKAFA
loxAfr MIEDELVQFDKSINEFWNKFINTASDTSCQMVGLRDAYKDSMKAFA
proCap MIEDELRQFDKSINEFWNKFINTTSDTSCQMAGLRDAYKDSMKAFA
echTel MIEDELLQFDKSMNEFRNKHFNTLNDTSGQMMGLRDTYRDSMKAFA
monDom MSHIKTEEELDLFNKSINDFWNKFRNTTLNEHCSQMVGLRDTYKDSIEALT
macEug MSHIKTEEELDIFEKSISDFWNRFRNTAFNEPYSQVVGVRDTYKYSIETLT
triVul MSHIKTEEELDIFNKSINDFWNRFRNTTFNEHYSQVVGLRDTYKNSIEALT
ornAna MSHIKTEEELALFDKSIDEFWTKFKNTWISEYSCQTVTLRDAHKEAIKALT
galGal MSAVKTEDEITVVEREMKEFWTELKSVYGTEQINQTLALRDSCKESINVLS
taeGut MGNAQAEDEVALFEKDMKEFWIQFKISYGTEQNNQTMKEFWIQFKISYGTE
anoCar MAKAKEEDELTMLEKGIEELCTQIETTYCRQSLEKTSGPRNKCYKSGPRNK
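
Scoring the indel column by eye gets tedious as species are added. The following is a rough sketch of how the two-residue gap in the alignment above could be tabulated per species. It assumes all rows are the same length and already aligned, that "-" marks a gap and "." marks missing data; the function names and state labels are hypothetical, and only six rows are copied in.

```python
# Excerpt of the SPC25 exon alignment (rows of equal length only).
ALIGNMENT = {
    "homSap": "MVEDELALFDKSINEFWNKFKST--DTSCQMAGLRDTYKDSIKAFA",
    "calJac": "MVEDELALFDKSLNEFWNKFKST--DTTFQMAGLRDTYKDSLKAFA",
    "cynVol": ".................NKFTST--DTSCQMMGLRGTNK.......",
    "tupBel": "MVEDELALFDKGINEFWNKFRSTVSDTSCQMVGLRDAYKDSIKAFA",
    "musMus": "MGEDELALLNQSINEFGDKFRNRLDDNHSQVLGLRDAFKDSMKAFS",
    "canFam": "MIDDELAQFDKSISEFWSKFKGTVSDTSSQMVGLRETYKDSIKACA",
}

def gap_columns(reference):
    """Return (start, end) spanning the first to last gap in the reference row."""
    start = reference.index("-")
    end = reference.rindex("-") + 1
    return start, end

def indel_state(seq, start, end):
    """Classify one row at the indel columns."""
    window = seq[start:end]
    if "." in window:
        return "no_data"   # sequence not available at this position
    if set(window) == {"-"}:
        return "gap"       # residues absent relative to other species
    return window          # residues present; report them

start, end = gap_columns(ALIGNMENT["homSap"])
states = {code: indel_state(seq, start, end)
          for code, seq in ALIGNMENT.items()}
for code, state in states.items():
    print(code, state)
```

In this excerpt the human, marmoset and flying lemur rows share the gap, while tree shrew, mouse and dog carry two residues at those columns. The resulting states can go straight into the annotation column of the species spreadsheet.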
--tom