AllGenes

NAR Molecular Biology Database Collection entry number 26
Barkan, D., Belova, O., Brunk, B., Crabtree, J., Elisafenko, E., Fischer, S., Gan, Y., Gubina, M., Katokhin, A., Kolchanov, N., Mazzarelli, J., Nadezhda, L., Nizolenko, L., Pinney, D., Schug, J., Semjonova, E., Shilov, A., Skvortzova, T., Stoeckert, C., Trifonoff, V., Zykov, I.
Department of Genetics and the Center for Bioinformatics, University of Pennsylvania, Philadelphia, PA, USA; Institute of Cytology and Genetics SB RAS and the State Research Center of Virology and Biotechnology VECTOR, Novosibirsk, Russia

Database Description

AllGenes.org is the home of DoTS (Database of Transcribed Sequences), a human and mouse transcript and gene index. It is generated by assembling publicly available EST and mRNA sequences into DoTS Transcripts (the transcript index) and collating those into DoTS Genes (the gene index). The genes are integrated with public genomic sequence (currently UCSC hg15 and mm3), producing DoTS Gene Models. The transcript sequences are subjected to a suite of automated annotations, including: collation into conceptual genes; BLAT alignment onto the genome; protein prediction; protein function prediction (GO assignment); protein similarity and association of description; protein motif assignments; RH marker assignments; gene trap tag sequence assignments; anatomy profile; expression profile; and mapping to GeneCards, MGI and IMAGE. The transcripts and genes are also manually curated, which provides gene name and synonym assignments and confirms the automated annotation.

As of June 19, 2003, the transcript index contains 257,532 human and 163,379 mouse non-singleton DoTS Transcripts that cluster to 134,162 human and 95,762 mouse DoTS Genes. 73% of the human and 73% of the mouse transcripts have been confirmed with high quality matches to the genome. 59% of human and 60% of mouse transcripts have similarity to a known protein sequence, and 32% of human and 29% of mouse transcripts have been assigned a GO (Gene Ontology Consortium) function. 37% of the human genes and 36% of the mouse genes have been assigned a DoTS Gene Model. The DoTS annotation team has manually annotated 29,703 human and 39,283 mouse DoTS Transcripts (DTs/RNAs), corresponding to 3,126 human and 6,255 mouse DoTS Genes.

DoTS is built on the GUS genomics database platform developed by our group (http://www.gusdb.org). The GUS relational schema is an extensive genomics warehouse organized around the central dogma of biology (genes are transcribed to RNA, which is translated into proteins). It enables powerful queries not available in many other genomics databases. The AllGenes web interface also uses GUS's boolean query and query history facilities, which allow users to compose sophisticated queries from more basic ones. A sample query finds all mouse RNAs located on chromosome 7 that are expressed in the brain and whose products are predicted to be transcription factors.
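The idea of composing a sophisticated query from basic ones can be illustrated with a minimal Python sketch. This is not the GUS or AllGenes implementation: the record type, field names, transcript IDs, and the `history` dictionary are all hypothetical, and boolean AND is modeled simply as set intersection over the ID sets returned by each basic query.

```python
from dataclasses import dataclass

# Hypothetical, simplified RNA record; GUS's real schema is far richer.
@dataclass(frozen=True)
class Rna:
    rna_id: str
    organism: str
    chromosome: str
    tissues: frozenset
    predicted_function: str

def query(records, predicate):
    """Run one basic query and return the set of matching RNA IDs."""
    return {r.rna_id for r in records if predicate(r)}

# Made-up example data, for illustration only.
records = [
    Rna("DT.1", "mouse", "7", frozenset({"brain"}), "transcription factor"),
    Rna("DT.2", "mouse", "7", frozenset({"liver"}), "transcription factor"),
    Rna("DT.3", "mouse", "2", frozenset({"brain"}), "transcription factor"),
    Rna("DT.4", "human", "7", frozenset({"brain"}), "kinase"),
]

# A "query history": named results of basic queries, kept for reuse.
history = {
    "mouse_chr7": query(records, lambda r: r.organism == "mouse" and r.chromosome == "7"),
    "brain": query(records, lambda r: "brain" in r.tissues),
    "tf": query(records, lambda r: r.predicted_function == "transcription factor"),
}

# Boolean AND of three earlier results is a set intersection.
result = history["mouse_chr7"] & history["brain"] & history["tf"]
print(sorted(result))
```

In the same model, OR and NOT would correspond to set union (`|`) and set difference (`-`) over stored results.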

Recent Developments

The AllGenes site features the DoTS "ReportMaker" tool, which allows users to download query result sets in a customizable tab-delimited format. DoTS Transcripts can now be queried directly by physical location on the genome. DoTS Gene Models are included in tracks at the Ensembl Human and Mouse ContigView genome browsers and in the UCSC Genome Browser. AllGenes now includes queries by LocusLink and Affymetrix ID. The signal peptide query allows users to retrieve human and mouse DoTS Transcripts with a predicted signal peptide, and the TM domain query allows users to find DoTS Transcripts with a specific number of predicted transmembrane domains. AllGenes also links to the GNF's Gene Expression Atlas (Su et al. 2002).
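A downloaded tab-delimited result set can be processed with standard tools. The Python sketch below assumes a hypothetical report layout: the column names `transcript_id`, `gene_symbol`, and `tm_domains`, and all row values, are invented for illustration and are not taken from actual ReportMaker output. It filters rows by the number of predicted transmembrane domains, mirroring the TM domain query on the client side.

```python
import csv
import io

# Made-up report text standing in for a downloaded file; the columns are
# assumptions, since the real report columns are chosen by the user.
report = (
    "transcript_id\tgene_symbol\ttm_domains\n"
    "DT.100\tGeneA\t7\n"
    "DT.101\tGeneB\t0\n"
    "DT.102\tGeneC\t7\n"
)

def transcripts_with_tm_count(handle, count):
    """Return IDs of rows whose tm_domains column equals count."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [row["transcript_id"] for row in reader
            if int(row["tm_domains"]) == count]

seven_tm = transcripts_with_tm_count(io.StringIO(report), 7)
print(seven_tm)
```

For a real file, the `io.StringIO(report)` argument would be replaced by an open file handle, for example `open(path, newline="")`.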

Acknowledgements

This work was supported by grants from the National Institutes of Health (R01HG01539) and the Department of Energy (DE-FG02-DOE00ER62893).

References

Su AI, Cooke MP, Ching KA, Hakak Y, Walker JR, Wiltshire T, Orth AP, Vega RG, Sapinoso LM, Moqrich A, Patapoutian A, Hampton GM, Schultz PG, Hogenesch JB (2002) Large-scale analysis of the human and mouse transcriptomes. Proc. Natl. Acad. Sci. USA. 99:4465-4470.

