InterPro

NAR Molecular Biology Database Collection entry number 207
Apweiler R.1, Attwood T.K.4, Bairoch A.2, Bateman A.5, Binns D.1, Bradley P.1,4, Bork P.8, Bucher P.3, Cerutti L.3, Copley R.13, Courcelle E.6, Das U.1, Durbin R.5, Fleischmann W.1, Gough J.11, Gouzy J.6, Griffiths-Jones S.5, Haft D.9, Harte N.1, Hulo N.2, Kahn D.6, Kanapin A.1, Krestyaninova M.1, Lonsdale D.1, Lopez R.1, Letunic I.8, Madera M.12, Maslen J.1, McDowall J.1, Mulder N.1, Nikolskaya A.N.10, Orchard S.1, Pagni M.3, Peyruc D.6, Ponting C.7, Quevillon E.1, Servant F.1, Sigrist C.2, Studholme D.J.5, Vaughan R.1 and Wu C.H.10

1EMBL Outstation - The European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
2Swiss Institute for Bioinformatics, Geneva, Switzerland
3Swiss Institute for Experimental Cancer Research, Lausanne, Switzerland
4School of Biological Sciences, The University of Manchester, Manchester, UK
5The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
6CNRS/INRA, Toulouse, France
7MRC Functional Genetics Unit, Department of Human Anatomy and Genetics, University of Oxford, UK
8Biocomputing Unit, EMBL-Heidelberg, Germany
9The Institute for Genomic Research, Maryland, USA
10Protein Information Resource, Georgetown University Medical Center, Washington, DC, USA
11Genomic Sciences Centre, RIKEN Yokohama Institute, Suehiro-cho, Tsurumi-ku, Yokohama, Japan
12MRC Laboratory of Molecular Biology, Cambridge, UK
13Wellcome Trust Centre for Human Genetics, Oxford, UK

Database Description

InterPro, an integrated documentation resource of protein families, domains and functional sites, was created in 1999 to amalgamate the major protein signature databases into one comprehensive resource. PROSITE, Pfam, PRINTS, ProDom, SMART, TIGRFAMs, PIR SuperFamily and the structure-based SUPERFAMILY have been manually integrated and are available in InterPro for text- and sequence-based searching. CATH and PANTHER HMMs will soon be integrated. Search results are provided in a single, comprehensive format, with links to the original data sources as well as to specialised functional databases. The latest release of InterPro contains over 10,000 entries, with 78% coverage of all proteins in UniProt. Each entry is annotated in the name, GO mapping and abstract fields, and all matches against the Swiss-Prot and TrEMBL components of UniProt are precomputed and available for viewing in different formats. Protein 3D structural information is integrated from MSD, CATH and SCOP, and is shown in the match views to provide an at-a-glance comparison of sequence and structural domains. The database is available via a webserver (http://www.ebi.ac.uk/interpro) and anonymous FTP (ftp://ftp.ebi.ac.uk/pub/databases/interpro). InterProScan is a sequence search package that can be used via a web interface or installed locally for bulk searches.

Recent Developments

New features of the database include improved match views and a taxonomy servlet. The match views are now available in both extended and compact forms, and can be ordered by protein accession number, name or taxonomy, or restricted to proteins of known structure. The InterPro Domain Architectures view is a graphical representation in which the domain architecture of a protein sequence is displayed as a series of non-overlapping domains, providing a means of viewing and comparing protein domain compositions. The taxonomic range of proteins matching each InterPro entry is displayed in a new field, and the number of proteins shown for each taxonomic group links to the graphical view of that subset of proteins. The first HMMs based on structural superfamilies from the CATH database have been integrated, and PANTHER is the next database awaiting integration.
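The idea of showing a protein as a series of non-overlapping domains can be illustrated with a short sketch. This is not InterPro's own procedure, which the text does not describe; it is a minimal example that assumes each match is a (label, start, end) triple and resolves overlaps with a simple greedy rule (keep the match that ends first, then skip anything overlapping it). The Match type, the function names and the DOM_A to DOM_D labels are all hypothetical.

```python
from typing import List, NamedTuple


class Match(NamedTuple):
    """A hypothetical signature match on a protein sequence."""
    entry: str  # illustrative label, not a real InterPro accession
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def domain_architecture(matches: List[Match]) -> List[Match]:
    """Reduce possibly overlapping matches to a non-overlapping series.

    Assumed rule (not taken from the source): sort by end position and
    keep a match only if it starts after the last kept match has ended.
    """
    chosen: List[Match] = []
    last_end = 0
    for m in sorted(matches, key=lambda m: (m.end, m.start)):
        if m.start > last_end:
            chosen.append(m)
            last_end = m.end
    return chosen


def architecture_string(matches: List[Match]) -> str:
    """Render the architecture as labels joined left to right."""
    return "-".join(m.entry for m in domain_architecture(matches))


# Made-up example data: DOM_B overlaps both DOM_A and DOM_C.
example = [
    Match("DOM_A", 10, 120),
    Match("DOM_B", 100, 200),
    Match("DOM_C", 130, 260),
    Match("DOM_D", 270, 350),
]

if __name__ == "__main__":
    print(architecture_string(example))
```

Other overlap rules (for example, preferring the longest or highest-scoring match) would give a different series; the greedy rule is used here only because it is short and easy to check by hand.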

Acknowledgements

The InterPro project is supported by the ProFuSe grant (QLG2-CT-2000-00517) of the European Commission.

References

1. Andreeva, A., Howorth, D., Brenner, S.E., Hubbard, T.J., Chothia, C. and Murzin, A.G. (2004) SCOP database in 2004: refinements integrate structure and sequence family. Nucleic Acids Research 32(1), D226-229.
2. Apweiler, R., Bairoch, A., Wu, C.H., Barker, W.C., Boeckmann, B., Ferro, S., Gasteiger, E., Huang, H., Lopez, R., Magrane, M., Martin, M.J., Natale, D.A., O'Donovan, C., Redaschi, N. and Yeh, L.S. (2004) UniProt: the Universal Protein knowledgebase. Nucleic Acids Research 32(1), D115-119.
3. Attwood, T.K., Bradley, P., Flower, D.R., Gaulton, A., Maudling, N., Mitchell, A.L., Moulton, G., Nordle, A., Paine, K., Taylor, P., Uddin, A. and Zygouri, C. (2003) PRINTS and its automatic supplement pre-PRINTS. Nucleic Acids Research 31(1), 400-402.
4. Bateman, A., Coin, L., Durbin, R., Finn, R.D., Hollich, V., Griffiths-Jones, S., Khanna, A., Marshall, M., Moxon, S., Sonnhammer, E.L., Studholme, D.J., Yeats, C. and Eddy, S.R. (2004) The Pfam protein families database. Nucleic Acids Research 32(1), D138-141.
5. Biswas, M., O'Rourke, J.F., Camon, E., Fraser, G., Kanapin, A., Karavidopoulou, Y., Kersey, P., Kriventseva, E., Mittard, V., Mulder, N., Phan, I., Servant, F. and Apweiler, R. (2002) Applications of InterPro in protein annotation and genome analysis. Briefings in Bioinformatics 3(3), 285-295.
6. Golovin, A., Oldfield, T.J., Tate, J.G., Velankar, S., Barton, G.J., Boutselakis, H., Dimitropoulos, D., Fillon, J., Hussain, A., Ionides, J.M., John, M., Keller, P.A., Krissinel, E., McNeil, P., Naim, A., Newman, R., Pajon, A., Pineda, J., Rachedi, A., Copeland, J., Sitnov, A., Sobhany, S., Suarez-Uruena, A., Swaminathan, G.J., Tagari, M., Tromm, S., Vranken, W. and Henrick, K. (2004) E-MSD: an integrated data resource for bioinformatics. Nucleic Acids Research 32(1), 211-216.
7. Haft, D.H., Selengut, J.D. and White, O. (2003) The TIGRFAMs database of protein families. Nucleic Acids Research 31, 371-373.
8. Harris, M.A., Clark, J., Ireland, A., Lomax, J., Ashburner, M., Foulger, R., Eilbeck, K., Lewis, S., Marshall, B., Mungall, C., Richter, J., Rubin, G.M., Blake, J.A., Bult, C., Dolan, M., Drabkin, H., Eppig, J.T., Hill, D.P., Ni, L., Ringwald, M., Balakrishnan, R., Cherry, J.M., Christie, K.R., Costanzo, M.C., Dwight, S.S., Engel, S., Fisk, D.G., Hirschman, J.E., Hong, E.L., Nash, R.S., Sethuraman, A., Theesfeld, C.L., Botstein, D., Dolinski, K., Feierbach, B., Berardini, T., Mundodi, S., Rhee, S.Y., Apweiler, R., Barrell, D., Camon, E., Dimmer, E., Lee, V., Chisholm, R., Gaudet, P., Kibbe, W., Kishore, R., Schwarz, E.M., Sternberg, P., Gwinn, M., Hannick, L., Wortman, J., Berriman, M., Wood, V., de la Cruz, N., Tonellato, P., Jaiswal, P., Seigfried, T. and White, R. (2004) The Gene Ontology (GO) database and informatics resource. Nucleic Acids Research 32(1), 258-261.
9. Hulo, N., Sigrist, C.J., Le Saux, V., Langendijk-Genevaux, P.S., Bordoli, L., Gattiker, A., De Castro, E., Bucher, P. and Bairoch, A. (2004) Recent improvements to the PROSITE database. Nucleic Acids Research 32(1), 134-137.
10. Letunic, I., Copley, R.R., Schmidt, S., Ciccarelli, F.D., Doerks, T., Schultz, J., Ponting, C.P. and Bork, P. (2004) SMART 4.0: towards genomic data integration. Nucleic Acids Research 32(1), 142-144.
11. Madera, M., Vogel, C., Kummerfeld, S.K., Chothia, C. and Gough, J. (2004) The SUPERFAMILY database in 2004: additions and improvements. Nucleic Acids Research 32(1), 235-239.
12. Mulder, N.J., Apweiler, R., Attwood, T.K., Bairoch, A., Barrell, D., Bateman, A., Binns, D., Biswas, M., Bradley, P., Bork, P., Bucher, P., Copley, R.R., Courcelle, E., Das, U., Durbin, R., Falquet, L., Fleischmann, W., Griffiths-Jones, S., Haft, D., Harte, N., Hulo, N., Kahn, D., Kanapin, A., Krestyaninova, M., Lopez, R., Letunic, I., Lonsdale, D., Silventoinen, V., Orchard, S.E., Pagni, M., Peyruc, D., Ponting, C.P., Selengut, J.D., Servant, F., Sigrist, C.J., Vaughan, R. and Zdobnov, E.M. (2003) The InterPro Database, 2003 brings increased coverage and new features. Nucleic Acids Research 31(1), 315-318.
13. Orengo, C.A., Pearl, F.M. and Thornton, J.M. (2003) The CATH domain structure database. Methods in Biochemical Analysis 44, 249-271.
14. Pearl, F.M., Lee, D., Bray, J.E., Buchan, D.W., Shepherd, A.J. and Orengo, C.A. (2002) The CATH extended protein-family database: providing structural annotations for genome sequences. Protein Science 11(2), 233-244.
15. Servant, F., Bru, C., Carrère, S., Courcelle, E., Gouzy, J., Peyruc, D. and Kahn, D. (2002) ProDom: Automated clustering of homologous domains. Briefings in Bioinformatics 3, 246-25.
16. Wu, C.H., Nikolskaya, A., Huang, H., Yeh, L.S., Natale, D.A., Vinayaka, C.R., Hu, Z.Z., Mazumder, R., Kumar, S., Kourtesis, P., Ledley, R.S., Suzek, B.E., Arminski, L., Chen, Y., Zhang, J., Cardenas, J.L., Chung, S., Castro-Alvear, J., Dinkov, G. and Barker, W.C. (2004) PIRSF: family classification system at the Protein Information Resource. Nucleic Acids Research 32(1), 112-114.
17. Zdobnov, E.M. and Apweiler, R. (2001) InterProScan - an integration platform for the signature-recognition methods in InterPro. Bioinformatics 17(9), 847-848.


Go to the abstract in the NAR 2007 Database Issue.