SUPFAM
NAR Molecular Biology Database Collection entry number 219
Krishnadev O., Swapna L.S., Gowri V.S., Agarwal G., Srinivasan N., and Pandit, S.B.
Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560 012, India
Contact ns@mbu.iisc.ernet.in
Database Description
Members of a protein superfamily can arise from divergent evolution of homologues that retain very low similarity in their amino acid sequences. A superfamily relationship is commonly detected only after the three-dimensional (3-D) structures of the proteins concerned have been determined by X-ray analysis or NMR. The SUPFAM database (1,2) relates two or more homologous protein families from a multiple sequence alignment database, whether of known or unknown structure. The present SUPFAM update (2.1) has been derived using Pfam version 22.0 and release 2.6 of PALI (3,4), an alignment database of homologous proteins of known structure that is derived largely from SCOP. The first step in establishing SUPFAM is to relate Pfam families to the families in PALI. The second step relates to one another those Pfam families that could not be associated reliably with a protein superfamily of known structure. The profile matching procedure RPS-BLAST has been used in both steps. In the present update, the first step associated 3162 Pfam families (of 9318, about 34%) with a SCOP/PALI family. In this step, 727 Pfam families that carry no structure annotation in Pfam could be associated with a PALI family. In the second step, an all-against-all sequence-profile comparison using the profiles of the 6156 Pfam families with apparently no structural information clustered 157 homologous Pfam families into 65 new potential superfamilies. The expanded groups of related proteins of as yet unknown structure proposed in SUPFAM should help structural genomics initiatives identify 'priority proteins' for structure determination, and so extend the coverage of structural information in protein sequence space. The SUPFAM database can be accessed at http://pauling.mbu.iisc.ernet.in/~supfam/.
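The second step can be illustrated with a minimal sketch. It assumes that families are grouped by single linkage, so that any sequence-profile match below an E-value cutoff joins two families into the same potential superfamily. The entry does not state the grouping rule or the threshold actually used in SUPFAM, so the function name, the cutoff of 1e-3 and the family labels (FAM_A and so on) are all hypothetical. The last lines simply recompute the first-step coverage from the figures quoted above.

```python
from collections import defaultdict


def cluster_families(hits, evalue_cutoff=1e-3):
    """Group families connected by profile matches (single linkage).

    hits: iterable of (query_family, subject_family, evalue) tuples.
    evalue_cutoff: assumed threshold; not taken from the SUPFAM entry.
    Returns a sorted list of clusters, each a sorted list of family names.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for query, subject, evalue in hits:
        # Ignore self-matches and matches weaker than the cutoff.
        if query != subject and evalue <= evalue_cutoff:
            parent[find(query)] = find(subject)

    groups = defaultdict(set)
    for family in list(parent):
        groups[find(family)].add(family)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


# Hypothetical all-against-all results (family labels are placeholders).
example_hits = [
    ("FAM_A", "FAM_B", 1e-5),
    ("FAM_B", "FAM_C", 1e-4),
    ("FAM_D", "FAM_E", 1e-6),
    ("FAM_F", "FAM_A", 0.5),    # too weak, FAM_F stays unclustered
    ("FAM_A", "FAM_A", 1e-50),  # self-match, ignored
]
clusters = cluster_families(example_hits)

# Coverage figures quoted in the description.
total_pfam = 9318
with_structure = 3162
without_structure = total_pfam - with_structure
coverage_percent = round(100 * with_structure / total_pfam, 1)
```

With these placeholder hits, FAM_A, FAM_B and FAM_C form one group and FAM_D and FAM_E another. Single linkage is the most permissive choice; a stricter rule, such as requiring reciprocal matches, would give smaller groups.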
Recent Developments
The Pfam and PALI releases used in the current update of SUPFAM are much larger than those used in the last update. The present update also grouped 157 Pfam families into 65 new potential superfamilies. Multiple profiles (5) were generated for every Pfam and PALI family using PSI-BLAST and used to identify relationships, in order to reduce the bias that the choice of reference sequence introduces into RPS-BLAST profile generation.
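One way to picture the use of multiple profiles is sketched below. It assumes that each family contributes several profiles, each built around a different reference sequence, and that two families are considered related if any one of those profiles gives a significant match. The entry does not describe how matches from different profiles are combined, so this rule, the function name, the cutoff and the labels are hypothetical.

```python
def best_family_hits(profile_hits, evalue_cutoff=1e-3):
    """Keep the best E-value per family pair across all profiles.

    profile_hits: iterable of
        (query_family, reference_sequence, subject_family, evalue) tuples,
        one per profile-level match.
    evalue_cutoff: assumed threshold; not taken from the SUPFAM entry.
    Returns {(query_family, subject_family): best_evalue}.
    """
    best = {}
    for query_family, _reference, subject_family, evalue in profile_hits:
        key = (query_family, subject_family)
        if evalue <= evalue_cutoff and evalue < best.get(key, float("inf")):
            best[key] = evalue
    return best


# Hypothetical matches: three profiles of FAM_A, each seeded with a
# different reference sequence, compared against FAM_B and FAM_C.
example_profile_hits = [
    ("FAM_A", "seq1", "FAM_B", 0.2),   # this reference misses the relationship
    ("FAM_A", "seq2", "FAM_B", 1e-6),  # this one detects it
    ("FAM_A", "seq3", "FAM_B", 1e-4),
    ("FAM_A", "seq1", "FAM_C", 5.0),   # never significant
]
family_level = best_family_hits(example_profile_hits)
```

In this example a single profile seeded with seq1 would have missed the FAM_A to FAM_B relationship, while the set of profiles recovers it.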
Acknowledgements
OK, LSS and GA are supported by fellowships from CSIR, New Delhi. This work is supported by the Department of Biotechnology, New Delhi.
References
1. Pandit, S.B., Gosar, D., Abhiman, S., Sujatha, S., Dixit, S.S., Mhatre, N.S., Sowdhamini, R. and Srinivasan, N. (2002) SUPFAM - Database of potential protein superfamily relationships derived by comparing sequence-based and structure-based families: Implications for structural genomics and function annotation in genomes. Nucleic Acids Res., 30, 289-293.
2. Pandit, S.B., Bhadra, R., Gowri, V.S., Balaji, S., Anand, B. and Srinivasan, N. (2004) SUPFAM: A database of sequence superfamilies of protein domains. BMC Bioinformatics, 5, 28.
3. Balaji, S., Sujatha, S., Kumar, S.S.C. and Srinivasan, N. (2001) PALI: A database of Phylogeny and ALIgnment of homologous protein structures. Nucleic Acids Res., 29, 61-65.
4. Gowri, V.S., Pandit, S.B., Karthik, P.S., Srinivasan, N. and Balaji, S. (2003) Integration of related sequences with protein three-dimensional structural families in an updated version of PALI database. Nucleic Acids Res., 31, 486-488.
5. Anand, B., Gowri, V.S. and Srinivasan, N. (2005) Use of multiple profiles corresponding to a sequence alignment enables effective detection of remote homologues. Bioinformatics, 21, 2821-2826.
Category: Protein sequence databases
Subcategory: Protein domain databases; protein classification
Go to the abstract in the NAR 2002 Database Issue.