
SUPFAM


NAR Molecular Biology Database Collection entry number 219
Krishnadev O., Bhaskara R.M., Agarwal G., and Srinivasan N.
Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560 012, India.

Database Description

Members of a protein superfamily may arise through divergent evolution of homologues whose amino acid sequences share very little similarity. A superfamily relationship is commonly detected after the three-dimensional (3-D) structures of the proteins concerned have been determined by X-ray crystallography or NMR. The SUPFAM (1,2) database described here relates two or more homologous protein families, of either known or unknown structure, in a database of multiple sequence alignments. The present SUPFAM update (2.3) has been derived using Pfam (3) (version 23.0), a database of protein domain families, and PALI (4,5) (release 2.7), an alignment database of homologous proteins of known structure that is derived largely from SCOP (6).

The first step in establishing SUPFAM is to relate Pfam families to the families in PALI. The second step relates to one another those Pfam families that could not be reliably associated with a protein superfamily of known structure. The profile-matching procedure RPS-BLAST has been used in both steps. In the present update, the first step associated 3120 Pfam families (out of 10334, ~30%) with a SCOP/PALI family. These include 802 Pfam families that have no structural annotation in Pfam. In the second step, an all-against-all sequence-profile comparison using the profiles of the remaining 7214 Pfam families with apparently no structural information clustered 178 homologous Pfam families into 76 new potential superfamilies.

Expanding groups of related proteins of as yet unknown structure, as proposed in SUPFAM, should help identify 'priority proteins' for structure determination in structural genomics initiatives, thereby expanding the coverage of structural information in protein sequence space.
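The two steps described above can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not SUPFAM's actual pipeline: it assumes RPS-BLAST results have already been parsed into (query family, target family, E-value) tuples, that a single hypothetical E-value cut-off of 1e-4 decides whether a hit counts, that step 1 keeps the best-scoring PALI hit for each Pfam family, and that step 2 groups the remaining families by simple single-linkage clustering. Identifiers such as PF_A and PALI_1 are placeholders.

```python
from collections import defaultdict

# Hypothetical E-value cut-off; SUPFAM's actual thresholds are not given here.
EVALUE_CUTOFF = 1e-4


def assign_to_pali(hits, cutoff=EVALUE_CUTOFF):
    """Step 1 (sketch): map each Pfam family to the PALI family of its best hit.

    `hits` holds (pfam_family, pali_family, evalue) tuples, assumed to be parsed
    from RPS-BLAST output. Hits with an E-value at or above `cutoff` are ignored.
    """
    best = {}
    for pfam, pali, evalue in hits:
        if evalue < cutoff and (pfam not in best or evalue < best[pfam][1]):
            best[pfam] = (pali, evalue)
    return {pfam: pali for pfam, (pali, _) in best.items()}


def cluster_unassigned(families, hits, cutoff=EVALUE_CUTOFF):
    """Step 2 (sketch): single-linkage clustering of unassigned Pfam families.

    `families` lists Pfam families without a PALI association; `hits` holds
    (family_a, family_b, evalue) tuples from an all-against-all comparison.
    Returns clusters of two or more families (potential new superfamilies),
    each as a sorted list.
    """
    parent = {f: f for f in families}

    def find(f):
        # Follow parent links to the cluster representative, halving the path.
        while parent[f] != f:
            parent[f] = parent[parent[f]]
            f = parent[f]
        return f

    for a, b, evalue in hits:
        if a in parent and b in parent and evalue < cutoff:
            parent[find(a)] = find(b)  # merge the two clusters

    groups = defaultdict(list)
    for f in families:
        groups[find(f)].append(f)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


if __name__ == "__main__":
    step1_hits = [
        ("PF_A", "PALI_1", 1e-10),
        ("PF_A", "PALI_2", 1e-3),  # weaker than the cut-off: ignored
        ("PF_B", "PALI_2", 1e-6),
        ("PF_C", "PALI_3", 0.5),   # no reliable hit: PF_C stays unassigned
    ]
    assigned = assign_to_pali(step1_hits)
    print(assigned)  # {'PF_A': 'PALI_1', 'PF_B': 'PALI_2'}

    all_families = ["PF_A", "PF_B", "PF_C", "PF_D", "PF_E", "PF_F"]
    unassigned = [f for f in all_families if f not in assigned]
    step2_hits = [("PF_C", "PF_D", 1e-8), ("PF_D", "PF_E", 1e-5), ("PF_F", "PF_C", 2.0)]
    print(cluster_unassigned(unassigned, step2_hits))  # [['PF_C', 'PF_D', 'PF_E']]
```

Single linkage means PF_C and PF_E end up in the same potential superfamily through PF_D even though they never hit each other directly; a stricter criterion (for example, requiring reciprocal hits) would be an equally plausible design choice.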

Recent Developments

The Pfam and PALI releases used in the current update of SUPFAM are much larger than those used in the previous update. The present revision of SUPFAM also grouped 178 Pfam families into 76 new potential superfamilies. Multiple profiles (7) have been generated for every Pfam and PALI family using PSI-BLAST and used to identify relationships, in order to reduce the bias introduced by the choice of a single reference sequence in RPS-BLAST profile generation. Connections of DUF (domain of unknown function) and UPF (uncharacterized protein family) entries to other Pfam/PALI families are highlighted on the website.
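The multiple-profile idea can be illustrated with a small sketch. As we understand the approach of reference (7), one family alignment yields several PSI-BLAST profiles, each built around a different member sequence as the reference, and a query is related to the family through its best match over all of those profiles. The Python below assumes hits have already been parsed into (query, profile ID, E-value) tuples and that each profile ID maps back to its family; the cut-off value and identifiers such as PF_X/1 are hypothetical.

```python
from collections import defaultdict

# Hypothetical E-value cut-off for counting a profile hit.
EVALUE_CUTOFF = 1e-4


def family_scores(profile_hits, profile_family, cutoff=EVALUE_CUTOFF):
    """Collapse hits against individual profiles into family-level results.

    profile_hits:   iterable of (query, profile_id, evalue) tuples.
    profile_family: dict mapping each profile_id to the family it was built from
                    (assumed: one profile per reference sequence of that family).
    Returns {(query, family): (best_evalue, n_distinct_profiles_hit)}.
    """
    best = {}
    profiles_hit = defaultdict(set)
    for query, profile_id, evalue in profile_hits:
        if evalue >= cutoff:
            continue  # too weak to count as a match
        key = (query, profile_family[profile_id])
        profiles_hit[key].add(profile_id)
        if key not in best or evalue < best[key]:
            best[key] = evalue
    return {key: (best[key], len(profiles_hit[key])) for key in best}


if __name__ == "__main__":
    # Family PF_X contributes three profiles, one per reference sequence.
    profile_family = {"PF_X/1": "PF_X", "PF_X/2": "PF_X", "PF_X/3": "PF_X"}
    hits = [("seq1", "PF_X/1", 1e-3), ("seq1", "PF_X/2", 1e-9), ("seq1", "PF_X/3", 1e-6)]
    print(family_scores(hits, profile_family))  # {('seq1', 'PF_X'): (1e-09, 2)}
```

Taking the best E-value over several profiles means a remote homologue missed by the profile built around one reference sequence can still be picked up through another; the count of profiles hit is an optional extra indication of how robust the match is.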

Acknowledgements

OK and GA are supported by fellowships from CSIR, New Delhi. This work is supported by the Department of Biotechnology, New Delhi.

References

1. Pandit, S.B., Gosar, D., Abhiman, S., Sujatha, S., Dixit, S.S., Mhatre, N.S., Sowdhamini, R. and Srinivasan, N. (2002) SUPFAM - Database of potential protein superfamily relationships derived by comparing sequence-based and structure-based families: Implications for structural genomics and function annotation in genomes. Nucleic Acids Res., 30, 289-293.
2. Pandit, S.B., Bhadra, R., Gowri, V.S., Balaji, S., Anand, B. and Srinivasan, N. (2004) SUPFAM: A database of sequence superfamilies of protein domains. BMC Bioinformatics, 5, 28.
3. Bateman, A., Birney, E., Durbin, R., Eddy, S.R., Howe, K.L. and Sonnhammer, E.L.L. (2000) The Pfam protein families database. Nucleic Acids Res., 28, 263-266.
4. Balaji, S., Sujatha, S., Kumar, S.S.C. and Srinivasan, N. (2001) PALI: A database of Phylogeny and ALIgnment of homologous protein structures. Nucleic Acids Res., 29, 61-65.
5. Gowri, V.S., Pandit, S.B., Karthik, P.S., Srinivasan, N. and Balaji, S. (2003) Integration of related sequences with protein three-dimensional structural families in an updated version of PALI database. Nucleic Acids Res., 31, 486-488.
6. Murzin, A.G., Brenner, S.E., Hubbard, T. and Chothia, C. (1995) SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol., 247, 536-540.
7. Anand, B., Gowri, V.S. and Srinivasan, N. (2005) Use of multiple profiles corresponding to a sequence alignment enables effective detection of remote homologues. Bioinformatics, 21, 2821-2826.


Go to the abstract in the NAR 2002 Database Issue.