GENE FAMILY IDENTIFICATION NETWORK DESIGN FOR PROTEIN SEQUENCE ANALYSIS

Cathy H Wu, Jerry McLarty, Hongzhan Huang

doi:10.1142/s0218213099000282

Abstract

With the exponential accumulation of sequence data, continued progress in the Human Genome Project will depend increasingly on advanced computational tools to manage and analyze the data. By utilizing the information embedded within families of homologous sequences, a gene family identification approach may facilitate the understanding of gene function. We have developed GeneFIND (Gene Family Identification Network Design), a system that searches query sequences against a database of gene families. It provides rapid and accurate protein family identification by combining global and motif sequence similarities and incorporating ProClass family information. Multi-level filters are used, starting with the MOTIFIND neural networks and a BLAST search, followed by SSEARCH alignment, motif pattern matching, hidden Markov modeling of motifs, and ClustalW motif alignment. GeneFIND has been implemented as a full-scale system for classifying sequences into more than 1,200 ProSite families and 6,000 PIR families. It has been used to identify thousands of new family members and is well suited for genomic sequence analysis. The system is available for on-line family identification from our WWW server ().
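The multi-level filter idea can be sketched as a cascade in which cheap screens run first and more expensive checks only confirm the families that survive. This is a minimal illustration under stated assumptions, not the GeneFIND implementation. The MOTIFIND, BLAST and SSEARCH results are assumed to arrive as precomputed numbers. The class `FamilyEvidence`, the functions `classify` and `prosite_to_regex`, the family names, the thresholds, and the rule that a family passes the first level if either fast screen flags it are all hypothetical. The hidden Markov model and ClustalW stages are omitted. The motif step uses a simplified conversion of ProSite pattern syntax into a regular expression.

```python
import re
from dataclasses import dataclass


def prosite_to_regex(pattern: str) -> str:
    """Convert a simplified ProSite-style pattern (e.g. 'C-x(2)-C.') to a regex.

    Handles x, [ABC], {ABC}, repeats e(n) / e(n,m), and < > terminal anchors.
    Rarer syntax such as '>' inside brackets is not supported in this sketch.
    """
    body = pattern.rstrip(".")
    anchored_start = body.startswith("<")
    anchored_end = body.endswith(">")
    body = body.strip("<>")
    parts = []
    for element in body.split("-"):
        core, lo, hi = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element).groups()
        if core == "x":                 # any residue
            core = "."
        elif core.startswith("{"):      # any residue except those listed
            core = "[^" + core[1:-1] + "]"
        if lo:                          # repetition count or range
            core += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(core)
    regex = "".join(parts)
    return ("^" if anchored_start else "") + regex + ("$" if anchored_end else "")


@dataclass
class FamilyEvidence:           # hypothetical container for precomputed scores
    family: str
    nn_score: float             # assumed neural-network output in [0, 1]
    blast_evalue: float         # assumed best BLAST E-value against the family
    ssearch_evalue: float       # assumed SSEARCH (Smith-Waterman) E-value
    motif_pattern: str          # ProSite-style pattern for the family


def classify(seq: str, evidence: list[FamilyEvidence],
             nn_cut: float = 0.5, blast_cut: float = 1e-3,
             ssearch_cut: float = 1e-5) -> list[str]:
    accepted = []
    for ev in evidence:
        # Level 1: fast screens; skip only if neither screen flags the family
        if ev.nn_score < nn_cut and ev.blast_evalue > blast_cut:
            continue
        # Level 2: require a significant full local alignment score
        if ev.ssearch_evalue > ssearch_cut:
            continue
        # Level 3: require the family's motif pattern to occur in the query
        if not re.search(prosite_to_regex(ev.motif_pattern), seq):
            continue
        accepted.append(ev.family)
    return accepted


evidence = [
    FamilyEvidence("FAM_A", 0.92, 1e-8, 1e-12, "C-x(2)-C-x(3)-[LIVMFYWC]."),
    FamilyEvidence("FAM_B", 0.10, 5.0, 1e-20, "C-x(2)-C."),  # dropped at level 1
    FamilyEvidence("FAM_C", 0.80, 1e-4, 1e-2, "C-x(2)-C."),  # dropped at level 2
    FamilyEvidence("FAM_D", 0.70, 1e-6, 1e-9, "W-W-W."),     # dropped at level 3
]
print(classify("MKCAACGHHLW", evidence))  # ['FAM_A']
```

Ordering the checks this way means the costly confirmation steps run only on the few families that pass the cheap screens, which is the usual motivation for a filter cascade.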
