Abstract
With the exponential accumulation of sequence data, continued progress in the Human Genome Project will depend increasingly on advanced computational tools to manage and analyze the data. A gene family identification approach, which exploits the information embedded within families of homologous sequences, may facilitate the understanding of gene function. We have developed the GeneFIND (Gene Family Identification Network Design) system for database searching against gene families. It provides rapid and accurate protein family identification by combining global and motif sequence similarities and incorporating ProClass family information. The system applies multi-level filters: MOTIFIND neural network and BLAST searches come first, followed by SSEARCH alignment, motif pattern matching, hidden Markov modeling of motifs, and ClustalW motif alignment. GeneFIND has been implemented as a full-scale system covering more than 1200 ProSite and 6000 PIR families. It has been used to identify thousands of new family members and is well suited for genomic sequence analysis. The system is available for on-line family identification from our WWW server ().
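To make the idea of multi-level filtering concrete, here is a minimal sketch of a two-stage filter cascade in which a cheap screen prunes candidate families before a more specific motif pattern match is applied. It is not GeneFIND's implementation. The first stage is a hypothetical shared 3-mer screen standing in for the MOTIFIND/BLAST searches, and the second converts a ProSite-style pattern into a regular expression. The `Family` record, the 0.3 threshold, and all sequences and family names are illustrative assumptions.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


def prosite_to_regex(pattern: str) -> str:
    """Translate a ProSite-style pattern (e.g. 'C-x(2,4)-C') into a Python regex.

    Handles x, [..], {..}, repeat counts (n) / (n,m) and the '<' / '>' anchors;
    rarer syntax (such as '>' inside brackets) is deliberately ignored here.
    """
    out = []
    for element in pattern.rstrip('.').split('-'):
        if element.startswith('<'):          # anchored at the N-terminus
            out.append('^')
            element = element[1:]
        anchor_end = element.endswith('>')   # anchored at the C-terminus
        if anchor_end:
            element = element[:-1]
        m = re.fullmatch(r'(.+?)(?:\((\d+)(?:,(\d+))?\))?', element)
        core, lo, hi = m.group(1), m.group(2), m.group(3)
        if core == 'x':
            regex = '.'                           # any residue
        elif core.startswith('{'):
            regex = '[^' + core[1:-1] + ']'       # excluded residues
        else:
            regex = core                          # single residue or [ABC] class
        if lo is not None:
            regex += '{' + lo + (',' + hi if hi else '') + '}'
        out.append(regex)
        if anchor_end:
            out.append('$')
    return ''.join(out)


@dataclass
class Family:
    # Hypothetical family record: a name, a ProSite-style motif, and one
    # representative sequence used by the cheap first-stage screen.
    name: str
    prosite_pattern: str
    representative: str


def shared_kmer_fraction(a: str, b: str, k: int = 3) -> float:
    """Fraction of distinct k-mers shared by two sequences (a crude similarity proxy)."""
    ka = {a[i:i + k] for i in range(len(a) - k + 1)}
    kb = {b[i:i + k] for i in range(len(b) - k + 1)}
    return len(ka & kb) / max(1, min(len(ka), len(kb)))


Stage = Callable[[str, Family], bool]


def run_cascade(query: str, families: List[Family], stages: List[Stage]) -> List[str]:
    """Apply each filter in order, keeping only families that pass every stage."""
    candidates = families
    for stage in stages:
        candidates = [f for f in candidates if stage(query, f)]
        if not candidates:
            break
    return [f.name for f in candidates]


stages: List[Stage] = [
    # Stage 1 (stand-in for a fast database search): arbitrary 0.3 threshold.
    lambda q, f: shared_kmer_fraction(q, f.representative) >= 0.3,
    # Stage 2: motif pattern match against the family's ProSite-style pattern.
    lambda q, f: re.search(prosite_to_regex(f.prosite_pattern), q) is not None,
]

families = [
    Family("toy_zinc_finger", "C-x(2,4)-C-x(3)-[LIVMFYWC]", "MKCAACGKSFLRHE"),
    Family("toy_other", "W-x-W", "MKCAACGKSFLRHE"),
    Family("toy_unrelated", "C-x(2,4)-C", "PPPPPPGGGGGG"),
]
query = "MKCAACGKSFLRHEQ"
print(run_cascade(query, families, stages))  # ['toy_zinc_finger']
```

In this toy run, `toy_unrelated` is dropped by the cheap first stage even though its motif would match, and `toy_other` passes the screen but fails the motif check. Ordering stages from cheap to expensive is what lets a cascade of this kind limit costly comparisons to a small candidate set.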