Abstract

In order to apply a powerful pattern recognition algorithm to predict functional sites in proteins, amino acids cannot be used directly as inputs since they are non-numerical variables. Therefore, they need encoding prior to input. In this regard, the bio-basis function maps a non-numerical sequence space to a numerical feature space. One of the important issues for the bio-basis function is how to select a minimum set of bio-basis strings with maximum information. In this paper, an efficient method to select bio-basis strings for the bio-basis function is described integrating the concepts of the Fisher ratio and “degree of resemblance”. The integration enables the method to select a minimum set of most informative bio-basis strings. The “degree of resemblance” enables efficient selection of a set of distinct bio-basis strings. In effect, it reduces the redundant features in numerical feature space. Quantitative indices are proposed for evaluating the quality of selected bio-basis strings. The effectiveness of the proposed bio-basis string selection method, along with a comparison with existing methods, is demonstrated on different data sets.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.