Abstract
Background: A new method for the prediction of protein structural classes is constructed based on the Rough Sets algorithm, a rule-based data mining method. Amino acid compositions and data on 8 physicochemical properties are used as conditional attributes to construct the decision system. After the decision system is reduced, decision rules are generated, which can be used to classify new objects.
Results: In this study, self-consistency and jackknife tests on the datasets constructed by G.P. Zhou (Journal of Protein Chemistry, 1998, 17: 729–738) are used to verify the performance of this method, and the results are compared with those of some prior works. The results show that the rough sets approach is very promising and may play a complementary role to existing powerful approaches, such as the component-coupled, neural network, SVM, and LogitBoost approaches.
Conclusion: The high success rates indicate that the rough sets approach proposed in this paper has the potential to become a useful tool in bioinformatics.
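To make the rough sets vocabulary concrete, the following is a minimal sketch of the lower and upper approximations that underlie a decision system. It is not the paper's implementation: the toy table, the attribute names (`ala`, `hydrophobicity`), the discretized values, and the function names are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def indiscernibility_classes(objects, attributes):
    """Group object ids whose values agree on every listed attribute."""
    groups = defaultdict(set)
    for obj_id, values in objects.items():
        key = tuple(values[a] for a in attributes)
        groups[key].add(obj_id)
    return list(groups.values())


def approximations(objects, attributes, target):
    """Return the (lower, upper) approximations of the set `target`."""
    lower, upper = set(), set()
    for block in indiscernibility_classes(objects, attributes):
        if block <= target:   # block lies entirely inside the target class
            lower |= block
        if block & target:    # block overlaps the target class
            upper |= block
    return lower, upper


# Hypothetical decision table with discretized conditional attributes.
table = {
    "p1": {"ala": "high", "hydrophobicity": "low"},
    "p2": {"ala": "high", "hydrophobicity": "low"},
    "p3": {"ala": "low", "hydrophobicity": "high"},
    "p4": {"ala": "low", "hydrophobicity": "low"},
}
all_alpha = {"p1", "p4"}  # assumed members of one structural class

lower, upper = approximations(table, ["ala", "hydrophobicity"], all_alpha)
```

In this toy table, `p1` and `p2` are indiscernible but only `p1` belongs to the class, so both fall in the boundary region (upper minus lower approximation). Objects in the lower approximation support certain rules, while boundary objects support only possible rules.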
Highlights
A new method for the prediction of protein structural classes is constructed based on the Rough Sets algorithm, a rule-based data mining method.
To verify the performance of this rough sets based method, we carried out a self-consistency test and a jackknife cross-validation test to evaluate the prediction results.
The results indicated that Rough Sets captured the relationship between sequences and their structural classes through amino acid composition and physicochemical properties.
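The two evaluation protocols named above can be sketched as follows. In a self-consistency test the predictor is trained and tested on the same data. In a jackknife (leave-one-out) test each protein is held out in turn and predicted from the rest. The nearest-centroid predictor here is only a simple stand-in, not the rough sets classifier of the paper, and the two-dimensional feature vectors are made-up placeholders.

```python
def nearest_centroid_predict(train, query):
    """Stand-in classifier: pick the label whose mean vector is closest."""
    sums, counts = {}, {}
    for features, label in train:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1

    def squared_distance(label):
        n = counts[label]
        return sum((s / n - q) ** 2 for s, q in zip(sums[label], query))

    return min(sums, key=squared_distance)


def self_consistency(data, predict):
    """Train and test on the same samples; return the success rate."""
    hits = sum(predict(data, x) == y for x, y in data)
    return hits / len(data)


def jackknife(data, predict):
    """Hold out each sample in turn, train on the rest, and predict it."""
    hits = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]
        hits += predict(train, x) == y
    return hits / len(data)


# Placeholder feature vectors with assumed class labels.
data = [
    ([0.9, 0.1], "alpha"), ([0.8, 0.2], "alpha"),
    ([0.1, 0.9], "beta"), ([0.2, 0.8], "beta"),
]

resubstitution_rate = self_consistency(data, nearest_centroid_predict)
jackknife_rate = jackknife(data, nearest_centroid_predict)
```

The jackknife rate is usually the more informative of the two, because the held-out protein never contributes to the model that predicts it.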
Summary
A new method for the prediction of protein structural classes is constructed based on the Rough Sets algorithm, a rule-based data mining method. A review of the prediction of protein structural class and subcellular location by Chou [1] presented the problem systematically, and introduced and compared some existing methods. In 1992, a new weighting method [4] was proposed to predict protein structural classes from amino acid composition. After that, another method, called the maximum component coefficient method, was proposed by Zhang and Chou [5]; it achieved a higher rate of correct prediction than other methods. A new neural network based algorithm [6] was developed that considers six hydrophobic amino acid patterns together with amino acid compositions, and a cross-validation test was used to verify the …