Abstract

Background: Many biology-related research works combine data from multiple sources in an effort to understand the underlying problems. It is important to find and interpret the most important information from these sources. It would therefore be beneficial to have an effective algorithm that can simultaneously extract decision rules and select critical features for good interpretation while preserving prediction performance.

Methods: In this study, we focus on regression problems for biological data where the target outcomes are continuous. In general, models constructed with linear regression approaches are relatively easy to interpret. However, many practical biological applications are nonlinear in essence, and a direct linear relationship between input and output can hardly be found. Nonlinear regression techniques can reveal nonlinear relationships in data, but are generally hard for humans to interpret. We propose a rule-based regression algorithm that uses 1-norm regularized random forests. The proposed approach simultaneously extracts a small number of rules from the generated random forests and eliminates unimportant features.

Results: We tested the approach on biological data sets. The proposed approach is able to construct a significantly smaller set of regression rules using a subset of attributes while achieving prediction performance comparable to that of random forests regression.

Conclusion: The approach demonstrates high potential in aiding prediction and interpretation of nonlinear relationships of the subject being studied.

Highlights

  • Many biology-related research works combine data from multiple sources in an effort to understand the underlying problems

  • Involving fewer features makes the model less complex; the approach iterates until the selected subset of features no longer changes

  • We describe our approach by showing a mapping of the forest generated by Random Forests (RF) to a rule space, where many of the rules are removed by 1-norm regularization

Summary

Introduction

Many biology-related research works combine data from multiple sources in an effort to understand the underlying problems. It would be beneficial to have an effective algorithm that can simultaneously extract decision rules and select critical features for good interpretation while preserving prediction performance. To understand the underlying problem well, it is vital to have an interpretable model (e.g., relevant features and predictive rules) and high-performance prediction at the same time. Decision rule based algorithms are well known for their ability to shed light on the decision process in addition to making a prediction. Another factor affecting the interpretation of a model generated from data is feature selection. In addition to applying it to classification, we apply this iterative approach to another category of learning algorithm, regression rule learning, thereby extending its domain of use.
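The idea described above (map a forest to a rule space, prune rules with a 1-norm penalty, and repeat until the selected features stop changing) can be sketched as follows. This is an illustrative sketch only, not the authors' implementation. It assumes a squared-error loss with a 1-norm penalty solved by coordinate descent, since the text does not specify the loss or solver; it uses a small hand-rolled forest in place of a full Random Forests library; and it runs on synthetic data. All names (`grow_rules`, `lasso`, `extract_rules`, `lam`) are hypothetical.

```python
import random

def sse(y, rows):
    """Sum of squared deviations of y over the given row indices."""
    mean = sum(y[i] for i in rows) / len(rows)
    return sum((y[i] - mean) ** 2 for i in rows)

def grow_rules(X, y, rows, feats, depth, rng, conds=()):
    """Grow one small regression tree and return each leaf as a rule,
    i.e. a tuple of (feature, op, threshold) conditions along its path."""
    if depth == 0 or len(rows) < 4:
        return [conds]
    best = None
    for f in rng.sample(feats, max(1, len(feats) // 2)):  # random feature subset
        vals = sorted({X[i][f] for i in rows})
        for t in vals[:-1]:  # both sides of each split are non-empty
            left = [i for i in rows if X[i][f] <= t]
            right = [i for i in rows if X[i][f] > t]
            score = sse(y, left) + sse(y, right)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # every sampled feature is constant on these rows
        return [conds]
    _, f, t, left, right = best
    return (grow_rules(X, y, left, feats, depth - 1, rng, conds + ((f, "<=", t),))
            + grow_rules(X, y, right, feats, depth - 1, rng, conds + ((f, ">", t),)))

def fires(rule, x):
    """True if sample x satisfies every condition of the rule."""
    return all((x[f] <= t) if op == "<=" else (x[f] > t) for f, op, t in rule)

def lasso(Z, y, lam, sweeps=50):
    """Coordinate descent for (1/2n)*||y - b - Z w||^2 + lam*||w||_1."""
    n = len(y)
    cols = [list(c) for c in zip(*Z)]
    w = [0.0] * len(cols)
    b = sum(y) / n
    resid = [yi - b for yi in y]
    for _ in range(sweeps):
        shift = sum(resid) / n  # re-centre the unpenalised intercept
        b += shift
        resid = [r - shift for r in resid]
        for j, col in enumerate(cols):
            norm = sum(c * c for c in col) / n
            if norm == 0:
                continue
            rho = sum(c * r for c, r in zip(col, resid)) / n + norm * w[j]
            # Soft-thresholding sets weakly correlated rule weights to zero.
            new = max(abs(rho) - lam, 0.0) / norm * (1 if rho > 0 else -1)
            resid = [r - c * (new - w[j]) for c, r in zip(col, resid)]
            w[j] = new
    return b, w

def extract_rules(X, y, lam=0.05, n_trees=10, depth=2, seed=0):
    """Repeat forest -> rule space -> 1-norm pruning until the set of
    features used by the surviving rules stops changing."""
    rng = random.Random(seed)
    feats = list(range(len(X[0])))
    while True:
        rules = []
        for _ in range(n_trees):
            boot = [rng.randrange(len(X)) for _ in X]  # bootstrap sample
            rules += grow_rules(X, y, boot, feats, depth, rng)
        rules = sorted({r for r in rules if r})  # unique, non-empty rules
        Z = [[1.0 if fires(r, x) else 0.0 for r in rules] for x in X]
        b, w = lasso(Z, y, lam)
        kept = [(r, wj) for r, wj in zip(rules, w) if wj != 0.0]
        used = sorted({f for r, _ in kept for f, _, _ in r})
        if used == feats or not used:
            return b, kept, used, len(rules)
        feats = used  # regrow the forest on the surviving features only

# Synthetic nonlinear data: only features 0 and 1 influence the target.
data_rng = random.Random(1)
X = [[data_rng.random() for _ in range(6)] for _ in range(60)]
y = [3.0 * (x[0] > 0.5) + 2.0 * (x[1] > 0.3) + data_rng.gauss(0, 0.1) for x in X]

b, kept, used, n_rules = extract_rules(X, y)
pred = [b + sum(wj for r, wj in kept if fires(r, x)) for x in X]
mse = sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y)
var = sum((t - sum(y) / len(y)) ** 2 for t in y) / len(y)
print(f"{len(kept)} of {n_rules} rules kept, features used: {used}")
print(f"training MSE {mse:.3f} vs target variance {var:.3f}")
```

Each surviving rule is a readable conjunction of threshold conditions with a single weight, which is what makes the pruned model easier to inspect than the full forest. The penalty strength `lam` controls how aggressively rules are removed; the value here is arbitrary and would need tuning (for example by cross-validation) on real data.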
