Learning accurate and interpretable models based on regularized random forests regression.

Sheng Liu, Xin Dang, Shamitha Dissanayake, Todd Mlsna, Sanjay Patel, Yixin Chen, Dawn Wilkins

doi:10.1186/1752-0509-8-s3-s5

Abstract

Background: Many biology-related studies combine data from multiple sources in an effort to understand the underlying problems. It is important to find and interpret the most important information from these sources. An effective algorithm that can simultaneously extract decision rules and select critical features, giving good interpretability while preserving prediction performance, would therefore be beneficial.

Methods: In this study, we focus on regression problems for biological data where target outcomes are continuous. In general, models constructed by linear regression approaches are relatively easy to interpret. However, many practical biological applications are inherently nonlinear, and a direct linear relationship between input and output can hardly be found. Nonlinear regression techniques can reveal nonlinear relationships in data, but are generally hard for humans to interpret. We propose a rule-based regression algorithm that uses 1-norm regularized random forests. The proposed approach simultaneously extracts a small number of rules from generated random forests and eliminates unimportant features.

Results: We tested the approach on some biological data sets. The proposed approach is able to construct a significantly smaller set of regression rules using a subset of attributes while achieving prediction performance comparable to that of random forests regression.

Conclusion: The approach demonstrates high potential in aiding prediction and interpretation of nonlinear relationships of the subject being studied.

Highlights

Many biology-related studies combine data from multiple sources in an effort to understand the underlying problems
Fewer features involved in the model make it less complex; the iterative approach continues until the selected subset of features no longer changes
We describe our approach by showing a mapping of the forest generated by Random Forests (RF) to a rule space in which many of the rules are removed by 1-norm regularization
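
The mapping from a forest to rule space can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tiny two-tree forest, the nested-dict tree format, the toy data, and the helper names (`extract_rules`, `to_rule_space`, `lasso`) are all assumptions made for illustration, and a Lasso fitted by coordinate descent is swapped in for whatever 1-norm regularized formulation the paper actually solves. Each root-to-leaf path becomes one rule, each sample becomes a 0/1 vector recording which rules it satisfies, and the 1-norm penalty drives the weights of uninformative rules to zero.

```python
# Hypothetical tree format: internal nodes hold a split, leaves hold a value.
TREE_A = {"feature": 0, "threshold": 0.5,
          "left": {"value": 1.0}, "right": {"value": 3.0}}
TREE_B = {"feature": 1, "threshold": 0.5,
          "left": {"value": 2.0}, "right": {"value": 2.0}}
FOREST = [TREE_A, TREE_B]

def extract_rules(node, conditions=()):
    """Return one rule per leaf: the (feature, op, threshold) tests on its path."""
    if "value" in node:
        return [conditions]
    j, t = node["feature"], node["threshold"]
    return (extract_rules(node["left"], conditions + ((j, "<=", t),))
            + extract_rules(node["right"], conditions + ((j, ">", t),)))

def rule_fires(rule, x):
    """True if sample x satisfies every test in the rule."""
    return all(x[j] <= t if op == "<=" else x[j] > t for j, op, t in rule)

def to_rule_space(rules, X):
    """Encode each sample as a 0/1 vector with one entry per rule."""
    return [[1.0 if rule_fires(r, x) else 0.0 for r in rules] for x in X]

def soft_threshold(z, lam):
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso(Z, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)*||y - Zw||^2 + lam*||w||_1, no intercept."""
    n, p = len(Z), len(Z[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for k in range(p):
            col_sq = sum(Z[i][k] ** 2 for i in range(n)) / n
            if col_sq == 0.0:
                continue  # rule never fires on the data; leave its weight at 0
            # Correlation of column k with the residual that ignores rule k.
            rho = sum(
                Z[i][k] * (y[i] - sum(Z[i][m] * w[m] for m in range(p) if m != k))
                for i in range(n)
            ) / n
            w[k] = soft_threshold(rho, lam) / col_sq
    return w

# Toy data (assumed): the target depends on feature 0 only.
X = [[0.1, 0.2], [0.2, 0.9], [0.3, 0.4], [0.4, 0.7],
     [0.6, 0.1], [0.7, 0.8], [0.8, 0.3], [0.9, 0.6]]
y = [1.0, 1.0, 1.0, 1.0, 3.0, 3.0, 3.0, 3.0]

rules = [r for tree in FOREST for r in extract_rules(tree)]
Z = to_rule_space(rules, X)

y_mean = sum(y) / len(y)            # center the target instead of fitting an intercept
y_centered = [v - y_mean for v in y]
w = lasso(Z, y_centered, lam=0.1)

kept = [k for k, wk in enumerate(w) if wk != 0.0]       # rules that survive
used_features = {j for k in kept for (j, _, _) in rules[k]}
print("surviving rules:", [rules[k] for k in kept])
print("features still in use:", used_features)
```

In this toy case the two rules that test feature 1 receive zero weight, so only feature 0 remains in the model, which mirrors how pruning rules can also eliminate features.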

Summary

Introduction

Many biology-related studies combine data from multiple sources in an effort to understand the underlying problems. An effective algorithm that can simultaneously extract decision rules and select critical features, giving good interpretability while preserving prediction performance, would be beneficial. To understand the underlying problem well, it is vital to have both an interpretable model (e.g., relevant features and predictive rules) and high prediction performance. Decision-rule-based algorithms are well known for their ability to shed light on the decision process in addition to making a prediction. Another factor affecting the interpretation of a model generated from data is feature selection. Beyond its application to classification, we apply this iterative approach to another category of learning algorithm, regression rule learning, extending its domain of usage.
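
The iterative part of the approach, which repeats until the selected subset of features does not change, can be sketched as a fixed-point loop. This is only an outline under stated assumptions: `iterate_until_stable` and `toy_select` are hypothetical names, and `toy_select` with its made-up relevance scores merely stands in for the real step (train a forest on the current features, prune rules, and return the features that the surviving rules still use).

```python
# Hypothetical relevance scores, used only to make the sketch runnable.
SCORES = {"f1": 0.9, "f2": 0.8, "f3": 0.2, "f4": 0.1}

def toy_select(features):
    """Stand-in selector: keep features scoring at least half the current mean."""
    threshold = 0.5 * sum(SCORES[f] for f in features) / len(features)
    return {f for f in features if SCORES[f] >= threshold}

def iterate_until_stable(all_features, select, max_rounds=20):
    """Re-run selection on its own output until the feature subset stops changing."""
    current = frozenset(all_features)
    for _ in range(max_rounds):
        chosen = frozenset(select(current))
        if chosen == current:
            break  # fixed point reached: another round would select the same subset
        current = chosen
    return current

selected = iterate_until_stable(SCORES, toy_select)
print(sorted(selected))
```

The `max_rounds` cap is a safeguard added for the sketch; the source text only states that iteration stops once the subset no longer changes.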

Journal: BMC Systems Biology
Publication Date: Oct 22, 2014
Citations: 35
License type: CC BY

Similar Papers

Rule based regression and feature selection for biological data
Sheng Liu ... Dawn Wilkins
01 Dec 2013

Binary black hole algorithm for feature selection and classification on biological data
Elnaz Pashaei ... Nizamettin Aydin
Applied Soft Computing | VOL. 56
06 Mar 2017

Curve Fitting: Fitting Functions to Agricultural and Biological Data
José Boaventura Cunha
01 Jan 2006

Lognormal Fitting of Particle Size Distribution Data Monitored in Animal Buildings: Linear versus Nonlinear Regression Approach
X Yang ... X Wang
Transactions of the ASABE | VOL. 55
01 Jan 2012
