Abstract

Advances in technology have made statistical and computational techniques highly effective for analyzing biological data. Inferring biological systems from such data is one of the most promising outcomes, and it has become crucial, especially in personalized medicine. Biological networks can be described mathematically mainly by probabilistic and deterministic models. The former are based on the random nature of the measurements and can be grouped under parametric and nonparametric models. The latter, on the other hand, admit no random effect in the data and are represented under nonparametric models. In this study, we implement both deterministic and probabilistic modeling of biological systems. Among many alternatives, we first use the random forest (RF) algorithm, which we propose for studying biological networks, and then apply an extended version of the multivariate adaptive regression splines (MARS) model. MARS is a nonparametric statistical model that can successfully explain highly correlated data with nonlinear relations. Comparative studies have shown that inference with this model is significantly fast without loss of accuracy. Hence, in this work, we extend the underlying lasso-based MARS model by considering the interaction effects of the system’s elements. In the analysis, we compare the performance of the extended MARS with a popular probabilistic model, the Gaussian graphical model (GGM), in constructing biological networks. GGM uses the precision matrix to represent the pairwise linear correlation between species under the assumption that the states, i.e., the concentration levels of the system’s elements, follow a multivariate normal distribution. By proposing both RF and the extended MARS model, we gain the flexibility to describe biological data under both normal and non-normal distributions, including cases where GGM becomes insufficient.
In our assessments, we apply all of these methods to simulated and real datasets and compare their accuracies. The findings indicate that the proposed approaches are promising alternatives to GGM and may enable us to unravel the true structure of biological systems and to detect direct or indirect relationships among genes/proteins and diseases, which can be considered key to improving personalized and preventive medicine.
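
As an illustration of how a GGM reads network structure from the precision matrix, the following is a minimal sketch and not the implementation used in the study. It relies on the standard identity that the partial correlation between elements i and j equals the negated off-diagonal precision entry scaled by the two diagonal entries, so a zero entry means no direct edge. The matrix `theta`, the function names, and the tolerance `tol` are hypothetical and chosen only for this example.

```python
import math


def partial_correlations(precision):
    """Convert a precision matrix (list of lists) into partial correlations.

    Uses rho_ij = -theta_ij / sqrt(theta_ii * theta_jj) for i != j.
    """
    p = len(precision)
    rho = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            if i == j:
                rho[i][j] = 1.0
            else:
                scale = math.sqrt(precision[i][i] * precision[j][j])
                rho[i][j] = -precision[i][j] / scale
    return rho


def edges(precision, tol=1e-8):
    """List undirected edges (i, j), i < j, with nonzero partial correlation."""
    rho = partial_correlations(precision)
    p = len(rho)
    return [
        (i, j)
        for i in range(p)
        for j in range(i + 1, p)
        if abs(rho[i][j]) > tol
    ]


# Hypothetical precision matrix for a three-gene chain: gene 0 - gene 1 - gene 2.
# The zero in position (0, 2) encodes conditional independence of genes 0 and 2
# given gene 1, so no direct edge is drawn between them.
theta = [
    [2.0, -1.0, 0.0],
    [-1.0, 2.0, -1.0],
    [0.0, -1.0, 2.0],
]

network = edges(theta)
```

In practice the precision matrix is not known and has to be estimated from the measurements, typically with a sparsity-inducing penalty. The sketch only covers the step from an already estimated matrix to the network.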
