Ensemble Regression Modelling for Genetic Network Inference

Hasini Nakulugamuwa Gamage, Adrian Shatte, Jennifer Hallinan, Madhu Chetty

doi:10.1109/cibcb55180.2022.9863017

Abstract

An accurate reconstruction of Gene Regulatory Networks (GRNs) from time series gene expression data is crucial for discovering complex biological interactions. Among the many approaches to inferring GRNs, several produce a high number of false positive interactions and are unstable, requiring fine-tuning of many of their parameters. In this paper, we treat GRN inference as a regression problem and propose a simple ensemble regression-based feature selection model that combines the cross-validated Lasso (LassoCV) and cross-validated Ridge (RidgeCV) algorithms to reconstruct GRNs. Owing to its design, the proposed ensemble model is able to eliminate overfitting, multicollinearity issues, and irrelevant genes within one computational approach. In addition to identifying the type of each gene-gene regulatory interaction, the regression model also identifies the direction of these interactions. A new coefficient of determination (R²)-based approach identifies the model that best fits the data, LassoCV or RidgeCV, and evaluates model importance in terms of the gene-wise maximum in-degree, which determines the maximum number of regulatory genes, including self-regulations, that can be selected by a given method. An evaluated gene score-based majority voting technique then aggregates the gene lists selected by each method. In our experiments, the performance of the proposed ensemble approach was evaluated using gene expression datasets from three small-scale real gene networks. Our proposed model outperformed other state-of-the-art methods, producing more true positives and fewer false positives and achieving high Structural Accuracy, while maintaining model stability and efficiency.
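The abstract does not give the exact formulas, so the following is only a minimal sketch of the pipeline it outlines, written with scikit-learn's `LassoCV` and `RidgeCV`. Several choices are assumptions, not taken from the paper: each gene at time t+1 is regressed on all genes at time t (so self-regulation is possible and edges point regulator → target); a method's gene-wise maximum in-degree is taken as `round(R² × max_in_degree)`, where `max_in_degree` is a hypothetical user-set cap; and the gene score-based vote keeps a regulator when the R²-weighted votes for it exceed half of the summed R². The sign of the R²-weighted coefficient is read as activation (+1) or repression (-1). The helper name `infer_grn` and the synthetic network are illustrative only.

```python
import numpy as np
from sklearn.linear_model import LassoCV, RidgeCV


def infer_grn(expr, max_in_degree=3):
    """Hypothetical sketch of ensemble LassoCV/RidgeCV GRN inference.

    expr: array of shape (T, G); rows are time points, columns are genes.
    max_in_degree: hypothetical cap on regulators per target gene.
    Returns {(regulator, target): +1 for activation, -1 for repression}.
    """
    expr = np.asarray(expr, dtype=float)
    # Z-score each gene so coefficient magnitudes are comparable.
    expr = (expr - expr.mean(axis=0)) / expr.std(axis=0)
    # Assumed lag-1 model: all genes at time t predict each gene at t+1.
    x_prev, x_next = expr[:-1], expr[1:]
    edges = {}
    for target in range(expr.shape[1]):
        y = x_next[:, target]
        votes, signed, total = {}, {}, 0.0
        for model in (LassoCV(cv=5), RidgeCV(alphas=np.logspace(-3, 3, 13))):
            model.fit(x_prev, y)
            r2 = max(model.score(x_prev, y), 0.0)  # training R^2, clipped at 0
            total += r2
            # Assumed in-degree rule: a better fit admits more regulators.
            k = int(round(r2 * max_in_degree))
            coef = model.coef_
            for reg in np.argsort(-np.abs(coef))[:k].tolist():
                if coef[reg] == 0.0:
                    continue  # Lasso removed this gene entirely
                votes[reg] = votes.get(reg, 0.0) + r2
                signed[reg] = signed.get(reg, 0.0) + r2 * coef[reg]
        for reg, weight in votes.items():
            # Assumed weighted majority: keep regulators backed by > half the summed R^2.
            if weight / total > 0.5:
                edges[(reg, target)] = 1 if signed[reg] > 0 else -1
    return edges


# Synthetic 4-gene example (made up for illustration):
# gene 0 activates gene 1, gene 1 represses gene 2, genes 0 and 3 are noise.
rng = np.random.default_rng(0)
T, G = 200, 4
expr = np.zeros((T, G))
expr[0] = rng.normal(size=G)
for t in range(T - 1):
    noise = rng.normal(size=G)
    expr[t + 1, 0] = noise[0]
    expr[t + 1, 1] = 0.9 * expr[t, 0] + 0.1 * noise[1]
    expr[t + 1, 2] = -0.9 * expr[t, 1] + 0.1 * noise[2]
    expr[t + 1, 3] = noise[3]

edges = infer_grn(expr, max_in_degree=3)
print(edges)
```

With only two methods, this weighted vote lets the better-fitting model settle borderline regulators, which is one plausible reading of choosing "the best model to fit the data"; the paper's actual scoring and voting rules may differ, and the sketch can still admit weak false-positive regulators.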
