Biclustering Models Under Collinearity in Simulated Biological Experiments

Chibuike Nnamani, Norhaiza Ahmad

doi:10.11113/matematika.v39.n3.1461

Abstract

Biclustering models allow simultaneous detection of groups of observations that are related to variables in a data matrix. Such methods have been applied to biological data for classification. Collinearity is a common feature of biological data because genes and proteins interact within their respective pathways. Such relationships could seriously reduce the efficiency of biclustering models. In this study, synthetic data are generated to investigate the effect of collinearity on the performance of biclustering models. Specifically, the data are generated with varying degrees of collinearity induced using Cholesky decomposition and are implanted with biclusters to produce different sets of synthetic data. The effectiveness of three models, namely Biclustering by Cheng and Church (BCCC), Spectral Bicluster (BCSpectral) and the Plaid Model, in correctly detecting three types of biclusters in the generated data matrix was compared. The results show that all the models investigated are sensitive to changes in the level of collinearity. At low collinearity, all biclustering models were able to detect the implanted biclusters in the data correctly. As the level of collinearity in the data rises, the proportion of implanted biclusters captured by the models decreases. In particular, BCCC outperformed the other two models for moderate to high collinearity, with Jaccard coefficients of 0.499 to 0.875 and 0.746 to 0.936 for one and two implanted biclusters, respectively.
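The simulation pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code, and several choices are assumptions made only for illustration: the target correlation structure (an equicorrelation matrix with a single parameter `rho`), the bicluster type (a constant block), the order of the steps (collinearity first, implanting second), and the Jaccard coefficient being computed over matrix cells. The function names `correlated_matrix`, `implant_constant_bicluster` and `jaccard` are hypothetical.

```python
import numpy as np


def correlated_matrix(n_rows, n_cols, rho, rng):
    """Draw Gaussian data whose columns have pairwise correlation close to rho."""
    # Assumed target: 1 on the diagonal, rho everywhere else.
    # This matrix is positive definite for -1/(n_cols - 1) < rho < 1.
    target = np.full((n_cols, n_cols), rho, dtype=float)
    np.fill_diagonal(target, 1.0)
    # Lower-triangular Cholesky factor: target = chol @ chol.T
    chol = np.linalg.cholesky(target)
    independent = rng.standard_normal((n_rows, n_cols))
    # Right-multiplying by chol.T gives columns with covariance `target`.
    return independent @ chol.T


def implant_constant_bicluster(data, rows, cols, value):
    """Return a copy of data with one constant bicluster written into it."""
    out = data.copy()
    out[np.ix_(rows, cols)] = value
    return out


def jaccard(rows_a, cols_a, rows_b, cols_b):
    """Cell-level Jaccard coefficient between two biclusters."""
    cells_a = {(r, c) for r in rows_a for c in cols_a}
    cells_b = {(r, c) for r in rows_b for c in cols_b}
    union = cells_a | cells_b
    return len(cells_a & cells_b) / len(union) if union else 0.0


rng = np.random.default_rng(0)
data = correlated_matrix(n_rows=2000, n_cols=5, rho=0.8, rng=rng)

# Average off-diagonal sample correlation, to check the induced collinearity.
corr = np.corrcoef(data, rowvar=False)
mean_offdiag = corr[~np.eye(5, dtype=bool)].mean()

true_rows, true_cols = list(range(10)), list(range(5))
implanted = implant_constant_bicluster(data, true_rows, true_cols, value=5.0)

# A hypothetical detected bicluster that overlaps half of the true rows.
found_rows, found_cols = list(range(5, 15)), list(range(5))
score = jaccard(true_rows, true_cols, found_rows, found_cols)
```

In a full experiment, `found_rows` and `found_cols` would come from a biclustering model run on `implanted`, and `rho` would be varied from low to high to trace how the score changes with collinearity.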

More From: MATEMATIKA
