Gene Selection from Biological Data via Group Lasso for Logistic Regression Model: Effects of Different Clustering Algorithms

Shunjie Chen, Pei Wang

doi:10.23919/ccc52363.2021.9549471

Abstract

With the accumulation of high-throughput biological data, there is an urgent need for efficient methods that extract useful biological information from massive datasets. Penalized logistic regression models are widely used for sample classification and gene selection. Typical penalization methods include Lasso, adaptive Lasso, and group Lasso. Group Lasso can identify groups of closely related informative genes and performs better when the data contain highly correlated structures. However, group Lasso relies on an efficient data clustering algorithm and similarity measure. The frequently used k-means clustering algorithm partitions data into k groups, but its results are unstable and depend heavily on the initial settings and the similarity measure. In this paper, we introduce the k-shape algorithm into group Lasso for logistic regression and compare the effects of different clustering algorithms on both simulated and real-world biological data. We find that group Lasso with the k-shape algorithm performs well and is more precise and stable than group Lasso with the traditional k-means algorithm. These investigations clarify some effects of clustering algorithms on group Lasso for the logistic regression model, which has important applications in informative gene selection from high-throughput biological data.
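
To make the pipeline described in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes that cluster labels for the genes have already been produced by some clustering algorithm (k-means, k-shape, or another), turns those labels into groups, and fits a group-Lasso-penalized logistic regression by proximal gradient descent. The function names, the toy data, the step size, and the square-root-of-group-size penalty weights are all illustrative assumptions.

```python
import math


def groups_from_labels(labels):
    """Turn per-feature cluster labels into lists of feature indices."""
    groups = {}
    for j, lab in enumerate(labels):
        groups.setdefault(lab, []).append(j)
    return [groups[lab] for lab in sorted(groups)]


def group_soft_threshold(w, groups, thresh):
    """Block soft-thresholding: shrink each group's coefficients jointly."""
    out = list(w)
    for idx in groups:
        norm = math.sqrt(sum(w[j] ** 2 for j in idx))
        # Assumption: penalty weight sqrt(group size), a common convention.
        t = thresh * math.sqrt(len(idx))
        scale = max(0.0, 1.0 - t / norm) if norm > 0 else 0.0
        for j in idx:
            out[j] = scale * w[j]
    return out


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_group_lasso_logistic(X, y, groups, lam, step=0.5, iters=300):
    """Proximal gradient descent on mean logistic loss + group Lasso penalty.

    The intercept b is left unpenalized.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(iters):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            r = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_b += r / n
            for j in range(p):
                grad_w[j] += r * xi[j] / n
        stepped = [wj - step * g for wj, g in zip(w, grad_w)]
        w = group_soft_threshold(stepped, groups, step * lam)
        b -= step * grad_b
    return w, b


# Toy data (hypothetical): features 0-1 carry the class signal,
# features 2-3 are fixed sign patterns unrelated to the class.
n = 40
X, y = [], []
for i in range(n):
    z = (i - (n - 1) / 2) / (n / 2)
    X.append([z, z, (-1.0) ** i, 1.0 if i % 4 < 2 else -1.0])
    y.append(1.0 if z > 0 else 0.0)

labels = [0, 0, 1, 1]  # assumed output of a clustering step over the features
w, b = fit_group_lasso_logistic(X, y, groups_from_labels(labels), lam=0.1)
print("coefficients:", w)
```

On this toy data the uninformative group should be shrunk to exactly zero while the informative group keeps nonzero coefficients, which is the group-level selection behavior the abstract refers to. Changing `labels` changes which features are kept or dropped together, which is why the choice of clustering algorithm matters for the final gene selection.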
