Abstract

With the accumulation of high-throughput biological data, there is an urgent need for efficient methods that extract useful biological information from massive datasets. Penalized logistic regression models are widely used for sample classification and gene selection. Typical penalization methods include the Lasso, the adaptive Lasso, and the group Lasso. The group Lasso can identify groups of closely related informative genes and performs better when the data contain highly correlated structures. However, the group Lasso relies on an efficient data clustering algorithm and similarity measure to define the groups. The frequently used k-means clustering algorithm partitions data into k groups, but its results are unstable and depend heavily on the initial settings and the similarity measure. In this paper, we introduce the k-shape clustering algorithm into the group Lasso for logistic regression and compare the effects of different clustering algorithms on both simulated and real-world biological data. We find that the group Lasso with k-shape performs well and is more precise and more stable than the group Lasso with the traditional k-means algorithm. These investigations clarify some effects of the clustering algorithm on the group Lasso for logistic regression, with important applications in informative gene selection from high-throughput biological data.
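
To make the role of the clustering step concrete, the following is a minimal sketch of group Lasso logistic regression fitted by proximal gradient descent. It is an illustration under stated assumptions, not the implementation used in this paper: the function names, the sqrt(group size) weighting of the penalty, and the choice of solver are all hypothetical. The `groups` array is assumed to hold one cluster label per gene, produced beforehand by whichever clustering algorithm is being compared (for example k-means or k-shape).

```python
import numpy as np

def group_soft_threshold(beta, groups, threshold):
    """Block soft-thresholding: the proximal operator of
    threshold * sum_g sqrt(p_g) * ||beta_g||_2 (assumed penalty form)."""
    out = np.zeros_like(beta, dtype=float)
    for g in np.unique(groups):
        idx = groups == g
        norm = np.linalg.norm(beta[idx])
        t = threshold * np.sqrt(idx.sum())
        if norm > t:
            # Shrink the whole group toward zero; otherwise it stays at zero.
            out[idx] = (1.0 - t / norm) * beta[idx]
    return out

def fit_group_lasso_logistic(X, y, groups, lam, n_iter=500):
    """Hypothetical solver: proximal gradient descent on the mean
    logistic loss plus the group Lasso penalty. y holds 0/1 labels."""
    n, p = X.shape
    beta = np.zeros(p)
    intercept = 0.0
    # Upper bound on the Lipschitz constant of the gradient of the
    # mean logistic loss, including the unpenalized intercept column.
    lipschitz = (np.linalg.norm(X, 2) ** 2 + n) / (4.0 * n)
    step = 1.0 / lipschitz
    for _ in range(n_iter):
        z = intercept + X @ beta
        prob = 1.0 / (1.0 + np.exp(-z))
        resid = prob - y
        intercept -= step * resid.mean()
        beta = group_soft_threshold(
            beta - step * (X.T @ resid) / n, groups, step * lam
        )
    return intercept, beta
```

Because the penalty acts on whole groups, every gene in a cluster is either retained or dropped together, which is why the quality and stability of the cluster labels passed in as `groups` directly affect which genes are selected.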
