Abstract

Convex clustering has drawn recent attention because of its competitive performance and nice property to guarantee global optimality. However, convex clustering is infeasible due to its high computational cost for large-scale data sets. We propose a novel method to solve the L1 fusion convex clustering problem by dynamic programming. We develop the Convex clustering Path Algorithm In Near-linear Time (C-PAINT) algorithm to construct the solution path efficiently. The proposed C-PAINT yields the exact solution while other general solvers for convex problems applied in the convex clustering depend on tuning parameters such as step size and threshold, and it usually takes many iterations to converge. Including a sorting process that almost takes no time in practice, the main part of the algorithm takes only linear time. Thus, C-PAINT has superior scalability comparing to other state-of-art algorithms. Moreover, C-PAINT enables the path visualization of clustering solutions for large data. In particular, experiments show our proposed method can solve the convex clustering with 10^7 data points in two minutes. We demonstrate the proposed method using both synthetic data and real data. Our algorithms are implemented in the dpcc R package.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.