Abstract
Coclustering algorithms are an alternative to classic one-sided clustering algorithms. Because it clusters the rows and columns of a dyadic data matrix simultaneously, coclustering yields richer information: it provides column clusters in addition to row clusters, and expresses the relationship between them as coclusters. Different cocluster structures are possible, and handling those that overlap in rows or columns remains an open question with room for improvement. In addition, while most of the related literature presents coclustering as a means of obtaining better results than one-sided clustering, few works study it as a tool for providing higher-quality descriptive information about the clustering itself. In this paper, we present a new coclustering algorithm, OvNMTF, based on triple matrix factorization, which properly handles overlapping coclusters by adding degrees of freedom to the matrix factorization that enable the discovery of specialized column clusters for each row cluster. As a proof of concept, we model text analysis as a coclustering problem with column overlaps, assuming that a given word (data matrix column) can be associated with more than one document cluster (row cluster) because it can assume a different semantic relationship in each association. Experiments on synthetic data sets demonstrate the reasonableness of the OvNMTF algorithm; experiments on real-world text data show its ability to extract high-quality information.
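To make the idea of specialized column clusters per row cluster concrete, the following is a minimal sketch of how such a factorization could reconstruct a data matrix. The abstract does not state the model's formula, so the form used here (one column-factor matrix per row cluster, combined through a shared row-factor matrix and a core matrix) is an assumption made for illustration only. The names `reconstruct`, `U`, `S`, and `V_list` are hypothetical, and the sketch covers reconstruction only, not the fitting procedure.

```python
import numpy as np


def reconstruct(U, S, V_list):
    """Rebuild a data matrix from an overlap-aware tri-factorization.

    Assumed model (illustrative, not taken from the paper):
        X_hat = sum over p of  outer(U[:, p], S[p, :] @ V_list[p].T)

    U       : (n_rows, k) row-cluster memberships
    S       : (k, l) core matrix linking row clusters to column clusters
    V_list  : k matrices of shape (n_cols, l), one per row cluster, so each
              row cluster has its own specialized column clusters
    """
    n_rows = U.shape[0]
    n_cols = V_list[0].shape[0]
    X_hat = np.zeros((n_rows, n_cols))
    for p, V_p in enumerate(V_list):
        # Column profile of row cluster p, built from its own column clusters.
        profile = S[p, :] @ V_p.T
        X_hat += np.outer(U[:, p], profile)
    return X_hat


# Toy data: 4 documents in 2 row clusters, 5 words, 2 column clusters each.
U = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
S = np.array([[2.0, 1.0], [1.0, 3.0]])

# Word 2 belongs to a column cluster under BOTH row clusters (column overlap).
V_0 = np.array([[1, 0], [1, 0], [0, 1], [0, 0], [0, 0]], dtype=float)
V_1 = np.array([[0, 0], [0, 0], [1, 0], [0, 1], [0, 1]], dtype=float)

X_hat = reconstruct(U, S, [V_0, V_1])
print(X_hat)
```

In this toy case the third word (index 2) contributes to the documents of both row clusters, but through a different column cluster in each, which is the kind of column overlap the abstract describes for words with more than one semantic role.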