Abstract

Many fundamental problems in data mining can be reduced to one or more NP-hard combinatorial optimization problems. Recent advances in novel technologies such as quantum and quantum-inspired hardware promise a substantial speedup over general-purpose computers for solving these problems. To take advantage of these devices, however, the problem must often be modeled in a special form, such as an Ising or quadratic unconstrained binary optimization (QUBO) model. In this work, we focus on the important binary matrix factorization (BMF) problem, which has many applications in data mining. We propose two QUBO formulations for BMF and show how clustering constraints can easily be incorporated into them. The special-purpose hardware we consider is limited in the number of variables it can handle, which makes factorizing large matrices challenging. We propose a sampling-based approach to overcome this limitation, allowing us to factorize large rectangular matrices. In addition, we propose a simple baseline algorithm that outperforms our more sophisticated methods in a few situations. We run experiments on both synthetic and real data, including gene expression data, on the Fujitsu Digital Annealer, a quantum-inspired complementary metal-oxide-semiconductor (CMOS) annealer. These experiments show that our approach produces more accurate BMFs than competing methods.
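To make the QUBO idea concrete, here is a minimal sketch of one common way to obtain a QUBO from BMF. It is not necessarily either of the two formulations proposed in this work. The sketch assumes X ≈ U V^T with the ordinary (not Boolean) matrix product and U held fixed. Under that assumption, each column x of X gives a separate problem: minimize ||x - U v||^2 over binary v. Because v_k^2 = v_k for binary v, this objective equals v^T Q v + x^T x, where Q = U^T U - 2 diag(U^T x).

A one-hot penalty P(sum_k v_k - 1)^2 illustrates how a clustering constraint can be added; the paper's exact constraint may differ. This particular penalty forces each column into exactly one cluster. The names `column_qubo`, `brute_force_qubo` and `solve_V` are hypothetical. Exhaustive search stands in for the annealer, so the sketch is only feasible for a small rank k.

```python
import itertools
import numpy as np


def column_qubo(U, x, onehot_penalty=0.0):
    """Return Q with v @ Q @ v == ||x - U v||^2 - x @ x for binary v.

    Assumption (illustrative only): ordinary product X ~ U V^T, U fixed.
    If onehot_penalty > 0, P * (sum_k v_k - 1)^2 is added (minus its constant P).
    """
    G = U.T @ U                       # quadratic term v^T U^T U v
    h = U.T @ x                       # linear term is -2 h^T v
    Q = G.astype(float).copy()
    # v_k^2 = v_k for binary v, so linear coefficients move onto the diagonal.
    Q[np.diag_indices_from(Q)] -= 2.0 * h
    if onehot_penalty > 0:
        k = Q.shape[0]
        # P(sum v - 1)^2 = P(2 sum_{k<l} v_k v_l - sum_k v_k + 1) for binary v:
        # +P on every off-diagonal entry (2P per pair), -P on the diagonal.
        Q += onehot_penalty * (np.ones((k, k)) - np.eye(k))
        Q[np.diag_indices_from(Q)] -= onehot_penalty
    return Q


def brute_force_qubo(Q):
    """Exhaustively minimise v^T Q v over binary v (hypothetical stand-in for an annealer)."""
    k = Q.shape[0]
    best_v, best_e = None, np.inf
    for bits in itertools.product((0, 1), repeat=k):
        v = np.array(bits)
        e = v @ Q @ v
        if e < best_e:
            best_v, best_e = v, e
    return best_v, best_e


def solve_V(X, U, onehot_penalty=0.0):
    """With U fixed, solve one independent QUBO per column of X to build binary V."""
    V = np.zeros((X.shape[1], U.shape[1]), dtype=int)
    for j in range(X.shape[1]):
        V[j], _ = brute_force_qubo(column_qubo(U, X[:, j], onehot_penalty))
    return V


# Tiny example: X is exactly U @ V_true.T, so the unconstrained solve recovers V_true.
U = np.array([[1, 0], [1, 0], [0, 1]])
X = np.array([[1, 0, 1], [1, 0, 1], [0, 1, 1]])
V = solve_V(X, U)  # V → [[1, 0], [0, 1], [1, 1]]
```

By symmetry, U can be updated with V held fixed by calling `solve_V(X.T, V)`, so alternating the two calls gives a simple heuristic. With `onehot_penalty=10.0`, the third column above can no longer use both factors and is assigned to a single one.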

Highlights

  • Many fundamental problems in data mining consist of discrete decision making and are combinatorial in nature

  • In this paper, we show how quantum and quantum-inspired hardware technologies, via the quadratic unconstrained binary optimization (QUBO) framework, can be used for binary matrix factorization

  • In Digital Annealer (DA) binary matrix factorization (BMF), the number of binary variables in the QUBO problem is similar across all experiments

Introduction

Many fundamental problems in data mining consist of discrete decision making and are combinatorial in nature. These include data categorization, class assignment, identification of outlier instances, k-means clustering, combinatorial extensions of support vector machines, and consistent biclustering, to mention a few [1]. In many cases, these underlying problems are NP-hard, and approaches to solving them depend on heuristics. Researchers have been exploring different computing paradigms to tackle these NP-hard problems, including quantum computing and the development of dedicated special-purpose hardware. The Ising and quadratic unconstrained binary optimization (QUBO) …
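Since such hardware may accept either an Ising or a QUBO model, it helps to see that the two are interchangeable. The change of variables x_i = (1 + s_i)/2 maps binary x_i in {0, 1} to spins s_i in {-1, +1}. The sketch below is a standard construction, not something taken from this paper. It converts a QUBO matrix `Q` into Ising couplings `J`, fields `h` and a constant offset `c`.

The function names `qubo_to_ising` and `ising_energy` are hypothetical. The sign convention is an assumption, because hardware vendors differ: the Ising energy is taken to be sum_{i<j} J_ij s_i s_j + sum_i h_i s_i + c, which is minimized.

```python
import itertools
import numpy as np


def qubo_to_ising(Q):
    """Map E(x) = x^T Q x, x in {0,1}^n, to Ising (J, h, c) via x = (1 + s) / 2.

    Assumed convention: E(s) = sum_{i<j} J_ij s_i s_j + sum_i h_i s_i + c.
    """
    Q = np.asarray(Q, dtype=float)
    W = Q + Q.T                          # W[i, j] = Q_ij + Q_ji: total weight of pair {i, j}
    off = W - np.diag(np.diag(W))        # keep only the i != j entries
    J = np.triu(off, k=1) / 4.0          # one coupling per pair i < j
    # Each pair term W_ij x_i x_j = W_ij (1 + s_i + s_j + s_i s_j) / 4 feeds h_i, h_j and c;
    # each diagonal term Q_ii x_i = Q_ii (1 + s_i) / 2 feeds h_i and c.
    h = off.sum(axis=1) / 4.0 + np.diag(Q) / 2.0
    c = np.triu(off, k=1).sum() / 4.0 + np.diag(Q).sum() / 2.0
    return J, h, c


def ising_energy(J, h, c, s):
    """Evaluate sum_{i<j} J_ij s_i s_j + sum_i h_i s_i + c (J strictly upper triangular)."""
    return s @ J @ s + h @ s + c


# Sanity check on a small random QUBO: both energies agree on every assignment.
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 4))
J, h, c = qubo_to_ising(Q)
for bits in itertools.product((0, 1), repeat=4):
    x = np.array(bits)
    s = 2 * x - 1                        # 0 -> -1, 1 -> +1
    assert np.isclose(x @ Q @ x, ising_energy(J, h, c, s))
```

The mapping is a bijection on assignments that shifts the energy only by a bookkeeping constant, so a minimizer of one form is a minimizer of the other.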
