Abstract

Binary matrix factorization is an essential tool for identifying discrete patterns in binary data. In this paper, we consider the rank-k binary matrix factorization problem (k-BMF) under Boolean arithmetic: we are given an n × m binary matrix X with possibly missing entries and need to find two binary matrices A and B of dimension n × k and k × m, respectively, which minimize the distance between X and the Boolean product of A and B in the squared Frobenius distance. We present a compact and two exponential size integer programs (IPs) for k-BMF and show that the compact IP has a weak linear programming (LP) relaxation, whereas the exponential size IPs have a stronger equivalent LP relaxation. We introduce a new objective function, which differs from the traditional squared Frobenius objective in attributing a weight to zero entries of the input matrix that is proportional to the number of times the zero is erroneously covered in a rank-k factorization. For one of the exponential size Ips, we describe a computational approach based on column generation. Experimental results on synthetic and real-world data sets suggest that our integer programming approach is competitive against available methods for k-BMF and provides accurate low-error factorizations. Funding: O. Günlük was partially supported by the Office of Naval Research [Grant N00014-21-1-2575]. R. Á. Kovács was supported by a doctoral scholarship from The Alan Turing Institute under the EPSRC [Grant EP/N510129/1] and the Office for National Statistics. R. A. Hauser was supported by The Alan Turing Institute under the EPSRC [Grant EP/N510129/1].

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call