Abstract
Every day, humans generate approximately 2.5 billion gigabytes of new data. Although this data is crucial in applications such as drug development and cybersecurity, it is mostly unstructured and unlabeled, which makes it difficult to process effectively. Businesses and governments alike address this problem by using machine learning algorithms to characterize the data and make decisions based on it. However, conventional machine learning algorithms consume excessive computational resources and time: one of the most prominent, K-means clustering, has an approximate time complexity of O(n^2), where n is the size of the input data. This quadratic time complexity causes scalability problems on large datasets. In this project, I created an improved, more scalable clustering algorithm that leverages quantum computation. The approach (new Q-means) combines an Angle Embedding Feature Map, a d-dimensional SWAP Test, and a dynamic, distance-threshold-based termination criterion. After implementing this algorithm with the IBM Qiskit SDK, I ran the new Q-means algorithm several times on datasets of different dimensionality. On average, the new Q-means performed 18% better than the K-means algorithm as measured by the Adjusted Rand Index (ARI). In simulation, the new Q-means executed about six times faster than the existing Q-means algorithm, owing to a multi-threaded implementation and faster convergence. Without Quantum Random Access Memory (QRAM), the time complexity of the new Q-means is O(n^2). With QRAM, the existing Q-means algorithm has a time complexity of O(n(log n)^p), whereas the new Q-means achieves a significantly better O(n).
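The Python sketch below illustrates how the three named components could fit together. It is not the project's Qiskit implementation. Instead of executing circuits, it computes the ideal (noise-free, infinite-shot) SWAP-test outcome analytically for angle-embedded product states. Several details are assumptions and do not come from the abstract: each feature x_j is embedded as RY(x_j)|0> with features pre-scaled to [0, π]; the clustering distance is 1 − fidelity, where fidelity = 2·P(0) − 1; centroids are initialized by farthest-point selection; and a fixed shift threshold stands in for the project's dynamic criterion. All function names are hypothetical.

```python
import math

def angle_embed(x):
    """Hypothetical angle embedding: feature x_j -> RY(x_j)|0> = cos(x_j/2)|0> + sin(x_j/2)|1>.
    Returns the per-qubit amplitude pairs of the resulting product state."""
    return [(math.cos(v / 2), math.sin(v / 2)) for v in x]

def overlap(a, b):
    """Inner product <a|b> of two real product states, computed qubit by qubit."""
    result = 1.0
    for (a0, a1), (b0, b1) in zip(a, b):
        result *= a0 * b0 + a1 * b1
    return result

def swap_test_p0(x, y):
    """Ideal probability of measuring the SWAP-test ancilla in |0>: 1/2 + |<x|y>|^2 / 2."""
    return 0.5 + 0.5 * overlap(angle_embed(x), angle_embed(y)) ** 2

def quantum_distance(x, y):
    """Assumed distance: 1 - fidelity, with fidelity recovered as 2 * P(0) - 1."""
    return 1.0 - (2.0 * swap_test_p0(x, y) - 1.0)

def init_centroids(points, k):
    """Assumed farthest-point initialization under the SWAP-test distance."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(quantum_distance(p, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids

def q_means(points, k, threshold=1e-4, max_iter=100):
    """Q-means-style loop: SWAP-test assignment, classical mean update,
    and termination once no centroid moves more than `threshold` (Euclidean)."""
    centroids = init_centroids(points, k)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: nearest centroid under the SWAP-test distance.
        labels = [min(range(k), key=lambda c: quantum_distance(p, centroids[c]))
                  for p in points]
        # Update step: mean of each cluster; an empty cluster keeps its old centroid.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])
        # Distance-threshold termination criterion (fixed threshold assumed here).
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < threshold:
            break
    return labels, centroids

if __name__ == "__main__":
    # Two well-separated 2-D clusters with features already scaled into [0, pi].
    pts = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.15], [2.8, 2.9], [2.9, 2.8], [2.85, 2.85]]
    labels, centroids = q_means(pts, k=2)
    print(labels)
```

Pre-scaling features into [0, π] matters here: with RY embedding, the per-feature fidelity is cos²((x_j − y_j)/2). This value decreases monotonically only while |x_j − y_j| ≤ π, so larger feature gaps would make far-apart points look close.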