Abstract

Similarity matrices are 2D representations of the degree of similarity between points of a given dataset which are employed in different fields such as data mining, genetics or machine learning. However, their calculation presents quadratic complexity and, thus, it is specially expensive for large datasets. MPICorMat is able to accelerate the construction of these matrices through the use of a hybrid parallelization strategy based on MPI and OpenMP. The previous version of this tool achieved high performance and scalability, but it only implemented one single similarity metric, the Pearson’s correlation. Therefore, it was suitable only for those problems where data are normally distributed and there is a linear relationship between variables. In this work, we present an extension to MPICorMat that incorporates eight additional metrics for similarity so that the users can choose the one that best adapts to their problem. The performance and energy consumption of each metric is measured in two platforms: a multi-core platform with two Intel Xeon Sandy-Bridge processors and a many-core Intel Xeon Phi KNL. Results show that MPICorMat executes faster and consumes less energy on the many-core architecture. The new version of MPICorMat is publicly available to download from its website: https://sourceforge.net/projects/mpicormat/

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.