Scalable Feature Matching Across Large Data Collections

David Degras

doi:10.1080/10618600.2022.2074429

Abstract

This article is concerned with matching feature vectors in a one-to-one fashion across large collections of datasets. Formulating this task as a multidimensional assignment problem with decomposable costs (MDADC), we develop fast algorithms with time complexity roughly linear in the number n of datasets and space complexity a small fraction of the data size. These remarkable properties hinge on using the squared Euclidean distance as dissimilarity function, which can reduce the n(n-1)/2 matching problems between pairs of datasets to n problems and enable calculating assignment costs on the fly. To our knowledge, no other method applicable to the MDADC possesses these linear scaling and low-storage properties necessary for large-scale applications. In numerical experiments, the novel algorithms outperform competing methods and show excellent computational and optimization performance. An application of feature matching to a large neuroimaging database is presented. The algorithms of this article are implemented in the R package matchFeat available at github.com/ddegras/matchFeat. Supplementary materials for this article are available online.
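
The reduction mentioned in the abstract can be illustrated with a small sketch. With squared Euclidean costs, re-matching one dataset against all the others only requires the per-cluster sums of the other datasets' vectors, so one sweep over the collection solves n small assignment problems instead of one per pair of datasets. The code below is an illustration under stated assumptions, not the matchFeat implementation: the function names (`total_cost`, `update_one`, `sweep`) and the toy data are hypothetical, and the assignment step is solved by brute force over permutations, which is only viable for a handful of features.

```python
from itertools import permutations

def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def total_cost(data, perms):
    """Sum of squared distances over all pairs of vectors sharing a cluster.

    data[i][f] is feature vector f of dataset i; perms[i][k] is the index of
    the feature of dataset i currently assigned to cluster k.
    """
    n, m = len(data), len(data[0])
    cost = 0.0
    for k in range(m):
        cluster = [data[i][perms[i][k]] for i in range(n)]
        cost += sum(sq_dist(cluster[a], cluster[b])
                    for a in range(n) for b in range(a + 1, n))
    return cost

def update_one(data, perms, i):
    """Re-match dataset i with all other assignments held fixed.

    Expanding the squared norms shows that the only term depending on the
    permutation of dataset i is the inner product between its vectors and
    the per-cluster sums of the other datasets, so one assignment problem
    against those sums replaces n - 1 pairwise comparisons.
    """
    n, m, p = len(data), len(data[0]), len(data[0][0])
    # sums[k]: sum of the vectors in cluster k, leaving out dataset i
    sums = [[sum(data[j][perms[j][k]][d] for j in range(n) if j != i)
             for d in range(p)] for k in range(m)]

    def score(perm):
        return sum(sum(a * b for a, b in zip(data[i][perm[k]], sums[k]))
                   for k in range(m))

    # Brute force stands in for a proper linear assignment solver.
    return max(permutations(range(m)), key=score)

def sweep(data, perms):
    """One pass over the datasets: n assignment problems in total."""
    perms = list(perms)
    for i in range(len(data)):
        perms[i] = update_one(data, perms, i)
    return perms

# Toy data (hypothetical): 3 datasets, 2 features each, in 2 dimensions.
data = [
    [(0.0, 0.0), (5.0, 5.0)],
    [(5.1, 4.9), (0.2, -0.1)],
    [(-0.1, 0.3), (4.8, 5.2)],
]
start = [(0, 1), (0, 1), (0, 1)]
matched = sweep(data, start)
print(total_cost(data, start), total_cost(data, matched))
```

Each call to `update_one` can only keep or lower the total cost, so repeated sweeps give a monotone descent. Only the running cluster sums are needed, never a full table of pairwise costs, which is the kind of low-storage behavior the abstract refers to.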

Journal: Journal of Computational and Graphical Statistics
Publication Date: Jun 1, 2022
Citations: 2
License type: cc-by

Similar Papers

Outperforming Several Heuristics for the Multidimensional Assignment Problem
Carlos E Valencia ... Marcos Cesar Vargas Magana
01 Sep 2018

G-Aligner: a graph-based feature alignment method for untargeted LC–MS-based metabolomics
Ruimin Wang ... Changbin Yu
BMC Bioinformatics | VOL. 24
14 Nov 2023

Target Tracking on Computational Grids
Eduardo Pasiliao
05 Jan 2009

On the number of local minima for the multidimensional assignment problem
Don A Grundel ... Pavlo A Krokhmal
Journal of Combinatorial Optimization | VOL. 13
13 Oct 2006
