Multimodal Data Mining Based on Self Attention Feature Alignment
This study explores multimodal data mining for text and image vectors, aiming to improve the depth and breadth of data analysis by integrating information from different data types. We use the FairFace dataset with the encoding layer of the CLIP model to obtain text and image vectors, and apply the K-Means clustering algorithm to reduce the dimensionality of the vectors. We then introduce a bipartite graph matching algorithm to compute a maximum matching between text vectors and image vectors, and calculate a contrastive learning loss and a similarity loss. The process covers data preparation, feature extraction, vector dimensionality reduction, matching, and loss assessment, forming a complete text-image matching pipeline. Our contributions are the use of K-Means clustering for vector dimensionality reduction, and the introduction of bipartite graph matching together with the two losses in text and image vector matching, which further improves matching quality.
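The matching and loss steps described in this abstract can be illustrated with a small sketch. The abstract does not give the exact matching algorithm or loss formulas, so everything here is an assumption: cosine similarity as the edge weight, an exhaustive search standing in for the bipartite matching algorithm (only viable for a handful of vectors), an InfoNCE-style contrastive loss with a hypothetical `temperature`, and "one minus mean matched similarity" as the similarity loss. The toy vectors are invented and do not come from CLIP or FairFace.

```python
import math
from itertools import permutations

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def best_matching(text_vecs, image_vecs):
    """Exhaustive maximum-weight one-to-one matching.

    Assumes equally many text and image vectors; a stand-in for a real
    bipartite matching algorithm and only practical for tiny inputs.
    """
    n = len(text_vecs)
    sim = [[cosine(t, im) for im in image_vecs] for t in text_vecs]
    best = max(permutations(range(n)),
               key=lambda p: sum(sim[i][p[i]] for i in range(n)))
    return list(best), sim

def contrastive_loss(sim, match, temperature=0.1):
    """InfoNCE-style loss (an assumption): each text should score highest on its matched image."""
    total = 0.0
    for i, row in enumerate(sim):
        logits = [s / temperature for s in row]
        log_z = math.log(sum(math.exp(x) for x in logits))
        total += log_z - logits[match[i]]
    return total / len(sim)

# Invented 2-D toy vectors; real CLIP embeddings are much higher-dimensional.
text_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
image_vecs = [[0.1, 0.9], [0.9, 0.8], [0.9, 0.1]]

match, sim = best_matching(text_vecs, image_vecs)  # match[i] = image index for text i
loss = contrastive_loss(sim, match)
similarity_loss = 1.0 - sum(sim[i][j] for i, j in enumerate(match)) / len(match)
print(match, loss, similarity_loss)
```

For realistic sizes the exhaustive search would be replaced by a polynomial-time assignment algorithm such as Kuhn-Munkres.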
- Research Article
13
- 10.1145/3379552
- Mar 13, 2020
- ACM Journal of Experimental Algorithmics
We perform an experimental study of algorithms for online bipartite matching under the known i.i.d. input model with integral types. In the last decade, there has been substantial effort in designing complex algorithms to improve worst-case approximation ratios. Our goal is to determine how these algorithms perform on more practical instances rather than worst-case instances. In particular, we are interested in whether the ranking of the algorithms by their worst-case performance is consistent with the ranking of the algorithms by their average-case/practical performance. We are also interested in whether preprocessing times and implementation difficulties that are introduced by these algorithms are justified in practice. To that end, we evaluate these algorithms on different random inputs as well as real-life instances obtained from publicly available repositories. We compare these algorithms against several simple greedy-style algorithms. Most of the complex algorithms in the literature are presented as being non-greedy (i.e., an algorithm can intentionally skip matching a node that has available neighbors) to simplify the analysis. Every such algorithm can be turned into a greedy one without hurting its worst-case performance. On our benchmarks, non-greedy versions of these algorithms perform much worse than their greedy versions. Greedy versions perform about as well as the simplest greedy algorithm by itself. This, together with our other findings, suggests that the simplest greedy algorithms are competitive with the state-of-the-art worst-case algorithms for online bipartite matching on many average-case and practical input families. Greediness is by far the most important property of online algorithms for bipartite matching.
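As a rough illustration of what "simplest greedy algorithm" means here, the sketch below matches each arriving online node to its first still-free offline neighbor and never skips a node that has an available neighbor. The instance (`type_neighbors`, the uniform arrival distribution, the seed) is hypothetical and is not taken from the paper's benchmarks.

```python
import random

def greedy_online_matching(type_neighbors, arrivals):
    """Match each arriving online node to its first still-free offline neighbor."""
    taken = set()
    matching = []  # list of (arrival index, offline node)
    for t, typ in enumerate(arrivals):
        for u in type_neighbors[typ]:
            if u not in taken:
                taken.add(u)
                matching.append((t, u))
                break  # greedy: never skip a node that has an available neighbor
    return matching

# Hypothetical instance: 3 online types, offline nodes "a", "b", "c".
type_neighbors = {0: ["a", "b"], 1: ["b"], 2: ["b", "c"]}

# Known i.i.d. model: each arrival is drawn independently from a known
# distribution over types (uniform here, so each type is expected once
# in 3 arrivals, i.e. the expected counts are integral).
rng = random.Random(0)
arrivals = [rng.randrange(3) for _ in range(3)]
matching = greedy_online_matching(type_neighbors, arrivals)
print(arrivals, matching)
```

A non-greedy variant would differ only in sometimes leaving an arrival unmatched even though a free neighbor exists, which is the behavior the study reports as harmful in practice.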
- Conference Article
50
- 10.1145/62212.62263
- Jan 1, 1988
We present algorithms for matching and related problems that run on an EREW PRAM with p processors. Given is a bipartite graph G with n vertices, m edges, and integral edge costs at most N in magnitude. We give an algorithm for the assignment problem (minimum cost perfect bipartite matching) that runs in O(√n · m log(nN) (log 2p)/p) time and O(m) space, for p ≤ m/(√n log² n). For p = 1 this improves the best known sequential algorithm, and is within a factor of log(nN) of the best known bound for the problem without costs (maximum cardinality matching). For p > 1 the time is within a factor of log p of optimum speed-up. Extensions include an algorithm for maximum cardinality bipartite matching with slightly better processor bounds, and similar results for bipartite degree-constrained subgraph problems (with and without costs). Our ideas also extend to general graph matching problems.
- Research Article
23
- 10.1186/1756-0500-6-35
- Jan 31, 2013
- BMC Research Notes
Background: Global network alignment has been proposed as an effective tool for computing functional orthology. Commonly used global alignment techniques such as IsoRank rely on a two-step process: the first step is an iterative diffusion-based approach for assigning similarity scores to all possible node pairs (matchings); the second step applies a maximum-weight bipartite matching algorithm to this similarity score matrix to identify orthologous node pairs. While demonstrably successful in identifying orthologies beyond those based on sequences, this two-step process is computationally expensive. Recent work on computation of node-pair similarity matrices has demonstrated that the computational cost of the first step can be significantly reduced. The use of these accelerated methods renders the bipartite matching step the dominant computational cost. This motivates a critical assessment of the tradeoffs of computational cost and solution quality (matching quality, topological matches, and biological significance) associated with the bipartite matching step. In this paper we utilize the state-of-the-art core diffusion-based step in IsoRank for similarity matrix computation, and couple it with two heuristic bipartite matching algorithms: a matrix-based greedy approach, and a tunable, adaptive, auction-based matching algorithm developed by us. We then compare our implementations against the performance and quality characteristics of the solution produced by the reference IsoRank binary, which also implements an optimal matching algorithm.
Results: Using heuristic matching algorithms in the IsoRank pipeline exhibits dramatic speedup improvements, typically 30 times faster for the total alignment process in most cases of interest. More surprisingly, these improvements in compute times are typically accompanied by better or comparable topological and biological quality for the network alignments generated. These measures are quantified by the number of conserved edges in the alignment graph, the percentage of enriched components, and the total number of covered Gene Ontology (GO) terms.
Conclusions: We have demonstrated significant reductions in global network alignment computation times by coupling heuristic bipartite matching methods with the similarity scoring step of the IsoRank procedure. Our heuristic matching techniques maintain comparable, if not better, quality in resulting alignments. A consequence of our work is that network-alignment based orthologies can be computed within minutes (as compared to hours) on typical protein interaction networks, enabling a more comprehensive tuning of alignment parameters for refined orthologies.
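A matrix-based greedy matching of the kind mentioned above can be sketched generically as follows. This is not the authors' implementation, which the abstract does not describe. It simply picks the highest remaining similarity score whose row and column are both free, and the score matrix is invented to show a case where greedy is suboptimal.

```python
def greedy_matrix_matching(scores):
    """Repeatedly pick the highest remaining score whose row and column are both free."""
    pairs = sorted(
        ((s, i, j) for i, row in enumerate(scores) for j, s in enumerate(row)),
        key=lambda item: -item[0],  # descending score; ties keep row-major order
    )
    used_rows, used_cols, matching = set(), set(), {}
    for s, i, j in pairs:
        if i not in used_rows and j not in used_cols:
            matching[i] = j
            used_rows.add(i)
            used_cols.add(j)
    return matching

# Invented similarity scores between nodes of two small networks.
scores = [
    [0.9, 0.8, 0.1],
    [0.8, 0.1, 0.1],
    [0.2, 0.3, 0.7],
]
greedy = greedy_matrix_matching(scores)
greedy_weight = sum(scores[i][j] for i, j in greedy.items())

# For comparison: the pairing 0->1, 1->0, 2->2 has a higher total weight.
better_weight = scores[0][1] + scores[1][0] + scores[2][2]
print(greedy, greedy_weight, better_weight)
```

The greedy pass costs one sort of the score matrix. It can miss the optimum, as in this example, but its total weight is always at least half of the maximum-weight matching.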
- Research Article
11
- 10.1002/rsa.20578
- Dec 23, 2014
- Random Structures & Algorithms
In this paper we analyze the expected time complexity of the auction algorithm for the matching problem on random bipartite graphs. We first prove that if for every non-maximum matching on graph G there exists an augmenting path with a length of at most 2l + 1, then the auction algorithm converges after at most N ⋅ l iterations. Then, we prove that the expected time complexity of the auction algorithm for bipartite matching on random graphs with edge probability and c > 1 is w.h.p. This time complexity is equal to other augmenting path algorithms such as the HK algorithm. Furthermore, we show that the algorithm can be implemented on parallel machines with processors and shared memory with an expected time complexity of . © 2014 Wiley Periodicals, Inc. Random Struct. Alg., 48, 384–395, 2016
- Research Article
9
- 10.1109/tvlsi.2016.2530898
- Sep 1, 2016
- IEEE Transactions on Very Large Scale Integration (VLSI) Systems
High defect density and extreme parameter variation make it very difficult to implement reliable logic functions in crossbar-based nanoarchitectures. It is a major design challenge to tolerate defects and variations simultaneously for such architectures. In this paper, a method based on a bipartite matching and memetic algorithm is proposed for defect- and variation-tolerant logic mapping (D/VTLM) problem in crossbar-based nanoarchitectures. In the proposed method, the search space of the D/VTLM problem can be dramatically reduced through the introduction of the min–max weight maximum-bipartite-matching (MMW-MBM) and a related heuristic bipartite matching method. MMW-MBM is defined on a weighted bipartite graph as an MBM, where the maximal weight of the edges in the matching has a minimal value. In addition, a defect- and variation-aware local search (D/VALS) operator is proposed for D/VTLM and embedded in a global search framework. The D/VALS operator is able to utilize the domain knowledge extracted from problem instances and, thus, has the potential to search the solution space more efficiently. Compared with the state-of-the-art heuristic and recursive algorithms, and a simulated annealing algorithm, the good performance of our proposed method is verified on a 3-bit adder and a large set of random benchmarks of various scales.
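The MMW-MBM definition above (a maximum bipartite matching whose largest edge weight is as small as possible) can be made concrete with a small sketch. It is a generic threshold search combined with Kuhn's augmenting-path algorithm, not the paper's memetic method. The weight matrix is invented, with `None` standing in for an unusable pairing such as a defective crosspoint.

```python
def max_matching_size(adj, n_right):
    """Kuhn's augmenting-path algorithm; adj[u] lists right-side neighbors of left node u."""
    match_right = [-1] * n_right

    def try_augment(u, seen):
        for v in adj[u]:
            if v in seen:
                continue
            seen.add(v)
            if match_right[v] == -1 or try_augment(match_right[v], seen):
                match_right[v] = u
                return True
        return False

    return sum(try_augment(u, set()) for u in range(len(adj)))

def min_max_weight_matching(weights):
    """Smallest limit T such that the edges with weight <= T still admit a maximum matching.

    weights[u][v] is the edge weight, or None when the edge is absent.
    Returns (T, size of the maximum matching).
    """
    n_right = len(weights[0])

    def adj_under(limit):
        return [[v for v, w in enumerate(row) if w is not None and w <= limit]
                for row in weights]

    levels = sorted({w for row in weights for w in row if w is not None})
    target = max_matching_size(adj_under(levels[-1]), n_right)  # unrestricted MBM size
    for limit in levels:  # linear scan for clarity; binary search would also work
        if max_matching_size(adj_under(limit), n_right) == target:
            return limit, target

# Invented weights, e.g. a variation cost of mapping logic row u to crossbar row v.
weights = [
    [4, 1, None],
    [2, 9, 3],
    [None, 5, 8],
]
bottleneck, size = min_max_weight_matching(weights)
print(bottleneck, size)
```

Because the matching size can only grow as the weight limit rises, the first limit that reaches the unrestricted matching size is the min-max value.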
- Research Article
38
- 10.1145/1671970.1712656
- Mar 1, 2010
- ACM Journal of Experimental Algorithmics
It is a well-established result that improved pivoting in linear solvers can be achieved by computing a bipartite matching between matrix entries and positions on the main diagonal. With the availability of increasingly faster linear solvers, the speed of bipartite matching computations must keep up to avoid slowing down the main computation. Fast algorithms for bipartite matching, which are usually initialized with simple heuristics, have been known for a long time. However, the performance of these algorithms is largely dependent on the quality of the heuristic. We compare combinations of several known heuristics and exact algorithms to find fast combined methods, using real-world matrices as well as randomly generated instances. In addition, we present a new heuristic aimed at obtaining high-quality matchings and compare its impact on bipartite matching algorithms with that of other heuristics. The experiments suggest that its performance compares favorably to the best-known heuristics, and that it is especially suited for application in linear solvers.
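The "heuristic initialization followed by an exact algorithm" pattern described above can be sketched as follows. The heuristic shown is the plainest possible one (first free column per row), not the new heuristic proposed in the paper, and the exact phase is a basic depth-first augmenting-path search. The 4x4 sparsity pattern is hypothetical.

```python
def simple_greedy_init(rows):
    """Cheap heuristic: give each row the first still-unmatched column in its nonzero pattern."""
    col_of_row, row_of_col = {}, {}
    for r, cols in enumerate(rows):
        for c in cols:
            if c not in row_of_col:
                col_of_row[r], row_of_col[c] = c, r
                break
    return col_of_row, row_of_col

def complete_with_augmenting_paths(rows, col_of_row, row_of_col):
    """Exact phase: search for an augmenting path from every row the heuristic left unmatched.

    Updates the two dictionaries in place and returns the number of searches started.
    """
    def augment(r, seen):
        for c in rows[r]:
            if c in seen:
                continue
            seen.add(c)
            if c not in row_of_col or augment(row_of_col[c], seen):
                col_of_row[r], row_of_col[c] = c, r
                return True
        return False

    searches = 0
    for r in range(len(rows)):
        if r not in col_of_row:
            searches += 1
            augment(r, set())
    return searches

# Nonzero column indices per row of a hypothetical 4x4 sparse matrix.
rows = [[0, 1], [0], [1, 2], [2, 3]]
col_of_row, row_of_col = simple_greedy_init(rows)
heuristic_size = len(col_of_row)
searches = complete_with_augmenting_paths(rows, col_of_row, row_of_col)
print(heuristic_size, searches, col_of_row)
```

The better the initial heuristic, the fewer rows reach the exact phase, which is why the abstract ties overall speed to heuristic quality. The final `col_of_row` is a column permutation that places a nonzero on every diagonal position.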
- Research Article
14
- 10.1016/j.parco.2014.03.004
- Mar 20, 2014
- Parallel Computing
On parallel push–relabel based algorithms for bipartite maximum matching
- Conference Article
39
- 10.1109/focs46700.2020.00046
- Nov 1, 2020
Online bipartite matching is one of the most fundamental problems in the online algorithms literature. Karp, Vazirani, and Vazirani (STOC 1990) introduced an elegant algorithm for the unweighted bipartite matching that achieves an optimal competitive ratio of 1 − 1/e. Aggarwal et al. (SODA 2011) later generalized their algorithm and analysis to the vertex-weighted case. Little is known, however, about the most general edge-weighted problem aside from the trivial 1/2-competitive greedy algorithm. In this paper, we present the first online algorithm that breaks the long-standing 1/2 barrier and achieves a competitive ratio of at least 0.5086. In light of the hardness result of Kapralov, Post, and Vondrák (SODA 2013) that restricts beating a 1/2 competitive ratio for the more general problem of monotone submodular welfare maximization, our result can be seen as strong evidence that edge-weighted bipartite matching is strictly easier than submodular welfare maximization in the online setting. The main ingredient in our online matching algorithm is a novel subroutine called online correlated selection (OCS), which takes a sequence of pairs of vertices as input and selects one vertex from each pair. Instead of using a fresh random bit to choose a vertex from each pair, the OCS negatively correlates decisions across different pairs and provides a quantitative measure on the level of correlation. We believe our OCS technique is of independent interest and will find further applications in other online optimization problems.
- Research Article
189
- 10.1287/moor.2013.0621
- Aug 1, 2014
- Mathematics of Operations Research
We consider variants of the online stochastic bipartite matching problem motivated by Internet advertising display applications, as introduced in Feldman et al. [Feldman J, Mehta A, Mirrokni VS, Muthukrishnan S (2009) Online stochastic matching: Beating 1 − 1/e. FOCS '09: Proc. 50th Annual IEEE Sympos. Foundations Comput. Sci. (IEEE, Washington, DC), 117–126]. In this setting, advertisers express specific interests into requests for impressions of different types. Advertisers are fixed and known in advance, whereas requests for impressions come online. The task is to assign each request to an interested advertiser (or to discard it) immediately upon its arrival. In the adversarial online model, the ranking algorithm of Karp et al. [Karp RM, Vazirani UV, Vazirani VV (1990) An optimal algorithm for online bipartite matching. STOC '90: Proc. 22nd Annual ACM Sympos. Theory Comput. (ACM, New York), 352–358] provides a best possible randomized algorithm with competitive ratio 1 − 1/e ≈ 0.632. In the stochastic i.i.d. model, when requests are drawn repeatedly and independently from a known probability distribution over the different impression types, Feldman et al. [Feldman J, Mehta A, Mirrokni VS, Muthukrishnan S (2009) Online stochastic matching: Beating 1 − 1/e. FOCS '09: Proc. 50th Annual IEEE Sympos. Foundations Comput. Sci. (IEEE, Washington, DC), 117–126] prove that one can do better than 1 − 1/e. Under the restriction that the expected number of requests of each impression type is an integer, they provide a 0.670-competitive algorithm, later improved by Bahmani and Kapralov [Bahmani B, Kapralov M (2010) Improved bounds for online stochastic matching. ESA '10: Proc. 22nd Annual Eur. Sympos. Algorithms (Springer-Verlag, Berlin, Heidelberg), 170–181] to 0.699 and by Manshadi et al. [Manshadi V, Gharan SO, Saberi A (2012) Online stochastic matching: Online actions based on offline statistics. Math. Oper. Res. 37(4):559–573] to 0.705.
Without this integrality restriction, Manshadi et al. are able to provide a 0.702-competitive algorithm. In this paper we consider a general class of online algorithms for the i.i.d. model that improve on all these bounds and that use computationally efficient offline procedures (based on the solution of simple linear programs of maximum flow types). Under the integrality restriction on the expected number of impression types, we get a (1 − 2e^{−2})-competitive algorithm (≈ 0.729). Without this restriction, we get a 0.706-competitive algorithm. Our techniques can also be applied to other related problems such as the online stochastic vertex-weighted bipartite matching problem as defined in Aggarwal et al. [Aggarwal G, Goel G, Karande C, Mehta A (2011) Online vertex-weighted bipartite matching and single-bid budgeted allocations. SODA '11: Proc. 22nd Annual ACM-SIAM Sympos. Discrete Algorithms (SIAM, Philadelphia), 1253–1264]. For this problem, we obtain a 0.725-competitive algorithm under the stochastic i.i.d. model with integral arrival rate. Finally, we show the validity of all our results under a Poisson arrival model, removing the need to assume that the total number of arrivals is fixed and known in advance, as is required for the analysis of the stochastic i.i.d. models described above.
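For orientation, the sketch below shows the adversarial-model baseline this abstract refers to, the ranking algorithm of Karp et al., and numerically evaluates two of the quoted competitive ratios. It does not implement the paper's LP-based i.i.d. algorithms. The advertiser and request names, the instance, and the seed are hypothetical.

```python
import math
import random

def ranking(offline, arrivals, rng):
    """Ranking: fix one random order of the offline side, then match each arrival
    to its free neighbor that comes first in that order."""
    order = list(offline)
    rng.shuffle(order)
    rank = {u: i for i, u in enumerate(order)}
    free = set(offline)
    matching = {}  # arrival index -> offline node
    for t, neighbors in enumerate(arrivals):
        candidates = [u for u in neighbors if u in free]
        if candidates:
            u = min(candidates, key=rank.__getitem__)
            free.remove(u)
            matching[t] = u
    return matching

advertisers = ["a", "b", "c"]
requests = [["a", "b"], ["a"], ["b", "c"]]  # interested advertisers per arriving request
matching = ranking(advertisers, requests, random.Random(1))

# Ratios quoted in the abstract, evaluated numerically.
adversarial_ratio = 1 - 1 / math.e         # about 0.632
integral_iid_ratio = 1 - 2 * math.exp(-2)  # about 0.729
print(matching, round(adversarial_ratio, 3), round(integral_iid_ratio, 3))
```

The only randomness in Ranking is the single shuffle made before any request arrives. Each request is then handled deterministically and irrevocably, matching the "assign immediately upon arrival" requirement in the abstract.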
- Research Article
22
- 10.1145/3556971
- Nov 17, 2022
- Journal of the ACM
Online bipartite matching is one of the most fundamental problems in the online algorithms literature. Karp, Vazirani, and Vazirani (STOC 1990) gave an elegant algorithm for unweighted bipartite matching that achieves an optimal competitive ratio of 1 − 1/e. Aggarwal et al. (SODA 2011) later generalized their algorithm and analysis to the vertex-weighted case. Little is known, however, about the most general edge-weighted problem aside from the trivial 1/2-competitive greedy algorithm. In this article, we present the first online algorithm that breaks the long-standing 1/2 barrier and achieves a competitive ratio of at least 0.5086. In light of the hardness result of Kapralov, Post, and Vondrák (SODA 2013), which restricts beating a 1/2 competitive ratio for the more general monotone submodular welfare maximization problem, our result can be seen as strong evidence that edge-weighted bipartite matching is strictly easier than submodular welfare maximization in an online setting. The main ingredient in our online matching algorithm is a novel subroutine called online correlated selection (OCS), which takes a sequence of pairs of vertices as input and selects one vertex from each pair. Instead of using a fresh random bit to choose a vertex from each pair, the OCS negatively correlates decisions across different pairs and provides a quantitative measure on the level of correlation. We believe our OCS technique is of independent interest and will find further applications in other online optimization problems.
- Research Article
3
- 10.1109/access.2024.3426652
- Jan 1, 2024
- IEEE Access
Machine text translation and matching are changing substantially as deep learning methods evolve. Combining generative models and networks with multi-head attention has markedly improved existing text translation and matching methods. This paper examines the text-matching problem in English translation and proposes an MA-Transformer text-matching framework that integrates multi-level semantic feature extraction to carry out text matching during the English translation process. The framework first uses Continuous Bag of Words (CBOW) to generate and embed word vectors. It then fuses data features at multiple levels with a multi-head Transformer model. After feature fusion, data dimensionality reduction and feature screening are applied to obtain high-precision text matching. The experimental results show that the MA-Transformer model performs well on public and real-world test data, with an average precision of 0.867 and 0.722, respectively, on the two types of datasets. Its text-matching accuracy is higher than that of current common frameworks, and the model provides technical support and a reference for the future construction of English translation systems.
- Conference Article
76
- 10.1109/focs46700.2020.00090
- Oct 23, 2020
We present an Õ(m + n^{1.5})-time randomized algorithm for maximum cardinality bipartite matching and related problems (e.g. transshipment, negative-weight shortest paths, and optimal transport) on m-edge, n-node graphs. For maximum cardinality bipartite matching on moderately dense graphs, i.e. m = Ω(n^{1.5}), our algorithm runs in time nearly linear in the input size and constitutes the first improvement over the classic O(m√n)-time [Dinic 1970; Hopcroft-Karp 1971; Karzanov 1973] and Õ(n^ω)-time algorithms [Ibarra-Moran 1981] (where currently ω ≈ 2.373). On sparser graphs, i.e. when m = n^{9/8+δ} for any constant δ > 0, our result improves upon the recent advances of [Madry 2013] and [Liu-Sidford 2020b, 2020a] which achieve an Õ(m^{4/3+o(1)}) runtime. We obtain these results by combining and advancing recent lines of research in interior point methods (IPMs) and dynamic graph algorithms. First, we simplify and improve the IPM of [v.d.Brand-Lee-Sidford-Song 2020], providing a general primal-dual IPM framework and new sampling-based techniques for handling infeasibility induced by approximate linear system solvers. Second, we provide a simple sublinear-time algorithm for detecting and sampling high-energy edges in electric flows on expanders and show that when combined with recent advances in dynamic expander decompositions, this yields efficient data structures for maintaining the iterates of both [v.d.Brand et al.] and our new IPMs. Combining this general machinery yields a simpler Õ(n√m) time algorithm for matching based on the logarithmic barrier function, and our state-of-the-art Õ(m + n^{1.5}) time algorithm for matching based on the [Lee-Sidford 2014] barrier (as regularized in [v.d.Brand et al.]).
- Research Article
3
- 10.1108/prog-07-2012-0037
- Jul 1, 2014
- Program
Purpose – The purpose of this paper is to collect and understand the nature of real cases of author name variants that have often appeared in bibliographic digital libraries (DLs) as a case study of the name authority control problem in DLs. Design/methodology/approach – To find a sample of name variants across DLs (e.g. DBLP and ACM) and in a single DL (e.g. ACM), the approach is based on two bipartite matching algorithms: Maximum Weighted Bipartite Matching and Maximum Cardinality Bipartite Matching. Findings – First, the authors validated the effectiveness and efficiency of the bipartite matching algorithms. The authors also studied the nature of real cases of author name variants that had been found across DLs (e.g. ACM, CiteSeer and DBLP) and in a single DL. Originality/value – To the best of the authors' knowledge, little research effort has been devoted to understanding the nature of author name variants shown in DLs. A thorough analysis can help focus research effort on real problems that arise when the authors perform duplicate detection methods.
- Research Article
21
- 10.1109/twc.2009.071351
- Mar 1, 2009
- IEEE Transactions on Wireless Communications
Based on the Hungarian algorithm, the Kuhn-Munkres algorithm can provide the maximum weight bipartite matching for assignment problems. However, it can only solve the single objective optimization problem. In this paper, we formulate the multi-objective optimization (MO) problem for bipartite matching, and propose a modified bipartite matching (MBM) algorithm to approach the Pareto set with a low computational complexity and to dynamically select proper solutions with given constraints among the reduced matching set. In addition, our MBM algorithm is extended to the case of asymmetric bipartite graphs. Finally, we illustrate the application of MBM to antenna assignments in wireless multiple-input multiple-output (MIMO) systems for both symmetric and asymmetric scenarios, where we consider the multi-objective optimization problem with the maximization of the system capacity, total traffic priority, and long-term fairness among all mobile users. The simulation results show that MBM can effectively reduce the matching set and dynamically provide the optimized performance with different quality of service (QoS) requirements.
- Book Chapter
1
- 10.1007/1-84628-231-4_12
- Jan 1, 2006
Faced with the imminent retirement of two senior employees who used to make decisions on bus allocations to customers manually, a bus rental company in Seoul, South Korea, asked us to develop a DMSS (decision-making support system) to help the young fresh graduate employee who will be taking over this job from them. Practice has shown that allocation and routing decisions made manually by human operators with long experience are usually nearly optimal, and it is very hard to beat those decisions using a computerized DMSS. Therefore the company asked us to design an i-DMSS (intelligent DMSS) that can help the new decision maker to reach decisions comparable in quality to those made by the retiring pair of senior decision makers. In this paper we discuss this decision problem, its context, the models we used to solve it, the algorithms we used in the i-DMSS to solve these models, and how this i-DMSS is used to make the decisions daily. The i-DMSS is based on bipartite matching and transportation algorithms and heuristics, and produces solutions 10–20% more economical than the manual decisions.