Articles published on Euclidean embedding
- Research Article
- 10.1049/ise2/6755170
- Jan 1, 2025
- IET Information Security
- Wen-Bin Hsieh
Biometric authentication has been adopted in many access control scenarios in recent years. It is convenient and secure because it compares the user’s own biometrics with those stored in the database to confirm their identity. With the vigorous development of machine learning, the performance and accuracy of biometric authentication have greatly improved. Face recognition technology combined with convolutional neural networks (CNNs) is extremely efficient and has become the mainstream of access control systems (ACSs). However, identity information and access logs stored in traditional databases can be tampered with by malicious insiders. Therefore, we propose a face recognition ACS that is resistant to data forgery. In this paper, a deep convolutional network is utilized to learn a Euclidean embedding (based on FaceNet) of each image and achieve face recognition and verification. Quorum, which is built on the Ethereum blockchain, is used to store facial feature vectors and login information. Smart contracts automatically put data into blocks on the chain. One stores feature vectors, and the other records the arrival and departure times of employees. By combining these cutting‐edge technologies, an intelligent and immutable ACS that can withstand distributed denial‐of‐service (DDoS) and other internal and external attacks is created. Finally, an experiment is conducted to assess the effectiveness of the proposed system and to demonstrate its practicality.
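As a rough illustration of verification with a Euclidean face embedding, the sketch below compares two embedding vectors by squared Euclidean distance after L2 normalization and accepts the pair when the distance falls below a threshold. The function names, the toy 2-D vectors, and the threshold value of 1.0 are hypothetical choices for illustration; the abstract does not state how the proposed system sets its threshold or how it reads vectors from the chain.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def same_identity(emb_a, emb_b, threshold=1.0):
    """Return True when two embeddings are close enough to be treated as one person.

    The threshold is a hypothetical value; in practice it would be tuned
    on a validation set.
    """
    a, b = l2_normalize(emb_a), l2_normalize(emb_b)
    return squared_distance(a, b) < threshold
```

For unit-length vectors the squared distance lies between 0 and 4, so a single scalar threshold is enough to separate "same" from "different" once the embedding has been trained.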
- Research Article
6
- 10.1093/bioadv/vbaf060
- Dec 26, 2024
- Bioinformatics Advances
- Navid Naderializadeh + 1 more
Motivation: Protein language models (PLMs) have emerged as powerful approaches for mapping protein sequences into embeddings suitable for various applications. As protein representation schemes, PLMs generate per-token (i.e. per-residue) representations, resulting in variable-sized outputs based on protein length. This variability poses a challenge for protein-level prediction tasks that require uniform-sized embeddings for consistent analysis across different proteins. Previous work has typically used average pooling to summarize token-level PLM outputs, but it is unclear whether this method effectively prioritizes the relevant information across token-level representations. Results: We introduce a novel method utilizing optimal transport to convert variable-length PLM outputs into fixed-length representations. We conceptualize per-token PLM outputs as samples from a probabilistic distribution and employ sliced-Wasserstein distances to map these samples against a reference set, creating a Euclidean embedding in the output space. The resulting embedding is agnostic to the length of the input and represents the entire protein. We demonstrate the superiority of our method over average pooling for several downstream prediction tasks, particularly with constrained PLM sizes, enabling smaller-scale PLMs to match or exceed the performance of average-pooled larger-scale PLMs. Our aggregation scheme is especially effective for longer protein sequences by capturing essential information that might be lost through average pooling. Availability and implementation: Our implementation code can be found at https://github.com/navid-naderi/PLM_SWE.
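The idea of turning a variable-length set of token vectors into a fixed-length vector through sliced one-dimensional transport can be sketched as follows. This is a simplified illustration and not the authors' implementation (which is in the linked repository): the names `quantiles` and `swe_pool`, the fixed projection directions, and the quantile interpolation rule are all assumptions made here. Each direction projects tokens and reference points to 1-D, both are sorted, the token quantile function is resampled to the reference size, and the difference is recorded.

```python
def quantiles(sorted_vals, m):
    """Resample a sorted 1-D sample to m values by linear interpolation
    of its empirical quantile function at evenly spaced midpoint levels."""
    n = len(sorted_vals)
    if n == 1:
        return [sorted_vals[0]] * m
    out = []
    for j in range(m):
        pos = (j + 0.5) / m * (n - 1)  # fractional index into the sample
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append((1 - frac) * sorted_vals[lo] + frac * sorted_vals[hi])
    return out


def project(points, direction):
    """Sorted 1-D projections of a list of vectors onto one direction."""
    return sorted(sum(p * w for p, w in zip(pt, direction)) for pt in points)


def swe_pool(tokens, directions, reference):
    """Fixed-length embedding of a variable-length token set.

    Output length is len(directions) * len(reference), whatever len(tokens) is.
    """
    features = []
    for theta in directions:
        tok_q = quantiles(project(tokens, theta), len(reference))
        ref_sorted = project(reference, theta)
        features.extend(q - r for q, r in zip(tok_q, ref_sorted))
    return features
```

Because every input is resampled to the reference size along each slice, a 3-token and a 300-token sequence produce vectors of the same length, which is the property the abstract highlights over variable-sized per-residue outputs.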
- Research Article
- 10.1162/tacl_a_00712
- Nov 4, 2024
- Transactions of the Association for Computational Linguistics
- Yuyin Lu + 6 more
Topic taxonomy discovery aims at uncovering topics of different abstraction levels and constructing hierarchical relations between them. Unfortunately, most prior work can hardly model semantic scopes of words and topics because it holds to the Euclidean embedding space assumption. What’s worse, such methods infer asymmetric hierarchical relations from symmetric distances between topic embeddings. As a result, existing methods suffer from problems of low-quality topics at high abstraction levels and inaccurate hierarchical relations. To alleviate these problems, this paper develops a Box embedding-based Topic Model (BoxTM) that maps words and topics into the box embedding space, where an asymmetric metric is defined to properly infer hierarchical relations among topics. Additionally, our BoxTM explicitly infers upper-level topics based on correlation between specific topics through recursive clustering on topic boxes. Finally, extensive experiments validate the high quality of the topic taxonomy learned by BoxTM.
- Research Article
- 10.1371/journal.pone.0302425
- May 10, 2024
- PLOS ONE
- Nomenjanahary Alexia Raharinirina + 4 more
The joint analysis of two datasets [Formula: see text] and [Formula: see text] that describe the same phenomena (e.g. the cellular state), but measure disjoint sets of variables (e.g. mRNA vs. protein levels) is currently challenging. Traditional methods typically analyze single interaction patterns such as variance or covariance. However, problem-tailored external knowledge may contain multiple different information about the interaction between the measured variables. We introduce MIASA, a holistic framework for the joint analysis of multiple different variables. It consists of assembling multiple different information such as similarity vs. association, expressed in terms of interaction-scores or distances, for subsequent clustering/classification. In addition, our framework includes a novel qualitative Euclidean embedding method (qEE-Transition) which enables using Euclidean-distance/vector-based clustering/classification methods on datasets that have a non-Euclidean-based interaction structure. As an alternative to conventional optimization-based multidimensional scaling methods which are prone to uncertainties, our qEE-Transition generates a new vector representation for each element of the dataset union [Formula: see text] in a common Euclidean space while strictly preserving the original ordering of the assembled interaction-distances. To demonstrate our work, we applied the framework to three types of simulated datasets: samples from families of distributions, samples from correlated random variables, and time-courses of statistical moments for three different types of stochastic two-gene interaction models. We then compared different clustering methods with vs. without the qEE-Transition. For all examples, we found that the qEE-Transition followed by Ward clustering had superior performance compared to non-agglomerative clustering methods but had a varied performance against ultrametric-based agglomerative methods. 
We also tested the qEE-Transition followed by supervised and unsupervised machine learning methods and found promising results; however, more work is needed for optimal parametrization of these methods. As a future perspective, our framework points to the importance of further development and validation of distance-distribution models aiming to capture multiple complex interactions between different variables.
- Research Article
- 10.1007/s11042-024-18885-7
- Apr 2, 2024
- Multimedia Tools and Applications
- V Ramanjaneyulu Yannam + 3 more
Euclidean embedding with preference relation for recommender systems
- Research Article
- 10.1109/access.2024.3434612
- Jan 1, 2024
- IEEE Access
- Sultan Alshamrani
Large language models (LLMs) have revolutionized natural language processing (NLP), enabling machines to process, understand and generate human-like text with high accuracy. However, the current practices in training and evaluating these models often overlook the relationship between the embeddings of training and testing samples, leading to potential overfitting and limited generalization capabilities. This paper introduces a new approach to enhancing the performance, reliability, and generalization of LLMs by curating training and testing samples based on the Euclidean distances between their embeddings. The central hypothesis is that training models on samples with high Euclidean distances between training and testing embeddings, coupled with evaluations spanning diverse distances, will improve the models' robustness and adaptability to inputs diverging from the training data distribution. The comprehensive evaluation across multiple datasets and architectures shows that models trained on samples with high Euclidean distances from the testing samples generally exhibit superior generalization and robustness compared to those trained on low-distance samples. The proposed evaluation methodology, assessing performance across a range of distances, provides a more reliable measure of a model's true adaptability. This study provides insights into the relationship between training data diversity and model reliability, paving the way for more robust and generalizable LLMs.
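A minimal sketch of distance-based sample curation is given below. It scores each training embedding by its Euclidean distance to the nearest test embedding and splits the training pool into a "far" and a "near" group. Using the minimum distance as the score, the `keep_fraction` parameter, and the function names are assumptions for illustration; the abstract does not specify how distances are aggregated.

```python
import math


def min_distance_to_test(train_emb, test_embs):
    """Euclidean distance from one training embedding to its nearest test embedding."""
    return min(math.dist(train_emb, t) for t in test_embs)


def split_by_distance(train_embs, test_embs, keep_fraction=0.5):
    """Return (far_indices, near_indices) of training samples.

    Indices are ordered from farthest to nearest; the first
    keep_fraction of them (at least one) form the "far" group.
    """
    order = sorted(
        range(len(train_embs)),
        key=lambda i: min_distance_to_test(train_embs[i], test_embs),
        reverse=True,
    )
    k = max(1, int(len(order) * keep_fraction))
    return order[:k], order[k:]
```

For example, with training embeddings `[[0, 0], [5, 5], [1, 0], [9, 9]]` and a single test embedding `[0, 0]`, the far group is indices 3 and 1 and the near group is indices 2 and 0.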
- Research Article
1
- 10.1371/journal.pcbi.1011748
- Dec 27, 2023
- PLOS Computational Biology
- Tristan Baumann + 1 more
The structure of the internal representation of surrounding space, the so-called cognitive map, has long been debated. A Euclidean metric map is the most straight-forward hypothesis, but human navigation has been shown to systematically deviate from the Euclidean ground truth. Vector navigation based on non-metric models can better explain the observed behavior, but also discards useful geometric properties such as fast shortcut estimation and cue integration. Here, we propose another alternative, a Euclidean metric map that is systematically distorted to account for the observed behavior. The map is found by embedding the non-metric model, a labeled graph, into 2D Euclidean coordinates. We compared these two models using data from a human behavioral study where participants had to learn and navigate a non-Euclidean maze (i.e., with wormholes) and perform direct shortcuts between different locations. Even though the Euclidean embedding cannot correctly represent the non-Euclidean environment, both models predicted the data equally well. We argue that the embedding naturally arises from integrating the local position information into a metric framework, which makes the model more powerful and robust than the non-metric alternative. It may therefore be a better model for the human cognitive map.
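One generic way to embed pairwise distance information into 2D Euclidean coordinates is to minimize a stress function by gradient descent, sketched below. This is not the procedure from the article, whose details are not given in the abstract; the full target-distance matrix, the step size, the iteration count, and the function names are all assumptions. When the targets are not realizable in the plane (as with a wormhole maze), the minimizer is a distorted map with nonzero residual stress.

```python
import math
import random


def stress(coords, target):
    """Sum of squared differences between embedded and target distances."""
    n = len(coords)
    return sum(
        (math.dist(coords[i], coords[j]) - target[i][j]) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )


def embed_2d(target, steps=2000, lr=0.02, seed=0):
    """Find 2-D coordinates whose pairwise distances approximate `target`."""
    rng = random.Random(seed)
    n = len(target)
    coords = [[rng.random(), rng.random()] for _ in range(n)]
    for _ in range(steps):
        grads = [[0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                d = math.dist(coords[i], coords[j])
                if d == 0.0:
                    continue  # coincident points give no usable direction
                coef = 2.0 * (d - target[i][j]) / d
                grads[i][0] += coef * (coords[i][0] - coords[j][0])
                grads[i][1] += coef * (coords[i][1] - coords[j][1])
        for i in range(n):
            coords[i][0] -= lr * grads[i][0]
            coords[i][1] -= lr * grads[i][1]
    return coords
```

Calling `embed_2d(target, steps=0)` returns the random starting layout, which makes it easy to compare the stress before and after optimization.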
- Research Article
1
- 10.3390/e25121611
- Nov 30, 2023
- Entropy
- Wei Wu + 1 more
Dynamic network representation learning has recently attracted increasing attention because real-world networks evolve over time, that is, nodes and edges join or leave the networks over time. Unlike for static networks, the representation learning of dynamic networks should consider not only how to capture the structural information of network snapshots, but also how to capture the temporal dynamic information of network structure evolution from the network snapshot sequence. Existing work on dynamic network representation has two main problems: (1) A significant number of methods target dynamic networks that only allow nodes to increase over time, not decrease, which reduces the applicability of such methods to real-world networks. (2) At present, most network-embedding methods, especially dynamic network representation learning approaches, use Euclidean embedding space. However, the network itself is geometrically non-Euclidean, which leads to geometric inconsistencies between the embedded space and the underlying space of the network, which can affect the performance of the model. In order to solve the above two problems, we propose a geometry-based dynamic network learning framework, namely DyLFG. Our proposed framework targets dynamic networks, which allow nodes and edges to join or exit the network over time. In order to extract the structural information of network snapshots, we designed a new hyperbolic geometry processing layer, which is different from the previous literature. In order to deal with the temporal dynamics of the network snapshot sequence, we propose a gated recurrent unit (GRU) module based on Ricci curvature, that is, the RGRU. In the proposed framework, we used a temporal attention layer and the RGRU to evolve the neural network weight matrix to capture temporal dynamics in the network snapshot sequence. 
The experimental results showed that our model outperformed the baseline approaches on the baseline datasets.
- Research Article
9
- 10.1038/s42005-023-01143-x
- Feb 2, 2023
- Communications Physics
- Bianka Kovács + 1 more
The arrangement of network nodes in hyperbolic spaces has become a widely studied problem, motivated by numerous results suggesting the existence of hidden metric spaces behind the structure of complex networks. Although several methods have already been developed for the hyperbolic embedding of undirected networks, approaches able to deal with directed networks are still in their infancy. Here, we present a framework based on the dimension reduction of proximity matrices reflecting the network topology, coupled with a general conversion method transforming Euclidean node coordinates into hyperbolic ones even for directed networks. While proposing a measure of proximity based on the shortest path length, we also incorporate an earlier Euclidean embedding method in our pipeline, demonstrating the widespread applicability of our Euclidean-hyperbolic conversion. Besides, we introduce a dimension reduction technique that maps the nodes directly into the hyperbolic space of any number of dimensions with the aim of reproducing a distance matrix measured on the given (un)directed network. According to various commonly used quality scores, our methods are capable of producing high-quality embeddings for several real networks.
- Research Article
- 10.1109/tetci.2022.3182752
- Feb 1, 2023
- IEEE Transactions on Emerging Topics in Computational Intelligence
- Adarsh Prasad Behera + 3 more
This work proposes a novel curvature regularization method to regularize the individual sectional curvatures in Similarity-Based Ricci Flow Embedding (SBRFE) to reduce the non-Euclidean artefacts and compute the Euclidean embedding of similarity-based data sets. In pattern recognition, pairwise similarity or dissimilarity data have been used as an alternative to the more conventional feature vector representation. While similarity representations are rarely Euclidean, most methods involving statistical analysis or learning of such data require them to be Euclidean. In SBRFE, each similarity is considered an individual entity, and the respective sectional curvature is calculated and updated separately. This compromises the smoothness of the manifold and causes numerical instability. To overcome this problem, we used the regularized Newton’s method (RNM) to regularize the sectional curvatures of each patch obtained from the initial curvature computation. It ensures numerical stability in the embedding and smooths the manifold. It also preserves both the local and global geometry of the original data sets. Results show that the proposed curvature regularized similarity-based Ricci Flow Embedding (CRRFE) is able to estimate the Euclidean embedding of similarity data sets with much lower computation cost and time complexity than the existing regularization method. Comparison results show that our proposed methodology outperforms other existing embedding methods in most data sets with a lower classification error rate.
- Research Article
4
- 10.1093/bioadv/vbad066
- Jan 5, 2023
- Bioinformatics Advances
- Yang Yue + 5 more
To predict drug targets, graph-based machine-learning methods have been widely used to capture the relationships between drug, target and disease entities in drug-disease-target (DDT) networks. However, many methods cannot explicitly consider disease types at inference time and so will predict the same target for a given drug under any disease condition. Meanwhile, DDT networks are usually organized hierarchically, carrying interactive relationships between involved entities, but these methods, especially those based on Euclidean embedding, cannot fully utilize such topological information, which might lead to sub-optimal results. We hypothesized that, by importing hyperbolic embedding specifically for modeling hierarchical DDT networks, graph-based algorithms could better capture relationships between the aforementioned entities, which ultimately improves target prediction performance. We formulated the target prediction problem as a knowledge graph completion task explicitly considering disease types. We proposed FLONE, a hyperbolic embedding-based method that captures hierarchical topological information in DDT networks. The experimental results on two DDT networks showed that by introducing hyperbolic space, FLONE generates more accurate target predictions than its Euclidean counterparts, which supports our hypothesis. We also devised hyperbolic encoders to fuse external domain knowledge, enabling FLONE to handle samples corresponding to previously unseen drugs and targets for more practical scenarios. Source code and dataset information are at: https://github.com/arantir123/DDT_triple_prediction. Supplementary data are available at Bioinformatics Advances online.
- Research Article
2
- 10.1016/j.dam.2022.10.014
- Jan 1, 2023
- Discrete Applied Mathematics
- Sebastian M Cioabă + 3 more
The least Euclidean distortion constant of a distance-regular graph
- Research Article
- 10.1145/3584367.3584369
- Dec 1, 2022
- ACM SIGEVOlution
- Krzysztof Michalak
The low dimensional Euclidean embedding method (LDEE) allows visualizing combinatorial search spaces by mapping to the Euclidean space R^k (with k = 2 or 3 in practice). The mapping of a combinatorial search space Ω to R^k is obtained by first running the t-SNE (t-distributed stochastic neighbor embedding) algorithm with an appropriate probability distribution used for the space Ω (for example the Mallows distribution for permutation spaces). Subsequently, the vacuum embedding algorithm, proposed in this article, is used to ensure good visual separation of solutions in R^k. The LDEE method maps solutions to a regular grid in R^k, which can be used for plotting various kinds of information. Apart from solution evaluations or comparisons of multiple objectives, the proposed method can be used for analyzing the behavior of the population in population-based metaheuristics, the working of genetic operators, etc. This newsletter contribution summarizes a recent research article [1].
- Research Article
1
- 10.1109/tvcg.2021.3109975
- Dec 1, 2022
- IEEE Transactions on Visualization and Computer Graphics
- Qianwei Xia + 6 more
In this article, we develop a novel method for fast geodesic distance queries. The key idea is to embed the mesh into a high-dimensional space, such that the Euclidean distance in the high-dimensional space can induce the geodesic distance in the original manifold surface. However, directly solving the high-dimensional embedding problem is not feasible due to the large number of variables and the fact that the embedding problem is highly nonlinear. We overcome the challenges with two novel ideas. First, instead of taking all vertices as variables, we embed only the saddle vertices, which greatly reduces the problem complexity. We then compute a local embedding for each non-saddle vertex. Second, to reduce the large approximation error resulting from the purely Euclidean embedding, we propose a cascaded optimization approach that repeatedly introduces additional embedding coordinates with a non-Euclidean function to reduce the approximation residual. Using the precomputation data, our approach can determine the geodesic distance between any two vertices in near-constant time. Computational testing results show that our method is more desirable than previous geodesic distance query methods.
- Research Article
19
- 10.1103/physreve.104.044315
- Oct 22, 2021
- Physical Review E
- Yi-Jiao Zhang + 2 more
Network embedding techniques aim to represent structural properties of graphs in geometric space. Those representations are considered useful in downstream tasks such as link prediction and clustering. However, the number of graph embedding methods available on the market is large, and practitioners face the nontrivial choice of selecting the proper approach for a given application. The present work attempts to close this gap of knowledge through a systematic comparison of 11 different methods for graph embedding. We consider methods for embedding networks in the hyperbolic and Euclidean metric spaces, as well as nonmetric community-based embedding methods. We apply these methods to embed more than 100 real-world and synthetic networks. Three common downstream tasks - mapping accuracy, greedy routing, and link prediction - are considered to evaluate the quality of the various embedding methods. Our results show that some Euclidean embedding methods excel in greedy routing. As for link prediction, community-based and hyperbolic embedding methods yield an overall performance that is superior to that of Euclidean-space-based approaches. We compare the running time for different methods and further analyze the impact of different network characteristics such as degree distribution, modularity, and clustering coefficients on the quality of the embedding results. We release our evaluation framework to provide a standardized benchmark for arbitrary embedding methods.
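Greedy routing, one of the downstream tasks named above, can be sketched in a few lines: a message at a node is forwarded to whichever neighbor is closest, in the embedding space, to the target, and routing fails if no neighbor is strictly closer than the current node. The adjacency-dictionary layout, the coordinate dictionary, and the function name are assumptions for illustration and are not taken from the authors' released framework.

```python
import math


def greedy_route(adj, coords, source, target):
    """Route from source to target using only embedded coordinates.

    adj maps each node to a list of neighbors; coords maps each node to
    its Euclidean coordinates. Returns the list of visited nodes, or
    None when the walk gets stuck in a local minimum of distance.
    """
    path = [source]
    current = source
    while current != target:
        neighbors = adj[current]
        if not neighbors:
            return None
        nxt = min(neighbors, key=lambda v: math.dist(coords[v], coords[target]))
        # Require strict progress so the walk cannot loop forever.
        if math.dist(coords[nxt], coords[target]) >= math.dist(
            coords[current], coords[target]
        ):
            return None
        path.append(nxt)
        current = nxt
    return path
```

The fraction of source-target pairs for which this returns a path (rather than `None`) is one common way to score how well an embedding supports navigation.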
- Research Article
26
- 10.1016/j.knosys.2021.107369
- Aug 5, 2021
- Knowledge-Based Systems
- Adnan Zeb + 4 more
Learning hyperbolic attention-based embeddings for link prediction in knowledge graphs
- Research Article
2
- 10.1007/s10589-021-00279-2
- Apr 28, 2021
- Computational Optimization and Applications
- Qian Zhang + 2 more
Euclidean embedding from noisy observations containing outlier errors is an important and challenging problem in statistics and machine learning. Many existing methods would struggle with outliers due to a lack of detection ability. In this paper, we propose a matrix optimization based embedding model that can produce reliable embeddings and identify the outliers jointly. We show that the estimators obtained by the proposed method satisfy a non-asymptotic risk bound, implying that the model provides a high accuracy estimator with high probability when the order of the sample size is roughly the degree of freedom up to a logarithmic factor. Moreover, we show that under some mild conditions, the proposed model also can identify the outliers without any prior information with high probability. Finally, numerical experiments demonstrate that the matrix optimization-based model can produce configurations of high quality and successfully identify outliers even for large networks.
- Research Article
1
- 10.1007/s00022-021-00579-2
- Apr 15, 2021
- Journal of Geometry
- Mikhail G Katz + 1 more
We prove an inequality of Bonnesen type for the real projective plane, generalizing Pu’s systolic inequality for positively-curved metrics. The remainder term in the inequality, analogous to that in Bonnesen’s inequality, is a function of $R-r$ (suitably normalized), where $R$ and $r$ are respectively the circumradius and the inradius of the Weyl–Lewy Euclidean embedding of the orientable double cover. We exploit John ellipsoids of a convex body and Pogorelov’s rigidity theorem.
- Research Article
9
- 10.1109/tnsm.2021.3051736
- Jan 14, 2021
- IEEE Transactions on Network and Service Management
- Ruchi Tripathi + 1 more
Recent decades have observed an exponential growth in network traffic, thanks to the increased popularity of real-time applications, such as live video chat and gaming. The resulting growth in the network infrastructure has made it difficult for the service providers to abide by the service level agreements, especially with regards to the quality-of-service guarantees. Predicting network latencies from noisy and missing measurements has therefore emerged as an important problem, and a plethora of solutions have been proposed for the same. Existing network latency predictions rely either on Euclidean embedding or matrix completion methods. This work considers the estimation and prediction of network latencies from a sequence of noisy and incomplete latency matrices collected over time. An adaptive matrix completion algorithm is proposed that can handle streaming data at low computational complexity. The performance of the proposed algorithm is characterized both in theory and using a real dataset, demonstrating its viability as a network monitoring tool.
- Research Article
30
- 10.1103/physrevd.102.116019
- Dec 29, 2020
- Physical Review D
- Tianji Cai + 3 more
We introduce an efficient framework for computing the distance between collider events using the tools of Linearized Optimal Transport (LOT). This preserves many of the advantages of the recently-introduced Energy Mover's Distance, which quantifies the "work" required to rearrange one event into another, while significantly reducing the computational cost. It also furnishes a Euclidean embedding amenable to simple machine learning algorithms and visualization techniques, which we demonstrate in a variety of jet tagging examples. The LOT approximation lowers the threshold for diverse applications of the theory of optimal transport to collider physics.