Fixed-length Vector Research Articles

Simple Summary: The family of coronaviruses comprises a diverse set of strains and variants that cause diseases ranging from the common cold to COVID-19. Moreover, they infect a wide array of hosts, from bats, camels, and birds to humans. Studying coronaviruses through the lens of host specificity provides a unique perspective on the evolution, diversity, and dynamics of this family. In particular, it can reveal groups of different hosts infected by similar strains, giving clues about which strains were more likely to have evolved to jump from one host to another. In this work, we frame host specificity as a classification task by designing a very compact numerical representation of the spike sequences of different coronaviruses. Based on this numerical representation, classification methods are able to detect the target host with high accuracy. Such an approach can be used to scale efficiently to large volumes of sequences in order to unveil trends in the host specificity of different coronavirus strains.

The study of host specificity has important connections to an open and important question: the origin of SARS-CoV-2 in humans, which led to the COVID-19 pandemic. There are speculations that bats are a possible origin. Likewise, there are many closely related (corona)viruses, such as SARS, which was found to be transmitted through civets. Studying the different hosts that can carry and transmit deadly viruses to humans is crucial to understanding, mitigating, and preventing current and future pandemics. In coronaviruses, the surface (S) protein, or spike protein, is important in determining host specificity, since it is the point of contact between the virus and the host cell membrane. In this paper, we classify the hosts of over five thousand coronaviruses from their spike protein sequences, segregating them into clusters of distinct hosts among birds, bats, camels, swine, humans, and weasels, to name a few.
We propose a feature embedding based on the well-known position weight matrix (PWM), which we call PWM2Vec, and we use it to generate feature vectors from the spike protein sequences of these coronaviruses. While our embedding is inspired by the success of PWMs in biological applications, such as determining protein function and identifying transcription factor binding sites, we are the first (to the best of our knowledge) to use PWMs from viral sequences to generate fixed-length feature vector representations and to use them in the context of host classification. Results on real-world data show that, when using PWM2Vec, machine learning classifiers perform comparably to the baseline models in terms of predictive performance and runtime, and in some cases better. We also measure the importance of different amino acids using information gain to show which amino acids are important for predicting the host of a given coronavirus. Finally, we perform statistical analyses on these results to show that our embedding is more compact than those of the baseline models.
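As a rough illustration of the two techniques named in this abstract, the sketch below builds a position weight matrix from the overlapping k-mers of a single spike sequence and flattens it into a vector of 20 * k values, so sequences of different lengths map to vectors of the same size. It also computes the information gain of a discrete feature (for example, the residue at one aligned position) with respect to host labels. This is a minimal sketch under stated assumptions, not the published PWM2Vec procedure: the choice of k, the pseudocount, the uniform background distribution, the flattening step, and the helper names (`position_weight_matrix`, `pwm_embedding`, `information_gain`) are all hypothetical, and the authors' method may, for instance, use the matrix to score k-mers rather than flatten it.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues; other symbols are skipped
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def position_weight_matrix(seq: str, k: int, pseudocount: float = 0.1) -> list[list[float]]:
    """Build a k x 20 log-odds PWM from the overlapping k-mers of one sequence.

    Assumptions (hypothetical): every k-mer is treated as an aligned site, and
    the background distribution is uniform (1/20 per residue).
    """
    counts = [[0.0] * len(AMINO_ACIDS) for _ in range(k)]
    for start in range(len(seq) - k + 1):
        for pos, aa in enumerate(seq[start:start + k]):
            idx = AA_INDEX.get(aa)
            if idx is not None:
                counts[pos][idx] += 1
    background = 1.0 / len(AMINO_ACIDS)
    pwm = []
    for row in counts:
        total = sum(row) + pseudocount * len(AMINO_ACIDS)
        # Smoothed frequency, then log2 ratio against the background.
        pwm.append([math.log2((c + pseudocount) / total / background) for c in row])
    return pwm


def pwm_embedding(seq: str, k: int = 3) -> list[float]:
    """Flatten the PWM row by row: always 20 * k values, whatever len(seq) is."""
    return [w for row in position_weight_matrix(seq, k) for w in row]


def entropy(labels: list[str]) -> float:
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values: list[str], labels: list[str]) -> float:
    """IG(Y; X) = H(Y) - sum_x P(X = x) * H(Y | X = x) for a discrete feature X."""
    n = len(labels)
    conditional = 0.0
    for value, count in Counter(feature_values).items():
        subset = [y for x, y in zip(feature_values, labels) if x == value]
        conditional += (count / n) * entropy(subset)
    return entropy(labels) - conditional
```

Flattening keeps the vector size independent of sequence length, at the cost of discarding where in the sequence each k-mer occurred. To rank features, one could compute `information_gain` for each candidate position across a set of labelled sequences and sort by the result.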

Biometric technologies, especially face recognition, have become an essential part of identity management systems worldwide. In deployments of biometrics, secure storage of biometric information is necessary to protect users' privacy. In this context, biometric cryptosystems are designed to meet key requirements of biometric information protection, enabling privacy-preserving storage and comparison of biometric data, e.g. feature vectors extracted from facial images. Until now, biometric cryptosystems have hardly been applied to state-of-the-art biometric recognition systems that use deep convolutional neural networks. This work investigates the application of a well-known biometric cryptosystem, the improved fuzzy vault scheme, to facial feature vectors extracted by deep convolutional neural networks. To this end, a feature transformation method is introduced that maps fixed-length, real-valued deep feature vectors to integer-valued feature sets. As part of this feature transformation, a detailed analysis of different feature quantisation and binarisation techniques is conducted. During key binding, the obtained feature sets are locked in an unlinkable improved fuzzy vault. For key retrieval, the efficiency of different polynomial reconstruction techniques is investigated. The proposed feature transformation method and template protection scheme are agnostic of the biometric characteristic and can thus be applied to virtually any biometric features computed by a deep neural network. In experiments, an unlinkable improved deep face fuzzy vault-based template protection scheme is constructed using features extracted with a state-of-the-art deep convolutional neural network trained with the additive angular margin loss (ArcFace). For the best configuration, a false non-match rate below 1% at a false match rate of 0.01% is achieved in cross-database experiments on the FERET and FRGCv2 face databases. On average, a security level of up to approximately 28 bits is obtained. This work presents an effective face-based fuzzy vault scheme that provides privacy protection of facial reference data as well as digital key derivation from the face.
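To make the pipeline described above more concrete, the following toy sketch binarises a real-valued feature vector by sign, encodes it as an integer feature set, locks a secret polynomial f in an improved fuzzy vault of the form V(X) = f(X) + prod over a in A of (X - a) over a small prime field, and recovers f by evaluating V at the probe set's elements and brute-forcing subsets of k points with Lagrange interpolation, checking each candidate against a stored SHA-256 hash. Everything here is an assumption for illustration only: the sign binarisation and the encoding 2i + bit are hypothetical stand-ins for the quantisation and binarisation techniques analysed in the paper, and the field GF(17), the 8-dimensional features, the secret length k = 3, and all function names are toy choices far too small to provide any security.

```python
import hashlib
import itertools
from typing import Optional

P = 17  # toy prime field GF(17); hypothetical, far too small for real security


def binarise_to_set(features: list[float]) -> set[int]:
    """Sign-binarise dimension i and encode it as the field element 2*i + bit."""
    if 2 * len(features) > P:
        raise ValueError("field too small for this many dimensions")
    return {2 * i + (1 if x > 0 else 0) for i, x in enumerate(features)}


def poly_mul_linear(poly: list[int], root: int) -> list[int]:
    """Multiply a polynomial (coefficients, low degree first) by (X - root) mod P."""
    out = [0] * (len(poly) + 1)
    for i, c in enumerate(poly):
        out[i + 1] = (out[i + 1] + c) % P
        out[i] = (out[i] - root * c) % P
    return out


def poly_eval(poly: list[int], x: int) -> int:
    """Evaluate a polynomial at x mod P using Horner's rule."""
    acc = 0
    for c in reversed(poly):
        acc = (acc * x + c) % P
    return acc


def lock(feature_set: set[int], secret: list[int]) -> tuple[list[int], str]:
    """Return V = f + prod_{a in A}(X - a) and a SHA-256 digest of f.

    The secret coefficients are assumed to lie in range(P).
    """
    if len(secret) > len(feature_set):
        raise ValueError("secret polynomial must have degree below |A|")
    chi = [1]
    for a in sorted(feature_set):
        chi = poly_mul_linear(chi, a)
    vault = [(c + (secret[i] if i < len(secret) else 0)) % P for i, c in enumerate(chi)]
    return vault, hashlib.sha256(bytes(secret)).hexdigest()


def interpolate(points: list[tuple[int, int]]) -> list[int]:
    """Lagrange interpolation mod P; returns coefficients, low degree first."""
    coeffs = [0] * len(points)
    for j, (xj, yj) in enumerate(points):
        basis, denom = [1], 1
        for m, (xm, _) in enumerate(points):
            if m != j:
                basis = poly_mul_linear(basis, xm)
                denom = denom * (xj - xm) % P
        scale = yj * pow(denom, -1, P) % P
        for i, c in enumerate(basis):
            coeffs[i] = (coeffs[i] + scale * c) % P
    return coeffs


def unlock(vault: list[int], digest: str, probe_set: set[int], k: int) -> Optional[list[int]]:
    """Try every k-subset of probe points (b, V(b)); genuine b satisfy V(b) = f(b)."""
    points = [(b, poly_eval(vault, b)) for b in sorted(probe_set)]
    for subset in itertools.combinations(points, k):
        candidate = interpolate(list(subset))
        if hashlib.sha256(bytes(candidate)).hexdigest() == digest:
            return candidate
    return None
```

With these toy parameters, flipping one of eight signs in the probe still leaves seven genuine points, so some 3-point subset interpolates the secret; a probe with every sign flipped shares no element with the reference set and unlocking fails. Brute-force subset search is only practical at toy sizes, which is why the efficiency of the polynomial reconstruction step matters in a real deployment.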

Articles published on Fixed-length Vector

A fisher score-based multi-instance learning method assisted by mixture of factor analysis

PredAoDP: Accurate identification of antioxidant proteins by fusing different descriptors based on evolutionary information with support vector machine

An End-to-End Deep Learning Method for Dynamic Job Shop Scheduling Problem

Hybrid CNN-LSTM Architecture for LiDAR Point Clouds Semantic Segmentation

SOIT: Segmenting Objects with Instance-Aware Transformers

Estimating the degree of conflict in speech by employing Bag-of-Audio-Words and Fisher Vectors

Efficient analysis of COVID-19 clinical data using machine learning models.

Clustering a database of optically absorbing organic molecules via a hierarchical fingerprint scheme that categorizes similar functional molecular fragments.

A Tailings Dam Long-Term Deformation Prediction Method Based on Empirical Mode Decomposition and LSTM Model Combined with Attention Mechanism

Complex graph convolutional network for link prediction in knowledge graphs

A novel multi-innovation gradient support vector machine regression method

PWM2Vec: An Efficient Embedding Approach for Viral Host Specification from Coronavirus Spike Sequences.

Compact feature hashing for machine learning based malware detection

Spherical harmonic shape descriptors of nodal force demands for quantifying spatial truss connection complexity

Interactive Visual Pattern Search on Graph Data via Graph Representation Learning.

Water as a Lévy Rotor.

Scale-invariant histogram of oriented gradients: novel approach for pedestrian detection in multiresolution image dataset

Robust Representation and Efficient Feature Selection Allows for Effective Clustering of SARS-CoV-2 Variants

Statistical and Visual Analysis of Audio, Text, and Image Features for Multi-Modal Music Genre Recognition.

Deep face fuzzy vault: Implementation and performance
