Fixed-length Vector Research Articles

Object retrieval systems measure how similar the shapes of 3D models are. They search 3D model databases for the elements that resemble the query model. In structural bioinformatics, the query model is a protein tertiary/quaternary structure and the objective is to find similarly shaped molecules in the Protein Data Bank (PDB). With the ever-growing size of the PDB, a direct atomic coordinate comparison with all its members is impractical. To overcome this problem, the shape of the molecules can be encoded by fixed-length feature vectors. The distances from a protein to all entries of the PDB can then be measured in this low-dimensional domain in linear time. The state-of-the-art approaches use Zernike-Canterakis (ZC) moments for the shape encoding and supply the retrieval process with geometric data of the input structures. The BioZernike descriptors have been a standard utility of the PDB since 2020. However, computing the ZC moments locally is hampered by a lack of libraries that can be used directly in custom programs (i.e., without relying on external binaries), in particular programs written in Python. Here, a fast and well-documented Python implementation of the Pozo-Koehl (PK) algorithm is presented. In contrast to the more popular algorithm by Novotni and Klein, which is based on the voxelized volume, the PK algorithm produces ZC moments directly from the triangular surface meshes of 3D models. In particular, it can accept the molecular surfaces of proteins as its input. In the presented PK-Zernike library, owing to Numba's just-in-time compilation, a mesh with 50,000 facets is processed by a single thread in one second at moment order 20. Since this is the first time the PK algorithm is used in structural bioinformatics, it is employed in a novel, simple, but efficient protein structure retrieval pipeline. The elimination of the outlying chain fragments via a fast PCA-based subroutine improves the discrimination ability, allowing the pipeline to achieve a 0.961 area under the ROC curve in the BioZernike validation suite (0.997 for the assemblies). The correlation between the results of the proposed approach and those of the 3D Surfer program reaches values of up to 0.99.
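
The linear-time comparison in the descriptor domain described above can be illustrated with a minimal sketch. It is not the PK-Zernike library's API: the function name `retrieve_nearest`, the Euclidean metric, the descriptor length of 121, and the random data are all assumptions made for illustration only.

```python
import numpy as np


def retrieve_nearest(database: np.ndarray, query: np.ndarray, k: int = 5) -> np.ndarray:
    """Return indices of the k database descriptors closest to the query.

    Hypothetical helper: `database` has shape (N, d), `query` has shape (d,).
    """
    # One pass over N fixed-length descriptors: O(N * d) work.
    distances = np.linalg.norm(database - query, axis=1)
    k = min(k, len(distances))
    # argpartition picks the k smallest distances in linear time;
    # only those k candidates are then sorted.
    nearest = np.argpartition(distances, k - 1)[:k]
    return nearest[np.argsort(distances[nearest])]


# Illustrative data only: 1000 random "descriptors" of an assumed length 121.
rng = np.random.default_rng(0)
database = rng.normal(size=(1000, 121))
# A query that is a slightly perturbed copy of entry 42.
query = database[42] + 0.001 * rng.normal(size=121)
top = retrieve_nearest(database, query, k=3)
```

Because every structure is reduced to a vector of the same length, the cost per comparison does not depend on the number of atoms, which is what makes a scan of the whole database feasible.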

Abstract

OBJECTIVES: Application of deep learning approaches to marker trajectories and ground reaction forces (mocap data) is often hampered by small datasets. Enlarging dataset size is possible using some simple numerical approaches, although these may not be suited to preserving the physiological relevance of mocap data. We propose augmenting mocap data using a deep learning architecture called "generative adversarial networks" (GANs). We demonstrate that appropriate use of GANs can capture variations of walking patterns due to subject- and task-specific conditions (mass, leg length, age, gender and walking speed), which significantly affect walking kinematics and kinetics, resulting in augmented datasets amenable to deep learning analysis approaches.

METHODS: A publicly available (https://www.nature.com/articles/s41597-019-0124-4) gait dataset (733 trials, 21 women and 25 men, 37.2 ± 13.0 years, 1.74 ± 0.09 m, 72.0 ± 11.4 kg, walking speeds ranging from 0.18 m/s to 2.04 m/s) was used as the experimental dataset. The GAN comprised three neural networks: an encoder, a decoder, and a discriminator. The encoder compressed experimental data into a fixed-length vector, while the decoder transformed the encoder's output vector and a condition vector (containing information about the subject and trial) into mocap data. The discriminator distinguished the encoded experimental data from randomly sampled vectors of the same size. By training these networks jointly using the experimental dataset, the generator (decoder) could generate synthetic data respecting specified conditions from randomly sampled vectors. Synthetic mocap data and lower limb joint angles were generated and compared to the experimental data by identifying the statistically significant differences across the gait cycle for a randomly selected subset of the experimental data from 5 female subjects (73 trials, aged 26–40, weighing 57–74 kg, with leg lengths of 868–931 mm and walking speeds of 0.81–1.68 m/s). By conducting these comparisons for this subset, we aimed to assess the synthetic data generated using multiple conditions.

RESULTS: We visually inspected the synthetic trials to ensure that they appeared realistic. The statistical comparison revealed that, on average, only 2.5% of the gait cycle showed significant differences in the joint angles of the two data groups. Additionally, the synthetic ground reaction forces deviated from the experimental data distribution for an average of 2.9% of the gait cycle.

CONCLUSIONS: We introduced a novel approach for generating synthetic mocap data of human walking based on the conditions that influence walking patterns. The synthetic data closely followed the trends observed in the experimental data and in the literature, suggesting that our approach can augment mocap datasets considering multiple conditions, an approach unfeasible in previous work. Creation of large, augmented datasets allows the application of other deep learning approaches, with the potential to generate realistic mocap data from limited and non-lab-based data. Our method could also enhance data sharing since synthetic data does not raise ethical concerns. You can generate and download virtual gait data using our GAN approach from https://thisgaitdoesnotexist.streamlit.app/.

Declaration of Interest: I declare that there is no conflict of interest that could be perceived as prejudicing the impartiality of the research reported.
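
The data flow between the three networks described in METHODS can be sketched as follows. This is only an illustration of the shapes involved, not the authors' model: the single-layer networks, the random untrained weights, the layer sizes (`N_FEATURES`, `N_LATENT`), and all function names are assumptions. Only the five conditions (mass, leg length, age, gender, walking speed) come from the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

N_FEATURES = 60  # assumed size of one flattened mocap sample (hypothetical)
N_LATENT = 8     # assumed length of the fixed-length vector (hypothetical)
N_COND = 5       # mass, leg length, age, gender, walking speed


def dense(n_in: int, n_out: int):
    """Random, untrained weights and zero bias for one linear layer."""
    return rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out)


W_enc, b_enc = dense(N_FEATURES, N_LATENT)
W_dec, b_dec = dense(N_LATENT + N_COND, N_FEATURES)
W_dis, b_dis = dense(N_LATENT, 1)


def encode(x: np.ndarray) -> np.ndarray:
    """Encoder: compress mocap data into a fixed-length vector."""
    return np.tanh(x @ W_enc + b_enc)


def decode(z: np.ndarray, cond: np.ndarray) -> np.ndarray:
    """Decoder: map a latent vector plus a condition vector to mocap data."""
    return np.concatenate([z, cond], axis=1) @ W_dec + b_dec


def discriminate(z: np.ndarray) -> np.ndarray:
    """Discriminator: score in (0, 1) for encoded vs. randomly sampled vectors."""
    return 1.0 / (1.0 + np.exp(-(z @ W_dis + b_dis)))


def generate(cond: np.ndarray) -> np.ndarray:
    """Generation: decode randomly sampled vectors under given conditions."""
    z = rng.normal(size=(cond.shape[0], N_LATENT))
    return decode(z, cond)


# Illustrative batch of 4 samples with placeholder condition values.
x = rng.normal(size=(4, N_FEATURES))
cond = rng.normal(size=(4, N_COND))
reconstruction = decode(encode(x), cond)
scores = discriminate(encode(x))
synthetic = generate(cond)
```

In a trained model the weights would be learned jointly, so that encoded experimental data become indistinguishable from the sampled vectors and the decoder alone can act as the conditional generator.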

Articles published on Fixed-length Vector

AABBA Graph Kernel: Atom-Atom, Bond-Bond, and Bond-Atom Autocorrelations for Machine Learning.

Particle Swarm Optimization for Efficiently Evolving Deep Convolutional Neural Networks Using an Autoencoder-Based Encoding Strategy

PLM-T3SE: Accurate Prediction of Type III Secretion Effectors Using Protein Language Model Embeddings.

Anomaly detection based on system text logs of virtual network functions

HashGAT-VCA: A vector cellular automata model with hash function and graph attention network for urban land-use change simulation

Learning semi-supervised enrichment of longitudinal imaging-genetic data for improved prediction of cognitive decline

Passage-aware Search Result Diversification

Encrypted Network Traffic Classification Using Deep and Parallel Network-In-Network Models

Transitivity-Preserving Graph Representation Learning for Bridging Local Connectivity and Role-Based Similarity

On the optimality of quantum circuit initial mapping using reinforcement learning

A Deep Long-Term Joint Temporal-Spectral Network for Spectrum Prediction.

Graph Representation Learning-Based Fixed-Length Clinical Feature Vector Generation from Heterogeneous Medical Records.

Emotion generation method in online physical education teaching based on data mining of teacher-student interactions.

PseAAC2Vec protein encoding for TCR protein sequence classification

Structural Outlier Detection and Zernike-Canterakis Moments for Molecular Surface Meshes-Fast Implementation in Python.

DEEP LEARNING FOR ENLARGING HUMAN MOTION CAPTURE (MOCAP) DATASETS

Self-Organizing Memory Based on Adaptive Resonance Theory for Vision and Language Navigation

Self-Organizing Neural Scheduler for the Flexible Job Shop Problem With Periodic Maintenance and Mandatory Outsourcing Constraints.

Development of an Algorithm for Extracting and Encoding Data from Log Messages of a Computing System for Anomaly Detection Systems

Self-supervised contrastive representation learning for large-scale trajectories
