Pairwise Distance Matrix Research Articles

Pathogen genomic sequence data are increasingly made available for epidemiological monitoring. A central goal is to identify infectious disease outbreaks and assess their potential. Popular methods for analyzing sequence data often rely on phylogenetic tree inference, which is vulnerable to errors from recombination and computationally expensive, making real-time results difficult to obtain when the number of sequences reaches the thousands or more. Here, we propose an alternative strategy for outbreak detection from genomic data, based on deep learning methods developed for image classification. The key idea is to treat a pairwise genetic distance matrix calculated from viral sequences as an image, and to develop convolutional neural network (CNN) models that classify areas of the image showing signatures of an active outbreak, thereby identifying subsets of sequences taken from that outbreak. We show that the method efficiently detects HIV-1 outbreaks with R0 ≥ 2.5, with overall specificity exceeding 98% and sensitivity exceeding 92%. We validated the approach using data from HIV-1 CRF01 in Europe, which contain both endemic sequences and a well-known dual outbreak in intravenous drug users. Our model accurately identified known outbreak sequences against the background of slower-spreading HIV. Importantly, we detected both outbreaks early, before they were over, implying that had the method been applied in real time as data became available, it would have been possible to intervene and potentially limit the extent of these outbreaks. The approach scales to hundreds of thousands of sequences, making it useful for current and future real-time epidemiological investigations, including public health monitoring with large databases and, especially, rapid outbreak identification.
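
The distance-matrix-as-image idea described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not say which genetic distance is used, so the uncorrected p-distance (fraction of differing aligned sites) is an assumption here, and the names `p_distance`, `distance_matrix`, `to_grayscale`, and the toy alignment are all hypothetical. The CNN classification step is not shown.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of aligned sites that differ, skipping sites with a gap or N.

    Assumption: the source does not name its distance measure; the
    uncorrected p-distance is used here purely for illustration.
    """
    compared = differing = 0
    for x, y in zip(a, b):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        differing += x != y
    return differing / compared if compared else 0.0


def distance_matrix(seqs: list[str]) -> list[list[float]]:
    """Symmetric pairwise distance matrix with a zero diagonal."""
    n = len(seqs)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = p_distance(seqs[i], seqs[j])
    return d


def to_grayscale(d: list[list[float]]) -> list[list[int]]:
    """Rescale distances to 0..255 so the matrix can be read as an image."""
    peak = max(max(row) for row in d) or 1.0
    return [[round(255 * v / peak) for v in row] for row in d]


# Hypothetical toy alignment (not real HIV-1 data).
toy_alignment = ["ACGTACGT", "ACGTACGA", "ACGAACGA", "TTGAACGA"]
matrix = distance_matrix(toy_alignment)
image = to_grayscale(matrix)
```

In this sketch each pixel `image[i][j]` encodes how far apart sequences `i` and `j` are, so a group of closely related sequences shows up as a dark block; an image classifier would then operate on regions of such a matrix.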

We introduce the Voronoi fundamental zone (VFZ) framework, which is useful for grain boundary (GB) structure–property models and for gaining insight into the nature of a five degree-of-freedom (5DOF) fundamental zone (FZ) for both cubic and non-cubic symmetries, and potentially alloys. We cover the methods associated with the VFZ framework. The framework offers an advantage over other 5DOF-based property interpolation methods because directly computed Euclidean distances approximate the original grain boundary octonion (GBO) distance with significantly reduced runtime (~7 CPU min vs. 153 CPU days for a 50,000×50,000 pairwise-distance matrix). We perform grain boundary energy (GBE) interpolation for a non-smooth validation function on sets of up to 50,000 GBs using four interpolation methods: barycentric interpolation, Gaussian process regression (GPR) or Kriging, inverse-distance weighting (IDW), and nearest neighbor (NN) interpolation. The best performance was achieved with GPR, which reduces the root mean square error (RMSE) by 83% relative to the RMSE of a constant, average model. This error is comparable to the minimum expected uncertainty associated with reconstruction of noise-free experimental polycrystalline data. We then use GPR to interpolate simulated bi-crystal datasets for Fe and Ni and demonstrate better performance than prior work for Fe (34.4% vs. 21.2% improvement) and similar performance for Ni (57.6% vs. 56.4% improvement). The noise and non-uniform sampling in the Fe dataset make it difficult to resolve low GBE (i.e. cusps) and to validate the model. We resolve the Ni dataset cusps with high accuracy, but uncertainty may be high in regions far from the input data. The trade-offs between noise, dataset size, sampling scheme, and repeat measurements must be carefully managed.
We provide a vectorized, parallelized MATLAB interpolation function (interp5DOF.m) and related routines (github.com/sgbaird-5dof/interp), which can be applied to future datasets for a variety of GB properties and to better understand 5DOF FZs. For example, we estimate the maximum dimension of an Oh VFZ to be ~66.5° in the GBO sense.
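
To make the role of directly computed Euclidean distances concrete, here is a minimal sketch of inverse-distance weighting (IDW), one of the four interpolation methods named above, driven by a query-to-data Euclidean distance matrix. It is not a port of interp5DOF.m: the function names, the exponent `power=2.0`, and the 2-D toy points standing in for VFZ coordinates are all assumptions made for illustration.

```python
import math


def euclidean(u, v):
    """Straight-line distance between two coordinate tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pairwise_distances(queries, points):
    """Distance matrix with one row per query and one column per data point."""
    return [[euclidean(q, p) for p in points] for q in queries]


def idw_predict(queries, points, values, power=2.0):
    """Inverse-distance-weighted estimate at each query location.

    Assumption: weights are distance ** -power over all data points;
    the source does not specify its weighting scheme or neighborhood.
    """
    predictions = []
    for row in pairwise_distances(queries, points):
        if 0.0 in row:
            # Query coincides with a data point: return its value exactly.
            predictions.append(values[row.index(0.0)])
            continue
        weights = [d ** -power for d in row]
        weighted_sum = sum(w * v for w, v in zip(weights, values))
        predictions.append(weighted_sum / sum(weights))
    return predictions


# Hypothetical toy data: two known points and three query locations.
points = [(0.0, 0.0), (2.0, 0.0)]
values = [1.0, 3.0]
queries = [(1.0, 0.0), (0.0, 0.0), (0.5, 0.0)]
estimates = idw_predict(queries, points, values)
```

The midpoint query receives equal weights from both data points, while the query at 0.5 is pulled toward the nearer value. Because the distances are plain Euclidean norms, the whole matrix can be computed with simple arithmetic, which is the kind of operation the abstract credits for the runtime reduction.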

Articles published on Pairwise Distance Matrix

Scaling DEPP phylogenetic placement to ultra-large reference trees: a tree-aware ensemble approach.

Generalized black hole clustering algorithm

ML-aVAT: A Novel 2-Stage Machine-Learning Approach for Automatic Clustering Tendency Assessment

Molecular analysis of NS1 gene of Indian protoparvoviruses

Uncovering the Morphological and Genetical Heterogeneity of Pyricularia oryzae (cooke) sacc. in Southwestern Region of Bangladesh

Ksak: A high-throughput tool for alignment-free phylogenetics.

Integrative Approaches Establish Colour Polymorphism in the Bamboo-Feeding Leafhopper Mukaria splendida Distant (Hemiptera: Cicadellidae) from India

Insights into the genetic landscape and presence of Cochliomyia hominivorax in the Caribbean.

A deep learning approach to real-time HIV outbreak detection using genetic data.

MicrobiomeGWAS: A Tool for Identifying Host Genetic Variants Associated with Microbiome Composition

Algorithm of ant colony optimization (ACO) for 3D variation traveling salesman problem

PhyloM: A Computer Program for Phylogenetic Inference from Measurement or Binary Data, with Bootstrapping.

Quantifying the Alignment of Graph and Features in Deep Learning.

Understanding population structure in an evolutionary context: population-specific FST and pairwise FST.

Five degree-of-freedom property interpolation of arbitrary grain boundaries via Voronoi fundamental zone framework

Large-scale tandem mass spectrum clustering using fast nearest neighbor searching.

Chatbot to improve learning punctuation in Spanish and to enhance open and flexible learning environments

Hierarchical Meta-Storms enables comprehensive and rapid comparison of microbiome functional profiles on a large scale using hierarchical dissimilarity metrics and parallel computing.

A hybrid quantum regression model for the prediction of molecular atomization energies
