High-dimensional Vector Space Research Articles

Code search aims to retrieve code snippets from a large-scale codebase such that the semantics of the retrieved code match the developer's query intent. Code is a low-level implementation of a programming intent, whereas a query is expressed in clear, high-level semantics, which makes it difficult for DL-based approaches to learn the semantic relationship between them. Through a large-scale empirical analysis of more than 2.2 million pairs of Java code and descriptions, we found that the semantics of code and query can be aligned by enriching code with the descriptions of other code that has a similar implementation. Based on this finding, we propose a code semantic enrichment approach for deep code search, named SemEnr. Specifically, we first enriched the semantics of all code snippets in the training and testing data: for each code snippet, we estimated its syntactic similarity to the snippets in the training data and retrieved the most similar one. Thereafter, the semantics of a code snippet were represented by its code tokens and the description of the retrieved most similar code. During model training, we used the attention mechanism to embed pairs of enriched code and query into a shared high-dimensional vector space. To enhance the quality of the learned representations, we integrated a multi-perspective co-attention mechanism, employing Convolutional Neural Networks (CNNs) to capture local correlations between code and query. Finally, we evaluated the effectiveness of our approach through experiments on two extensively used Java datasets. Our experimental results reveal that SemEnr achieves an MRR of 0.698 and 0.631, outperforming the best baseline CAT (a state-of-the-art DL-based model) by 19.93% and 18.83%, respectively. In addition, we conducted a user study involving 50 real-world queries to assess SemEnr's performance, and the findings suggest that SemEnr outperformed baseline models by returning more relevant code snippets.
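The enrichment step and the MRR metric mentioned in the abstract above can be illustrated with a minimal sketch. This is not the SemEnr implementation: Jaccard overlap of token sets is used here only as a simple stand-in for the syntactic similarity measure, which the abstract does not specify, and the names `jaccard`, `enrich`, `mean_reciprocal_rank`, and the toy corpus are all hypothetical.

```python
from typing import List, Tuple

# (code tokens, natural-language description); a hypothetical toy format
Snippet = Tuple[List[str], str]


def jaccard(a: List[str], b: List[str]) -> float:
    """Token-set overlap; an assumed stand-in for the similarity measure."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def enrich(tokens: List[str], corpus: List[Snippet]) -> List[str]:
    """Append the description of the most similar corpus snippet to `tokens`."""
    # Skip an identical snippet so code is not enriched with its own description.
    candidates = [s for s in corpus if s[0] != tokens]
    if not candidates:
        return list(tokens)
    _, best_desc = max(candidates, key=lambda s: jaccard(tokens, s[0]))
    return list(tokens) + best_desc.lower().split()


def mean_reciprocal_rank(ranks: List[int]) -> float:
    """MRR over the 1-based rank of the first correct result for each query."""
    return sum(1.0 / r for r in ranks) / len(ranks)


# Toy training corpus (invented for illustration only).
corpus: List[Snippet] = [
    (["file", "reader", "read", "line", "close"], "Read all lines from a file"),
    (["list", "sort", "compare", "swap"], "Sort a list in place"),
]

enriched = enrich(["buffered", "reader", "read", "line"], corpus)
mrr = mean_reciprocal_rank([1, 2, 4])  # (1 + 1/2 + 1/4) / 3
```

In this toy run the query snippet shares three tokens with the file-reading snippet and none with the sorting snippet, so the file-reading description is appended. The enriched token list is what a model would then embed alongside the query.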

Shortly after the worst of the COVID-19 pandemic, an outbreak of mpox introduced another critical public health emergency. Like the COVID-19 pandemic, the mpox outbreak was characterized by a rising prevalence of public health misinformation on social media, through which many US adults receive and engage with news. Digital misinformation continues to challenge the efforts of public health officials to provide accurate and timely information to the public. We examine the evolving topic distributions of social media narratives during the mpox outbreak to map the tension between rapidly diffusing misinformation and public health communication. This study aims to observe topical themes occurring in a large-scale collection of tweets about mpox using deep learning. We leveraged a data set comprising all mpox-related tweets that were posted between May 7, 2022, and July 23, 2022. We then applied Sentence Bidirectional Encoder Representations From Transformers (S-BERT) to the content of each tweet to generate a representation of its content in high-dimensional vector space, where semantically similar tweets are located close together. We projected the set of tweet embeddings to a 2D map by applying principal component analysis and Uniform Manifold Approximation and Projection (UMAP). Finally, we grouped these data points into 7 topical clusters using k-means clustering and analyzed each cluster to determine its dominant topics. We analyzed the prevalence of each cluster over time to evaluate longitudinal thematic changes. Our deep-learning pipeline revealed 7 distinct clusters of content: (1) cynicism, (2) exasperation, (3) COVID-19, (4) men who have sex with men, (5) case reports, (6) vaccination, and (7) World Health Organization (WHO). Clusters that largely communicated erroneous or irrelevant information began earlier and grew faster, reaching a wider audience than later communications by official institutions and health officials.
Within a few weeks of the first reported mpox cases, an avalanche of mostly false, misleading, irrelevant, or damaging information started to circulate on social media. Official institutions, including the WHO, acted promptly, providing case reports and accurate information within weeks, but were overshadowed by rapidly spreading social media chatter. Our results point to the need for real-time monitoring of social media content to optimize responses to public health emergencies.
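The clustering stage of the pipeline described above can be sketched in a few lines. This is only an illustration of the k-means step (Lloyd's algorithm) on invented 2D points with 2 clusters. It does not reproduce the study's S-BERT embeddings, its PCA and UMAP projection, or its 7 clusters, and the names `kmeans`, `squared_distance`, `points`, and `init` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(
    points: List[Point], init_centroids: List[Point], max_iter: int = 100
) -> Tuple[List[int], List[Point]]:
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    centroids = list(init_centroids)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments did not change, so the clustering has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Invented 2D points standing in for projected tweet embeddings.
points: List[Point] = [
    (0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
    (5.0, 5.1), (5.2, 4.9), (4.9, 5.0),
]
init = [points[0], points[3]]  # explicit initial centroids keep the run deterministic
labels, centroids = kmeans(points, init)
```

Passing the initial centroids explicitly is a choice made for this sketch so that the result is reproducible. Library implementations typically choose them with a randomized seeding scheme instead.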

Articles published on High-dimensional Vector Space

Improving information theory of context-aware phrase embeddings in HR domain

Hyperdimensional computing: a framework for stochastic computation and symbolic AI

The classification of Boolean degree 1 functions in high-dimensional finite vector spaces

A Lightweight Model Enhancing Facial Expression Recognition with Spatial Bias and Cosine-Harmony Loss

Document Similarity Using Term Frequency-Inverse Document Frequency Representation and Cosine Similarity

Random walk theory and application

Emotion detection from handwriting and drawing samples using an attention-based transformer model.

Exploiting Data Geometry in Machine Learning

Sentiment Analysis of Short Texts Using SVMs and VSMs-Based Multiclass Semantic Classification

SeRF: Segment Graph for Range-Filtering Approximate Nearest Neighbor Search

PREDICTING 10 YEAR MORTALITY WITH MACHINE LEARNING: EVIDENCE FROM NATIONAL SOCIAL LIFE, HEALTH, AND AGING PROJECT

Identifying the Causes of Ship Collisions Accident Using Text Mining and Bayesian Networks

A novel scheme based on modified hierarchical time-shift multi-scale amplitude-aware permutation entropy for rolling bearing condition assessment and fault recognition

Two-stage framework with improved U-Net based on self-supervised contrastive learning for pavement crack segmentation

Code semantic enrichment for deep code search

Scalable Bayesian optimization with randomized prior networks

ViSpa (Vision Spaces): A computer-vision-based representation system for individual images and concept prototypes, with large-scale evaluation.

High-Dimensional Approximate Nearest Neighbor Search: with Reliable and Efficient Distance Comparison Operations

Misinformation and Public Health Messaging in the Early Stages of the Mpox Outbreak: Mapping the Twitter Narrative With Deep Learning.

A hard segmentation network guided by soft segmentation for tumor segmentation on PET/CT images
