Semantic Representation Research Articles

Abstract: The field of social network analysis has identified User Alignment (UA) as a crucial area of investigation. The objective of UA is to identify and connect user accounts across diverse social networks, even when there are no explicit interconnections. UA plays a pivotal role in synthesising coherent user profiles and analysing user behaviour across platforms. However, traditional approaches have limitations. Single embedding techniques do not fully capture the semantics of user profile attributes. Furthermore, classification-based embedding methods lack definitive criteria for categorisation, which constrains both the efficacy and the applicability of these models. This paper presents a novel unsupervised Gradient Semantic Model for User Alignment (GSMUA) for identifying common user identities across social networks. GSMUA categorises user profile information into weak, sub, and strong gradients based on the semantic intensity of attributes. Depending on the gradient semantic level, feature extraction attends to literal features, semantic features, or a combination of both, thereby achieving a full semantic representation of user attributes. For strongly semantic long texts, GSMUA employs Named Entity Recognition (NER) to handle such texts more efficiently. Furthermore, GSMUA compensates for missing user profile attributes by utilising profile information from user neighbours, thereby reducing the negative impact of missing attributes on model performance. Extensive experiments conducted on four pairs of real datasets demonstrate the superiority of the approach. Compared with the most effective previously developed unsupervised methods, GSMUA improves hit-precision by 5.32% to 12.17%. Compared with supervised methods, the improvements range from 0.71% to 11.79%.

Background: The use of single cell/nucleus RNA sequencing (scRNA-seq) technologies that quantitatively describe cell transcriptional phenotypes is revolutionizing our understanding of cell biology, leading to new insights in cell type identification, disease mechanisms, and drug development. The tremendous growth in scRNA-seq data has posed new challenges in efficiently characterizing data-driven cell types and identifying quantifiable marker genes for cell type classification. The use of machine learning and explainable artificial intelligence has emerged as an effective approach to study large-scale scRNA-seq data.

Methods: NS-Forest is a random forest machine learning-based algorithm that aims to provide a scalable data-driven solution to identify minimum combinations of necessary and sufficient marker genes that capture cell type identity with maximum classification accuracy. Here, we describe the latest version, NS-Forest version 4.0, and its companion Python package (https://github.com/JCVenterInstitute/NSForest), with several enhancements to select marker gene combinations that exhibit highly selective expression patterns among closely related cell types and to perform marker gene selection more efficiently for large-scale scRNA-seq data atlases with millions of cells.

Results: By modularizing the final decision tree step, NS-Forest v4.0 can be used to compare the performance of user-defined marker genes with the NS-Forest computationally derived marker genes based on the decision tree classifiers. To quantify how well the identified markers exhibit the desired pattern of being exclusively expressed at high levels within their target cell types, we introduce the On-Target Fraction metric, which ranges from 0 to 1, with a value of 1 assigned to markers that are expressed only within their target cell types and not in cells of any other cell type. NS-Forest v4.0 outperforms previous versions in simulation studies and in its ability to identify markers with higher On-Target Fraction values for closely related cell types in real data, and outperforms other marker gene selection approaches for cell type classification with significantly higher F-beta scores when applied to datasets from three human organs: brain, kidney, and lung.

Discussion: Finally, we discuss potential use cases of the NS-Forest marker genes for the broad user community, including designing spatial transcriptomics gene panels and semantic representation of cell types in biomedical ontologies.
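The abstract above states only the range of the On-Target Fraction and what its extreme value of 1 means. As a rough illustration, the sketch below assumes the metric is a gene's expression summary in the target cell type divided by the sum of that summary over all cell types. This formula, the function name `on_target_fraction`, and the input layout are assumptions made for illustration, not the confirmed NS-Forest implementation.

```python
from typing import Dict


def on_target_fraction(expr_by_cell_type: Dict[str, float], target: str) -> float:
    """Illustrative On-Target Fraction for a single marker gene.

    expr_by_cell_type maps each cell type to a non-negative expression
    summary of the gene (for example a per-cluster median; an assumption).
    Returns the share of total expression found in the target cell type.
    """
    total = sum(expr_by_cell_type.values())
    if total == 0:
        # Gene is not expressed anywhere: treat it as carrying no on-target signal.
        return 0.0
    return expr_by_cell_type[target] / total


# A marker expressed only in its target cell type gets the maximum value.
exclusive = on_target_fraction({"neuron": 5.0, "astrocyte": 0.0, "microglia": 0.0}, "neuron")

# A marker expressed equally in two cell types gets half of the signal on target.
shared = on_target_fraction({"neuron": 2.0, "astrocyte": 2.0}, "neuron")
```

Under this assumed definition, the value falls towards 0 as more of the gene's expression lies outside the target cell type, which matches the behaviour the abstract describes.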

Articles published on Semantic Representation

A method for named entity recognition in social media texts with syntactically enhanced multiscale feature fusion.

Unveiling user identity across social media: a novel unsupervised gradient semantic model for accurate and efficient user alignment

Right is it right? Influence of the type of motor response on behavioral and electrophysiological indicators during the orthographic decision task

Activity in Occipito-Temporal Cortex Is Involved in Tool-Use Planning and Contributes to Tool-Related Semantic Neural Representations

AWA-GCN: Enhancing Chinese sentiment analysis with a novel GCN model for Triplet and Quadruplet Extraction at SIGHAN 2024 dimABSA Task

A Hybrid Semantic Representation Method Based on Fusion Conceptual Knowledge and Weighted Word Embeddings for English Texts

Cross-language facilitatory and inhibitory effects in the naming of Japanese words by Chinese-Japanese bilinguals

MAMSC: a semantic enhanced representation model for public opinion key node recognition based on multianchor mapping in semantic communities

Morpho-orthographic segmentation on visual word recognition in Brazilian Portuguese speakers

Discovery of optimal cell type classification marker genes from single cell RNA sequencing data

Semantic layout-guided diffusion model for high-fidelity image synthesis in ‘The Thousand Li of Rivers and Mountains’

Superpixel semantics representation and pre-training for vision-language tasks

Boosting multi-document summarization with hierarchical graph convolutional networks

Between compounding and phrasal derivation: Polish complex nouns in sam(o)-

Joint embedding in Hierarchical distance and semantic representation learning for link prediction

Addressing Semantic Variability in Clinical Outcome Reporting Using Large Language Models

Two languages, two emotional minds in one brain: processing emotion-label and emotion-laden words by Chinese-English bilinguals

JobFormer: Skill-Aware Job Recommendation with Semantic-Enhanced Transformer

MMLmiRLocNet: miRNA Subcellular Localization Prediction based on Multi-view Multi-label Learning for Drug Design.

Distributional hypothesis as isomorphism between word-word co-occurrence and analogical parallelograms.
