Natural Language Processing Research Research Articles

English machine translation is a natural language processing research direction with important scientific and practical value in the current artificial intelligence boom. The variability of language, the limited ability to express semantic information, and the lack of parallel corpus resources all limit the usefulness and popularity of English machine translation in practical applications. The self-attention mechanism has received a lot of attention in English machine translation tasks because it is highly parallelizable, which reduces the model’s training time, and because it allows the model to capture the semantic relevance of all words in the context. Unlike recurrent neural networks, however, the self-attention mechanism on its own ignores the position and structure information between context words. To let the model use position information between words, the English machine translation model based on the self-attention mechanism uses sine and cosine position coding to represent the absolute position of each word. This method can reflect relative distance, but it does not provide directionality. As a result, a new model of English machine translation is proposed, based on a logarithmic position representation method and the self-attention mechanism. This model retains the distance and directional information between words, as well as the efficiency of the self-attention mechanism. Experiments show that the nonstrict phrase extraction method can effectively extract phrase translation pairs from the n-best word alignment results and that the extraction constraint strategy can improve translation quality even further. Compared with traditional phrase extraction methods based on a single alignment, nonstrict phrase extraction methods and n-best alignment results can significantly improve translation quality.
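
The limitation described in this abstract, that sine and cosine position coding reflects distance but not direction, can be illustrated with a small sketch. It assumes the commonly used sinusoidal formulation with base 10000; the function names, the dimension of 16, and the chosen positions are illustrative assumptions, not details taken from the article. The dot product between the encoding of a position and the encodings of its neighbors three steps to the left and three steps to the right comes out the same, so the similarity alone cannot tell the two directions apart.

```python
import math


def sinusoidal_encoding(position, d_model=16, base=10000.0):
    """Sine/cosine encoding vector for one position (assumed standard form)."""
    vec = []
    for i in range(0, d_model, 2):
        # One frequency per sine/cosine pair of dimensions.
        freq = 1.0 / (base ** (i / d_model))
        vec.append(math.sin(position * freq))
        vec.append(math.cos(position * freq))
    return vec


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical positions: a center word and neighbors 3 steps away on each side.
center = sinusoidal_encoding(10)
left = sinusoidal_encoding(10 - 3)
right = sinusoidal_encoding(10 + 3)

sim_left = dot(center, left)
sim_right = dot(center, right)

# Each sine/cosine pair contributes cos(freq * offset), which is an even
# function of the offset, so both similarities agree up to rounding error.
print(abs(sim_left - sim_right) < 1e-9)
```

This only demonstrates the symmetry of the raw dot product under the stated assumptions; it does not reproduce the logarithmic position representation proposed in the article, whose details are not given in the abstract.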

Background: Identifying human protein-phenotype relationships has attracted researchers in bioinformatics and biomedical natural language processing due to its importance in uncovering rare and complex diseases. Since experimental validation of protein-phenotype associations is prohibitive, automated tools capable of accurately extracting these associations from biomedical text are in high demand. However, while the manual annotation of protein-phenotype co-mentions required for training such models is highly resource-consuming, extracting millions of unlabeled co-mentions is straightforward. Results: In this study, we propose a novel deep semi-supervised ensemble framework that combines deep neural networks, semi-supervised learning, and ensemble learning for classifying human protein-phenotype co-mentions with the help of unlabeled data. This framework can incorporate an extensive collection of unlabeled sentence-level co-mentions of human proteins and phenotypes alongside a small labeled dataset to enhance overall performance. We develop PPPredSS, a prototype of our proposed semi-supervised framework that combines sophisticated language models, convolutional networks, and recurrent networks. Our experimental results demonstrate that the proposed approach achieves new state-of-the-art performance in classifying human protein-phenotype co-mentions, outperforming other supervised and semi-supervised counterparts. Furthermore, we highlight the utility of PPPredSS in powering a curation assistant system through case studies involving a group of biologists. Conclusions: This article presents a novel approach for human protein-phenotype co-mention classification based on deep, semi-supervised, and ensemble learning. The insights and findings from this work have implications for biomedical researchers, biocurators, and the text mining community working on biomedical relationship extraction.

Articles published on Natural Language Processing Research

Extended ArmSpeech: Armenian Spoken Language Corpus

Machine Learning Techniques for Sentiment Analysis of Code-Mixed and Switched Indian Social Media Text Corpus - A Comprehensive Review

BERT-CNN: A Deep Learning Model for Detecting Emotions from Text

On the Effectiveness of Pre-Trained Language Models for Legal Natural Language Processing: An Empirical Study

Examining the Effect of the Ratio of Biomedical Domain to General Domain Data in Corpus in Biomedical Literature Mining

English Machine Translation Model Based on an Improved Self-Attention Technology

Emotion norms for 6000 Polish word meanings with a direct mapping to the Polish wordnet

Natural Language Processing and Computational Linguistics

Morphological Tagging and Lemmatization in the Albanian Language

Chinese Language Word Embeddings Based on the Corpus Hanku

NATURAL LANGUAGE PROCESSING AND SENTIMENT ANALYSIS

TLBIONER: TRANSFER LEARNING BASED NAMED ENTITY RECOGNITION ON MEDICAL LITERATURE DOCUMENTS

Automatic Text Summarization of Konkani Folk Tales Using Supervised Machine Learning Algorithms and Language Independent Features

Deep semi-supervised learning ensemble framework for classifying co-mentions of human proteins and phenotypes

A survey of methods, datasets and evaluation metrics for visual question answering

Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing

AI Based Emotion Detection for Textual Big Data: Techniques and Contribution

Word Sense Disambiguation Method Based on Graph Model and Word Vector

A knowledge graph based question answering method for medical domain.

Ensemble of Classifiers and Term Weighting Schemes for Sentiment Analysis in Turkish
