Annotated Training Data Research Articles

The subject area of multilingual natural language processing (NLP) is concerned with the processing of natural language data in several languages. NLP systems that can translate between languages are becoming more and more necessary as the globe gets more interconnected in order to promote understanding and communication among speakers of various languages. To be effective, communication must overcome a number of obstacles presented by multilingual NLP. Lack of language standardization, which results in major variations in the grammatical constructions, vocabulary, and writing systems used in many languages, is one of the fundamental problems. The requirement for substantial amounts of annotated data for machine learning model training presents another difficulty. The creation of high-quality annotated datasets in numerous languages is time- and money-consuming, which restricts the supply of multilingual NLP resources. The problem of creating NLP systems that can handle several languages at once is the last one. This necessitates the deployment of sophisticated algorithms that can handle and evaluate data in numerous languages while producing precise findings. Researchers and developers are working on a variety of methods to address these issues. Creating standardized formats for multilingual data representation, like Universal Dependencies, which offers a unified framework for annotating linguistic data in several languages, is one strategy. Using transfer learning techniques to transfer knowledge from high-resource languages to low-resource languages is an alternative strategy. The amount of annotated data required for training NLP models in low-resource languages can bede creased with the use of this method. Last but not least, researchers are working to create multilingual NLP models that can manage numerous languages at once. To deliver precise results across numerous languages, these models employ cutting-edge methodologies like neural machine translation and multilingual word embedding’s. Despite the fact that multilingual NLP presents a number of difficult issues, with continuing study and development, it is possible to create NLP systems that are capable of processing natural language data from several languages.

Read full abstract

Deep learning (DL) models for radiation therapy (RT) image segmentation require accurately annotated training data. Multiple organ delineation guidelines exist; however, information on the used guideline is not provided with the delineation. Extraction of training data with coherent guidelines can therefore be challenging. We present a supervised classification method for pelvis structure delineations where bowel cavity, femoral heads, bladder, and rectum data, with two guidelines, were classified. The impact on DL-based segmentation quality using mixed guideline training data was also demonstrated. Bowel cavity was manually delineated on CT images for anal cancer patients (n=170) according to guidelines Devisetty and RTOG. The DL segmentation quality from using training data with coherent or mixed guidelines was investigated. A supervised 3D squeeze-and-excite SENet-154 model was trained to classify two bowel cavity delineation guidelines. In addition, a pelvis CT dataset with manual delineations from prostate cancer patients (n=1854) was used where data with an alternative guideline for femoral heads, rectum, and bladder were generated using commercial software. The model was evaluated on internal (n=200) and external test data (n=99). By using mixed, compared to coherent, delineation guideline training data mean DICE score decreased 3% units, mean Hausdorff distance (95%) increased 5mm and mean surface distance (MSD) increased 1mm. The classification of bowel cavity test data achieved 99.8% unweighted classification accuracy, 99.9% macro average precision, 97.2% macro average recall, and 98.5% macro average F1. Corresponding metrics for the pelvis internal test data were all 99% or above and for the external pelvis test data they were 96.3%, 96.6%, 93.3%, and 94.6%. Impaired segmentation performance was observed for training data with mixed guidelines. The DL delineation classification models achieved excellent results on internal and external test data. This can facilitate automated guideline-specific data extraction while avoiding the need for consistent and correct structure labels.

Read full abstract

Annotated Training Data Research Articles

Related Topics

Articles published on Annotated Training Data

A Curriculum Learning Approach for Multi-Domain Text Classification Using Keyword Weight Ranking

Semi-Supervised Modified-UNet for Lung Infection Image Segmentation

ASPER: Answer Set Programming Enhanced Neural Network Models for Joint Entity-Relation Extraction

Multilingual NLP

Current status of artificial intelligence analysis for the treatment of pancreaticobiliary diseases using endoscopic ultrasonography and endoscopic retrograde cholangiopancreatography.

Automatic License Plate Recognition

A Graph Fusion Approach for Cross-Lingual Machine Reading Comprehension

Advancements in Named Entity Recognition using Deep Learning Techniques: A Comprehensive Study on Emerging Trends

NISNet3D: three-dimensional nuclear synthesis and instance segmentation for fluorescence microscopy images

Towards fully synthetic training of 3D indoor object detectors: Ablation study

Less Annotating, More Classifying: Addressing the Data Scarcity Issue of Supervised Machine Learning with Deep Transfer Learning and BERT-NLI

Semisupervised Generative Adversarial Networks With Temporal Convolutions for Human Activity Recognition

SPDC: A SUPER-POINT AND POINT COMBINING BASED DUAL-SCALE CONTRASTIVE LEARNING NETWORK FOR POINT CLOUD SEMANTIC SEGMENTATION

TomoTwin: generalized 3D localization of macromolecules in cryo-electron tomograms with structural data mining

SmartWoodID-an image collection of large end-grain surfaces to support wood identification systems.

Deep learning-based classification of organs at risk and delineation guideline in pelvic cancer radiation therapy.

Quantifying the Simulation–Reality Gap for Deep Learning-Based Drone Detection

SNER-CS: Self-training Named Entity Recognition in Computer Science

Contrastive self-supervised learning for diabetic retinopathy early detection.

Artificial intelligence for detection of intracranial haemorrhage on head computed tomography scans: diagnostic accuracy in Hong Kong.

Lead the way for us

Editage

Paperpal

R Discovery

Mind the Graph

Annotated Training Data Research Articles

Related Topics

Articles published on Annotated Training Data

A Curriculum Learning Approach for Multi-Domain Text Classification Using Keyword Weight Ranking

Semi-Supervised Modified-UNet for Lung Infection Image Segmentation

ASPER: Answer Set Programming Enhanced Neural Network Models for Joint Entity-Relation Extraction

Multilingual NLP

Current status of artificial intelligence analysis for the treatment of pancreaticobiliary diseases using endoscopic ultrasonography and endoscopic retrograde cholangiopancreatography.

Automatic License Plate Recognition

A Graph Fusion Approach for Cross-Lingual Machine Reading Comprehension

Advancements in Named Entity Recognition using Deep Learning Techniques: A Comprehensive Study on Emerging Trends

NISNet3D: three-dimensional nuclear synthesis and instance segmentation for fluorescence microscopy images

Towards fully synthetic training of 3D indoor object detectors: Ablation study

Less Annotating, More Classifying: Addressing the Data Scarcity Issue of Supervised Machine Learning with Deep Transfer Learning and BERT-NLI

Semisupervised Generative Adversarial Networks With Temporal Convolutions for Human Activity Recognition

SPDC: A SUPER-POINT AND POINT COMBINING BASED DUAL-SCALE CONTRASTIVE LEARNING NETWORK FOR POINT CLOUD SEMANTIC SEGMENTATION

TomoTwin: generalized 3D localization of macromolecules in cryo-electron tomograms with structural data mining

SmartWoodID-an image collection of large end-grain surfaces to support wood identification systems.

Deep learning-based classification of organs at risk and delineation guideline in pelvic cancer radiation therapy.

Quantifying the Simulation–Reality Gap for Deep Learning-Based Drone Detection

SNER-CS: Self-training Named Entity Recognition in Computer Science

Contrastive self-supervised learning for diabetic retinopathy early detection.

Artificial intelligence for detection of intracranial haemorrhage on head computed tomography scans: diagnostic accuracy in Hong Kong.