Processing Library Research Articles

Seven machine learning algorithms and three text vectorization techniques were selected to solve a binary classification problem. The models were trained on textual data comprising 3980 brain CT reports from 56 inpatient medical facilities in Moscow. The three vectorization techniques were bag of words, TF-IDF, and word2vec. The resulting vectors were then processed by the following machine learning algorithms: decision tree, random forest, logistic regression, nearest neighbors, support vector machines, CatBoost, and XGBoost. Data analysis and pre-processing were performed using NLTK (Natural Language Toolkit, version 3.6.5), a suite of libraries for symbolic and statistical natural language processing, and scikit-learn (version 0.24.2), a machine learning library that provides tools for classification tasks. MedRuBertTiny2, a BERT transformer model pre-trained on medical data, was used as the language model. Based on the training and testing results of the seven algorithms, the authors selected the three with the highest metrics (sensitivity and specificity): CatBoost, logistic regression, and nearest neighbors. The bag-of-words technique yielded the highest metrics. These three algorithms were combined into an ensemble using stacking. On a validation dataset held out from the original sample, the ensemble achieved a sensitivity of 0.93 and a specificity of 0.90. Next, the ensemble and the BERT model were trained on an independent dataset of 9393 textual radiology reports, also divided into training and test sets. On this dataset, the ensemble achieved a sensitivity of 0.92 and a specificity of 0.90, while the BERT model achieved a sensitivity of 0.97 and a specificity of 0.90. When analyzing textual reports of brain CT scans with signs of intracranial hemorrhage, the trained ensemble demonstrated high accuracy metrics; still, manual quality control of its results is required in practice. The pre-trained BERT transformer model, fine-tuned on diagnostic textual reports, demonstrated higher accuracy metrics (p<0.05). The results are promising both for binary classification tasks and for in-depth analysis of unstructured medical information.
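
Two of the three vectorization techniques named above, bag of words and TF-IDF, can be sketched in standard-library Python. This is a minimal illustration, not the authors' pipeline: the regex tokenizer is a simplified stand-in for NLTK preprocessing, the smoothed IDF and L2 normalization follow scikit-learn's defaults for `TfidfVectorizer`, and the three English report snippets are hypothetical. word2vec is omitted because it requires training word embeddings.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a report and split it into word tokens (simplified stand-in for NLTK)."""
    return re.findall(r"\w+", text.lower())


def bag_of_words(docs):
    """Return a sorted vocabulary and one raw term-count vector per document."""
    tokenized = [tokenize(doc) for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def tf_idf(docs):
    """Reweight bag-of-words counts by smoothed IDF, then L2-normalize each row."""
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: in how many reports each term appears at least once.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    # Smoothed IDF as in scikit-learn's default: ln((1 + n) / (1 + df)) + 1.
    idf = [math.log((1 + n_docs) / (1 + d)) + 1 for d in df]
    weighted = []
    for row in counts:
        vec = [c * w for c, w in zip(row, idf)]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # guard against empty reports
        weighted.append([v / norm for v in vec])
    return vocab, weighted


# Hypothetical report snippets, for illustration only.
reports = [
    "acute hemorrhage in the left temporal lobe",
    "no signs of hemorrhage",
    "no acute pathology",
]
vocab, bow = bag_of_words(reports)
_, tfidf = tf_idf(reports)
print(bow[1])  # [0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0]
```

In the first snippet, bag of words counts "hemorrhage" and "the" equally, whereas TF-IDF gives "hemorrhage" a lower weight because it occurs in two of the three snippets and "the" in only one.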

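Sensitivity and specificity, the metrics used above to rank the algorithms, are the share of positive reports (hemorrhage present) that are detected and the share of negative reports that are correctly rejected. A minimal sketch, assuming labels are encoded as 1 for hemorrhage and 0 for no hemorrhage; the example labels are hypothetical, not data from the study:

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, with 1 = hemorrhage present."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # hemorrhage reports flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # hemorrhage reports missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal reports correctly passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # normal reports wrongly flagged
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")  # sensitivity=0.75, specificity=0.83
```

Here 3 of 4 positive reports are detected and 5 of 6 negative reports are correctly rejected.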

This article describes the development of an automated system for finding interested users, so-called leads, in the Telegram messenger. Because Telegram is not a social network and its interaction model differs strongly from that of web services such as blogs, image boards, news boards, or forums, finding a motivated target audience there is a complex task. This is mainly because Telegram provides no recommendation system for content or for discovering new channels, chats, and content sources, and it has no news or post feed like other social media. The paper describes the development of a tool for finding interested users, implemented as a Telegram bot that gathers data through the Telegram API and analyzes chat messages with various language tools to find discussions related to a required topic. In particular, to detect users who are potentially interested in specific topics, the bot must analyze the text of the discussion itself and identify the topics that the chat's users discuss. This analysis requires natural language processing tools as well as tools that can process the context of a discussion. The bot is built on the following technology stack: Python is the main programming language; the pyTelegramBotAPI framework handles interaction with Telegram servers via the API; the gathered and processed data is stored in a MySQL database; and language processing is performed in multiple steps involving Python natural language processing libraries and AI, in particular the large language model ChatGPT. The bot gathers and processes information from chat messages and then reports how many mentions of an administrator-defined topic each chat user has made; these users are potential leads. This data helps build and improve marketing models for promoting goods and services and assess users' level of engagement and degree of interest in a given topic.
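
The per-user mention report described above can be approximated with simple keyword matching. This is a hypothetical simplification, not the bot's actual method: the paper detects topics with Python NLP libraries and ChatGPT, which is not reproduced here, and the `ChatMessage` type, the keyword list, and the sample messages are illustrative assumptions.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class ChatMessage:
    user: str  # hypothetical: the sender's username or id
    text: str


def count_topic_mentions(messages, topic_keywords):
    """Count, per user, how many times any topic keyword occurs in their messages."""
    patterns = [re.compile(r"\b" + re.escape(kw.lower()) + r"\b") for kw in topic_keywords]
    counts = Counter()
    for msg in messages:
        text = msg.text.lower()
        hits = sum(len(p.findall(text)) for p in patterns)
        if hits:
            counts[msg.user] += hits
    return counts


def potential_leads(messages, topic_keywords, min_mentions=1):
    """Return (user, mentions) pairs reaching min_mentions, most active users first."""
    counts = count_topic_mentions(messages, topic_keywords)
    return [(user, n) for user, n in counts.most_common() if n >= min_mentions]


chat = [
    ChatMessage("alice", "Where can I buy a used bike?"),
    ChatMessage("bob", "Nice weather today"),
    ChatMessage("alice", "Any bike shops nearby? I need a bike helmet too"),
    ChatMessage("carol", "My Bike broke, need repair"),
]
print(potential_leads(chat, ["bike", "helmet"]))  # [('alice', 4), ('carol', 1)]
```

Word-boundary matching keeps "bike" from matching inside longer words; a real deployment would need the contextual analysis the paper describes, since keyword counts miss paraphrases and inflected forms.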

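For the storage step, one possible schema and report query are sketched below. It uses Python's built-in `sqlite3` module as a stand-in for the MySQL database mentioned above, so the table layout, the column names, and the assumption that topic labels arrive already assigned by the language-processing stage are all hypothetical.

```python
import sqlite3

# Hypothetical schema; the bot's actual MySQL tables are not described in the abstract.
SCHEMA = """
CREATE TABLE IF NOT EXISTS messages (
    id      INTEGER PRIMARY KEY,
    chat_id INTEGER NOT NULL,
    user_id INTEGER NOT NULL,
    text    TEXT    NOT NULL
);
CREATE TABLE IF NOT EXISTS topic_mentions (
    message_id INTEGER NOT NULL REFERENCES messages(id),
    topic      TEXT    NOT NULL
);
"""


def open_db(path=":memory:"):
    """Open (or create) the database and make sure both tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def store_message(conn, chat_id, user_id, text, topics):
    """Save one chat message plus the topic labels assigned to it upstream."""
    cur = conn.execute(
        "INSERT INTO messages (chat_id, user_id, text) VALUES (?, ?, ?)",
        (chat_id, user_id, text),
    )
    if topics:
        conn.executemany(
            "INSERT INTO topic_mentions (message_id, topic) VALUES (?, ?)",
            [(cur.lastrowid, topic) for topic in topics],
        )
    conn.commit()


def lead_report(conn, chat_id, topic):
    """Per-user count of messages in one chat that were labelled with the topic."""
    return conn.execute(
        """
        SELECT m.user_id, COUNT(*) AS mentions
        FROM topic_mentions AS t
        JOIN messages AS m ON m.id = t.message_id
        WHERE m.chat_id = ? AND t.topic = ?
        GROUP BY m.user_id
        ORDER BY mentions DESC, m.user_id
        """,
        (chat_id, topic),
    ).fetchall()


conn = open_db()
store_message(conn, 1, 10, "looking for a bike", ["bikes"])
store_message(conn, 1, 10, "which bike brand is best?", ["bikes"])
store_message(conn, 1, 20, "bike or car for commuting?", ["bikes", "cars"])
store_message(conn, 2, 30, "selling my bike", ["bikes"])
print(lead_report(conn, 1, "bikes"))  # [(10, 2), (20, 1)]
```

Keeping topic labels in a separate table lets one message count toward several topics, and the report becomes a single grouped query per chat and topic.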

Articles published on Processing Library

The Comparative study of Python Libraries for Natural Language Processing (NLP)

Comparison of an Ensemble of Machine Learning Models and the BERT Language Model for Analysis of Text Descriptions of Brain CT Reports to Determine the Presence of Intracranial Hemorrhage.

An efficient algorithm for estimating gate-level power consumption in large-scale integrated circuits

PERFORMANCE EVALUATION OF PYTHON LIBRARIES FOR MULTITHREADING DATA PROCESSING

Analysis of the Causes of Declining Interest and Participation of the Younger Generation in the Agricultural Sector

FDIP—A Fast Diffraction Image Processing Library for X-ray Crystallography Experiments

Boosting HPC data analysis performance with the ParSoDA-Py library

Introducing the Video In Situ Snowfall Sensor (VISSS)

YOLO-IHD: Improved Real-Time Human Detection System for Indoor Drones.

WEB-BASED LIBRARY APPLICATION USING THE CODEIGNITER FRAMEWORK (CASE STUDY: SMPN 3 PACET)

Library Management Advocacy Study of IAIN Fattahul Muluk Papua Library Program in Realizing Digital Library in 2023

Alignment of Unsupervised Machine Learning with Human Understanding: A Case Study of Connected Vehicle Patents

The Tradition of Henna Night in the Hadrami Arab Community in Jakarta

Postmortem Muscle Proteome Characteristics of Silver Carp (Hypophthalmichthys molitrix): Insights from Full-Length Transcriptome and Deep 4D Label-Free Proteomic.

A new method for simulation modelling of leaner remanufacturing in PaaS settings

A Method for Extracting BPMN Models from Textual Descriptions Using Natural Language Processing

Automated system of SMM lead generation in Telegram messenger

Advancement in Integrated Crop Management System for Sustainable Agriculture

Slendr: a framework for spatio-temporal population genomic simulations on geographic landscapes.

Diagnosis and Study of Mechanical Vibrations in Cargo Vehicles Using ISO 2631-1:1997.