Text Similarity Research Articles

e13609 Background: Precision oncology has revolutionized cancer treatment by identifying molecular biomarkers to guide personalized care. The ever-growing body of medical literature presents a challenge for oncologists researching targeted therapies. While recent studies have investigated large language models (LLMs) to streamline this process, LLM reliance on general rather than medical knowledge limits clinical relevance and trustworthiness. To address these limitations, we developed a retrieval-augmented generation (RAG) system that integrates PubMed clinical studies, trial databases and oncological guidelines with LLMs to support targeted treatment recommendations. The Molecular Tumor Board (MTB) at the Center of Personalized Medicine (ZPMTUM) guided and evaluated treatment options proposed by the LLM to assess their applicability for clinical decision support.

Methods: We used 10 publicly accessible fictional patient cases with 7 tumor types and 59 distinct molecular alterations. Our LLM system MEREDITH (Medical Evidence Retrieval and Data Integration for Tailored Healthcare) consists of Google's Gemini Pro, enhanced with RAG and Chain-of-Thought (CoT) prompting. To establish a benchmark, clinical experts at ZPMTUM manually annotated the cases. Informed by MTB expert feedback, we iteratively improved our LLM system from a draft system relying on PubMed-indexed data to an enhanced system, which replicated expert annotation processes by incorporating oncology guidelines, drug availability and trial databases (ClinicalTrials.gov, QuickQueck.de). ZPMTUM assessed the credibility and clinical relevance of manually annotated and LLM-generated recommendations. Patient-level data on (likely) pathogenic molecular alterations and recommended treatment options were summarized using the median and interquartile range (IQR). Semantic similarity between LLM and clinician responses was assessed using the cosine similarity of text vector embeddings; a paired t-test was used to evaluate significance.

Results: The median number of (likely) pathogenic molecular alterations per patient was 2.5 (IQR: 2-3). ZPMTUM identified a median of 2 treatment options per patient (IQR: 1-3), while the enhanced LLM identified a median of 4 (IQR: 3-5). MEREDITH proposed multiple relevant treatment suggestions, including therapies based on preclinical studies and molecular interactions, for further assessment by the MTB. ZPMTUM prioritized the most suitable clinical option. The mean semantic textual similarity of LLM responses increased significantly from 0.69 in the draft system to 0.76 in the enhanced system (p < 0.001). Thus, feedback from ZPMTUM enhanced the model's ability to align its responses with clinician thought processes.

Conclusions: Leveraging expert thought processes to instruct LLMs holds promise as a novel decision support tool for precision oncology.
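
The similarity measure named in the abstract above (cosine similarity of embedding vectors, compared across two system versions with a paired t-test) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the abstract does not say which embedding model was used, so the vectors and per-case scores below are made-up toy values, and the function names are hypothetical. Only the t statistic is computed; turning it into a p-value would additionally need the t distribution with n - 1 degrees of freedom.

```python
import math
from statistics import mean, stdev


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def paired_t_statistic(before, after):
    """t statistic of a paired t-test on the per-case differences (after - before)."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [a - b for a, b in zip(after, before)]
    sd = stdev(diffs)  # sample standard deviation (n - 1 in the denominator)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean(diffs) / (sd / math.sqrt(len(diffs)))


# Toy embeddings (assumed values): one clinician answer, one LLM answer.
clinician_vec = [0.2, 0.7, 0.1]
llm_vec = [0.25, 0.6, 0.2]
similarity = cosine_similarity(clinician_vec, llm_vec)

# Toy per-case similarities for a draft and an enhanced system (assumed values).
draft_scores = [0.60, 0.70, 0.65]
enhanced_scores = [0.70, 0.75, 0.80]
t_value = paired_t_statistic(draft_scores, enhanced_scores)
```

The test is paired because each patient case yields one score per system version, so the comparison is made on the per-case differences rather than on two independent samples.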

The study was carried out within the framework of identification linguistics and translational linguistics. The article describes several methods for determining the degree of similarity between texts: the shingle algorithm, the Levenshtein distance, and plagiarism detection systems. The purpose of the work is to test the capabilities of software for comparing texts for similarity, establishing their identity, and checking their uniqueness. In a broad sense, these tasks fall within the area of text identification. In a qualitative (manual) assessment of text similarity, identifying parameters are selected specifically for the text under study. Electronic resources are used in order to make both the methods for establishing the identity of texts and the results obtained more objective. Software products also make it possible to establish another, quantitative, characteristic: the degree of similarity of texts to each other, or the degree of originality of a text. The work used services whose tasks include 1) comparing the similarity of two texts; 2) calculating the Levenshtein distance; 3) detecting borrowing. The research material was an excerpt from an interview with Foreign Minister Sergei Lavrov. Reverse machine translation texts served as the variants compared with the source text. Reverse machine translation, as a translation product, is part of artificial intelligence and a model of the process of understanding and interpreting natural language. The results obtained with these services made it possible to arrange five reverse machine translation variants from the most unique text to the text most identical to the invariant. The study showed that the programs generally produce similar results, which can be applied to research and applied problems related to establishing the identity and difference of texts. A prospect for further study is to identify lexical parameters that make it possible to classify reverse machine translation texts as the most or least identical with respect to the invariant.
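
Two of the measures named in the abstract above, the Levenshtein distance and the shingle algorithm, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the services used in the study: the shingles here are word trigrams compared with the Jaccard coefficient, which is one common variant, and the function names and sample sentences are hypothetical.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def shingles(text, size=3):
    """Set of overlapping word n-grams (shingles) of a lowercased text."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def shingle_similarity(a, b, size=3):
    """Jaccard coefficient of the two shingle sets, from 0.0 to 1.0."""
    set_a, set_b = shingles(a, size), shingles(b, size)
    if not set_a and not set_b:
        return 1.0  # two empty texts are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


source_text = "the quick brown fox jumps"
variant_text = "the quick brown fox sleeps"
distance = levenshtein("kitten", "sitting")
overlap = shingle_similarity(source_text, variant_text)
```

Sorting several variants by either score against the same source text gives an ordering from the most unique variant to the one closest to the source, which is the kind of ranking the abstract reports.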

Articles published on Text Similarity

A BERT-GRU Model for Measuring the Similarity of Arabic Text

INTERPRETATION OF AN ANCIENT YAKUT FOLKLORE TEXT

Analyzing user reactions using relevance between location information of tweets and news articles

Coreference Resolution Based on High-Dimensional Multi-Scale Information.

SUMEX: A hybrid framework for Semantic textUal siMilarity and EXplanation generation

CLSESSP: Contrastive learning of sentence embedding with strong semantic prototypes

Ship Anomalous Behavior Detection in Port Waterways Based on Text Similarity and Kernel Density Estimation

Large language models for precision oncology: Clinical decision support through expert-guided learning.

Patient-patient interactions visualization for drug side effects in patients’ reviews

A multifaceted architecture to Automate Essay Scoring for assessing English article writing: Integrating semantic, thematic, and linguistic representations

Few-shot intent detection with self-supervised pretraining and prototype-aware attention

Towards Building a Chatbot-Based First Aid Service in Arabic Language

MOVIE SIMILARITY FROM PLOT SUMMARIES

Textual similarity for legal precedents discovery: Assessing the performance of machine learning techniques in an administrative court

SOFTWARE CAPABILITIES FOR TEXT IDENTIFICATION: COMPARISON FOR SIMILARITY, IDENTITY ESTABLISHMENT, UNIQUENESS CHECK

Enhancing Data Deduplication through Text Similarity in Machine Learning: A Comprehensive Approach

Local large language models for privacy-preserving accelerated review of historic echocardiogram reports.

Research on Key Technologies for Text Similarity Calculation Based on Small Datasets

A Data-Driven Approach to Discovering Process Choreography

Development of English Composition Correction and Scoring System Based on Text Similarity Algorithm
