Automated Extraction of Information From Texts of Scientific Publications: Insights Into HIV Treatment Strategies.

Nadezhda Biziukova, Sergey Ivanov, Vladimir Poroikov, Olga Tarasova

doi:10.3389/fgene.2020.618862

Abstract

Text analysis can help to identify named entities (NEs) of small molecules, proteins, and genes. Such data are very important for analyzing the molecular mechanisms of disease progression and for developing new strategies to treat various diseases and pathological conditions. The texts of publications are a primary source of information, which is especially important for collecting data of the highest quality because the information is obtained directly from the source rather than through databases. In our study, we aimed to develop and test an approach to named entity recognition in the abstracts of publications. More specifically, we developed and tested an algorithm based on conditional random fields, which recognizes NEs of (i) genes and proteins and (ii) chemicals. Careful selection of abstracts strictly related to the subject of interest makes it possible to extract NEs strongly associated with that subject. To test the applicability of our approach, we applied it to the extraction of (i) potential HIV inhibitors and (ii) a set of proteins and genes potentially responsible for viremic control in HIV-positive patients. The computational experiments provide accuracy estimates for the recognition of chemical NEs and of protein (gene) NEs. For chemical NEs, the precision is over 0.91, the recall is 0.86, and the F1-score (harmonic mean of precision and recall) is 0.89; for protein and gene names, the precision is over 0.86, the recall is 0.83, and the F1-score is above 0.85. Evaluation of the algorithm on two case studies related to HIV treatment confirms our suggestion that it is possible to extract NEs strongly relevant to (i) HIV inhibitors and (ii) a group of patients, namely HIV-positive individuals able to maintain an undetectable HIV-1 viral load over time in the absence of antiretroviral therapy. Analysis of the results obtained provides insights into the function of proteins that may be responsible for viremic control. Our study demonstrated the applicability of the developed approach for the extraction of useful data on HIV treatment.
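The abstract defines the F1-score as the harmonic mean of precision and recall. The short sketch below only illustrates that formula using the rounded values quoted above; the function name `f1_score` is an illustrative choice and is not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values as quoted in the abstract; the text says "over" for both precisions.
chemical_f1 = f1_score(0.91, 0.86)
protein_gene_f1 = f1_score(0.86, 0.83)

print(f"chemical NEs:     {chemical_f1:.3f}")
print(f"protein/gene NEs: {protein_gene_f1:.3f}")
```

Computed from the rounded figures, the results come out slightly below the reported F1-scores. That would be expected if the reported scores were averaged over cross-validation folds or computed from unrounded precision values, but this is an assumption: the abstract does not say how the scores were aggregated.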

Highlights

Scientific publications represent the main source of knowledge for researchers in different fields of biology and medicine
We built the models for chemical and protein/gene Named Entity Recognition (NER) based on CHEMDNER and ChemProt corpora, respectively, and calculated their accuracy using five-fold cross-validation
We determined the best-performing NER setup using the text features we developed
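The highlights mention text features and five-fold cross-validation without listing details on this page. As a rough sketch of what these two steps can look like for a CRF-style tagger, the code below builds a per-token feature dictionary and splits a corpus into five folds. The feature set and the helper names `token_features` and `five_fold_splits` are hypothetical and are not the features or code used in the paper.

```python
def token_features(tokens, i):
    """Hypothetical feature dictionary for token i (not the paper's feature set)."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_title": tok.istitle(),
        "is_upper": tok.isupper(),
        "has_digit": any(c.isdigit() for c in tok),
        "has_hyphen": "-" in tok,
        "suffix3": tok[-3:].lower(),
        # Neighbouring tokens give the tagger local context.
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


def five_fold_splits(n_items, k=5):
    """Yield (train_indices, test_indices) for k contiguous, non-overlapping folds."""
    fold_sizes = [n_items // k + (1 if f < n_items % k else 0) for f in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [j for j in range(n_items) if j < start or j >= start + size]
        yield train, test
        start += size


# Illustrative sentence, not taken from the corpora named above.
tokens = "Efavirenz inhibits HIV-1 reverse transcriptase".split()
features = [token_features(tokens, i) for i in range(len(tokens))]

# Twelve hypothetical documents split into five folds.
folds = list(five_fold_splits(12))
```

In a real experiment each fold's training indices would select the annotated sentences used to fit the model, and the held-out indices would be used to compute precision, recall, and F1 before averaging across folds.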

Summary

Introduction

Scientific publications represent the main source of knowledge for researchers in different fields of biology and medicine. Identification of associations between NEs in the texts of scientific publications includes two steps: (i) extraction of named entities from the texts, and (ii) recognition of associations. The first step is the focus of Named Entity Recognition (NER) methods. There are two main groups of approaches used for NER: (i) those based on rules and dictionaries and (ii) those based on machine learning methods. The main disadvantage of rule- and dictionary-based algorithms is their inability to extract information about entities not included in the dictionaries. Another drawback is the memory required to store the dictionaries.
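The limitation of dictionary-based recognition described above can be seen in a toy example. The sketch below is a deliberately simple lookup, not a method from the paper; the dictionary contents, the example sentence, and the function name `dictionary_ner` are all illustrative assumptions.

```python
def dictionary_ner(text, dictionary):
    """Return tokens of `text` found in `dictionary` (case-insensitive, exact match)."""
    tokens = [t.strip(".,;()").lower() for t in text.split()]
    return [t for t in tokens if t in dictionary]


# Hypothetical toy dictionary of chemical names.
known_chemicals = {"efavirenz", "zidovudine"}

# Illustrative sentence; "lenacapavir" is a real drug name that is absent from the toy dictionary.
text = "Patients received efavirenz and the newer agent lenacapavir."

found = dictionary_ner(text, known_chemicals)
print(found)  # ['efavirenz']
```

The lookup finds only the name already stored in the dictionary and silently misses the other one. A machine learning tagger, by contrast, can in principle label unseen names from contextual and orthographic cues.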

Journal: Frontiers in Genetics	Publication Date: Dec 22, 2020
Citations: 4	License type: CC BY 4.0
