Improving data retrieval quality: Evidence based medicine perspective.

M Kamalov, M Kasimova, A Kolbin, E Verbitskaya, J Balykina, V Dobrynin

doi:10.3233/jrs-150710

Abstract

Evidence-based medicine is an actively developing approach in modern medicine, and it requires assessing the quality and reliability of studies. However, in some cases studies at the first level of evidence, including randomized controlled trials (RCTs), may contain errors. The Grading of Recommendations Assessment, Development and Evaluation (GRADE) system addresses this problem. Work spanning medicine and information retrieval has produced search engines for the MEDLINE database [1], and combined summarization and information retrieval techniques are being developed to find the best medication based on levels of evidence [2]. Given the relevance of and demand for such studies, development of a search engine for the MEDLINE database was started at Saint-Petersburg State University with the support of Pavlov First Saint-Petersburg State Medical University and Tashkent Institute of Postgraduate Medical Education. The novelty and value of the proposed system lie in its method for ranking relevant abstracts: the system is intended to rank results by the level of evidence of each study and to apply GRADE criteria for system evaluation. The task falls within the domain of information retrieval and machine learning. Based on the implementation results of previous work [3], whose main goal was to cluster abstracts from the MEDLINE database by subtypes of medical interventions, a set of clustering algorithms was selected for this study: K-means, K-means++ and EM from the sklearn (http://scikit-learn.org) and WEKA (http://www.cs.waikato.ac.nz/~ml/weka/) libraries. Documents to be clustered were represented either with Latent Semantic Analysis (LSA) [4], keeping the first 210 factors, or with the "bag of words" model [5].
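
K-means++ differs from plain K-means only in how the initial cluster centers are chosen. The following is a minimal, standard-library sketch of that seeding step, given as an illustration under stated assumptions rather than as the code used in the study or in sklearn or WEKA. The function names `squared_distance` and `kmeans_pp_seeds` and the toy points are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_seeds(points, k, rng=None):
    """Pick k initial centers using the k-means++ seeding rule.

    The first center is drawn uniformly at random. Each later center is
    drawn with probability proportional to the squared distance from a
    point to its nearest already-chosen center.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = rng or random.Random()

    centers = [rng.choice(points)]
    # Squared distance from every point to its nearest chosen center.
    nearest = [squared_distance(p, centers[0]) for p in points]

    while len(centers) < k:
        if sum(nearest) == 0:
            # Every point coincides with a center: fall back to uniform.
            new_center = rng.choice(points)
        else:
            new_center = rng.choices(points, weights=nearest, k=1)[0]
        centers.append(new_center)
        nearest = [
            min(d, squared_distance(p, new_center))
            for p, d in zip(points, nearest)
        ]
    return centers


# Hypothetical 2-D document vectors (for example, after dimensionality reduction).
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
seeds = kmeans_pp_seeds(toy_points, k=2, rng=random.Random(0))
```

Because points far from the existing centers are more likely to be drawn, the initial centers tend to be spread out, which is the usual motivation for preferring this seeding over uniform random initialization. In scikit-learn the same rule is selected with `KMeans(init="k-means++")`.
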
For the classification of abstracts, several algorithms were tested: Complement Naive Bayes [6], Sequential Minimal Optimization (SMO) [7] and non-linear SVM from the WEKA library. The first step of this study was to label abstracts of MEDLINE articles as containing or not containing a medical intervention. For this purpose, a web crawler from our previous work [8] was modified to perform the necessary labeling. The next step was to evaluate the clustering algorithms on the labeled abstracts. Clustering the abstracts into two groups gave the following results.
With LSA, keeping the first 210 factors:
1) K-means: Purity = 0.5598, Normalized Entropy = 0.5994;
2) K-means++: Purity = 0.6743, Normalized Entropy = 0.4996;
3) EM: Purity = 0.5443, Normalized Entropy = 0.6344.
With the "bag of words" model:
1) K-means: Purity = 0.5134, Normalized Entropy = 0.6254;
2) K-means++: Purity = 0.5645, Normalized Entropy = 0.5299;
3) EM: Purity = 0.5247, Normalized Entropy = 0.6345.
Then the studies that contain a medical intervention were classified by subtypes of medical interventions. For this classification, abstracts were represented with the "bag of words" model with stop words removed, giving the following results:
1) Complement Naive Bayes: macro F-measure = 0.6934, micro F-measure = 0.7234;
2) Sequential Minimal Optimization: macro F-measure = 0.6543, micro F-measure = 0.7042;
3) Non-linear SVM: macro F-measure = 0.6835, micro F-measure = 0.7642.
Based on the results of the computational experiments, the best clustering of abstracts by containing and not containing a medical intervention was obtained with the K-means++ algorithm together with LSA, keeping the first 210 factors.
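
The abstract reports purity and normalized entropy without defining them. The sketch below uses one common formulation, which is an assumption here: purity is the fraction of documents that belong to the majority class of their cluster, and normalized entropy is the size-weighted average of the per-cluster class entropy divided by the logarithm of the number of classes. The function names and the toy labels are hypothetical.

```python
import math
from collections import Counter, defaultdict


def _labels_by_cluster(cluster_ids, class_labels):
    """Group the true class labels by the cluster each document fell into."""
    if len(cluster_ids) != len(class_labels) or not class_labels:
        raise ValueError("inputs must be non-empty and of equal length")
    groups = defaultdict(list)
    for cluster_id, label in zip(cluster_ids, class_labels):
        groups[cluster_id].append(label)
    return groups


def purity(cluster_ids, class_labels):
    """Fraction of documents matching the majority class of their cluster."""
    groups = _labels_by_cluster(cluster_ids, class_labels)
    majority_total = sum(max(Counter(g).values()) for g in groups.values())
    return majority_total / len(class_labels)


def normalized_entropy(cluster_ids, class_labels):
    """Size-weighted mean cluster entropy, scaled to the range [0, 1]."""
    groups = _labels_by_cluster(cluster_ids, class_labels)
    n_classes = len(set(class_labels))
    if n_classes < 2:
        return 0.0  # A single class carries no uncertainty.
    total = len(class_labels)
    weighted = 0.0
    for labels in groups.values():
        size = len(labels)
        entropy = -sum(
            (count / size) * math.log(count / size)
            for count in Counter(labels).values()
        )
        weighted += (size / total) * entropy
    return weighted / math.log(n_classes)


# Hypothetical two-cluster result: "I" = intervention, "N" = no intervention.
clusters = [0, 0, 0, 1, 1, 1]
labels = ["I", "I", "N", "N", "N", "I"]
print(round(purity(clusters, labels), 4))              # 0.6667
print(round(normalized_entropy(clusters, labels), 4))  # 0.9183
```

Under this formulation higher purity and lower normalized entropy both indicate clusters that align better with the labels, which matches the direction of the reported comparison.
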
Compared with the existing results [8], the quality of classifying abstracts by subtypes of medical interventions was improved by using the non-linear SVM algorithm with the "bag of words" model and stop-word removal. The clustering results obtained in this study will help in grouping abstracts by levels of evidence; combined with the classification by subtypes of medical interventions, they will make it possible to extract information on specific types of interventions from the abstracts.
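
Macro and micro F-measures can rank classifiers differently, so it helps to see how each is computed. The sketch below assumes single-label multi-class classification and the F1 form of the F-measure; the abstract does not state which variant was used. The function name `f_measures` and the subtype labels are hypothetical.

```python
from collections import Counter


def f_measures(true_labels, predicted_labels):
    """Return (macro F1, micro F1) for single-label classification."""
    if len(true_labels) != len(predicted_labels) or not true_labels:
        raise ValueError("inputs must be non-empty and of equal length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth == predicted:
            tp[truth] += 1
        else:
            fp[predicted] += 1  # Predicted class gains a false positive.
            fn[truth] += 1      # True class gains a false negative.

    def f1(true_pos, false_pos, false_neg):
        denominator = 2 * true_pos + false_pos + false_neg
        return 2 * true_pos / denominator if denominator else 0.0

    classes = sorted(set(true_labels) | set(predicted_labels))
    # Macro: average the per-class F1 scores, so every class counts equally.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in classes) / len(classes)
    # Micro: pool the counts first, so every document counts equally.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


# Hypothetical intervention subtypes, not taken from the study.
truth = ["drug", "drug", "drug", "surgery", "surgery", "device"]
guess = ["drug", "drug", "surgery", "surgery", "surgery", "drug"]
macro_f, micro_f = f_measures(truth, guess)
print(round(macro_f, 4), round(micro_f, 4))  # 0.4889 0.6667
```

In this toy example the rare "device" class is never predicted correctly, which pulls the macro score well below the micro score. The same effect can explain why one classifier leads on the macro F-measure while another leads on the micro F-measure.
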

Journal: The International journal of risk & safety in medicine
Publication Date: Nov 27, 2015
Citations: 4
License type: CC BY-NC 4.0
