Automated systems to identify relevant documents in product risk management

Xue Ting Wee, Yvonne Koh, Chun Wei Yap

doi:10.1186/1472-6947-12-13

Abstract

Background: Product risk management involves critical assessment of the risks and benefits of health products circulating in the market. One of the important sources of safety information is the primary literature, especially for newer products with which regulatory authorities have relatively little experience. Although the primary literature provides vast and diverse information, only a small proportion of it is useful for product risk assessment work. Hence, the aim of this study is to explore the possibility of using text mining to automate the identification of useful articles, which will reduce the time taken for literature search and hence improve work efficiency. In this study, term-frequency inverse document-frequency values were computed for predictors extracted from the titles and abstracts of articles related to three tumour necrosis factor-alpha blockers. A general automated system was developed using only general predictors and was tested for its generalizability using articles related to four other drug classes. Several specific automated systems were developed using both general and specific predictors and training sets of different sizes in order to determine the minimum number of articles required for developing such systems.

Results: The general automated system had an area under the curve (AUC) value of 0.731 and, when tested on the generalizability set, was able to rank 34.6% and 46.2% of the total number of 'useful' articles among the first 10% and 20% of the articles presented to the evaluators. However, its use may be limited by the subjective definition of useful articles. Only 20 articles were required to develop a specific automated system with a prediction performance (AUC 0.748) better than that of the general automated system.

Conclusions: Specific automated systems can be developed rapidly and avoid problems caused by the subjective definition of useful articles. Thus the efficiency of product risk management can be improved with the use of specific automated systems.
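
To make the term-frequency inverse document-frequency (TF-IDF) values mentioned in the abstract concrete, the following is a minimal sketch of one common TF-IDF variant (relative term frequency multiplied by the natural log of inverse document frequency). The exact weighting scheme, the tokenizer, and the example titles are assumptions made for illustration; they are not taken from the study.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split on non-letters (a deliberately simple, assumed tokenizer)."""
    return [token for token in re.split(r"[^a-z]+", text.lower()) if token]


def tfidf(documents):
    """Return one {term: weight} dict per document.

    weight = (term count / document length) * ln(number of documents / document frequency)
    This is one common TF-IDF variant, chosen here only for illustration.
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        length = len(tokens)
        vectors.append({
            term: (count / length) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical article titles, invented for this example.
titles = [
    "Serious infection after infliximab therapy",
    "Infliximab pharmacokinetics in healthy volunteers",
    "Lymphoma risk with etanercept therapy",
]

weights = tfidf(titles)
for title, vector in zip(titles, weights):
    top_terms = sorted(vector, key=vector.get, reverse=True)[:3]
    print(title, "->", top_terms)
```

Under this weighting, a term that occurs in every document receives a weight of zero, while a term confined to a few documents is weighted more heavily, which is why such values can serve as predictors that separate one group of articles from another.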

Highlights

Product risk management involves critical assessment of the risks and benefits of health products circulating in the market
The results suggested that the model developed using support vector machine (SVM) had the best prediction performance compared to the models developed using other learning algorithms
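
Prediction performance is reported in the abstract as an area under the curve (AUC) and as the share of 'useful' articles ranked within the first 10% and 20% of the list. The sketch below shows how both quantities can be computed from a ranked list. The function names, the rounding rule for the cutoff, and the scores and labels are hypothetical and chosen for illustration only; they do not reproduce the study's data or code.

```python
import math


def auc(scores, labels):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen useful article (label 1) is scored above a randomly chosen
    non-useful one (label 0). Ties count as half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both useful and non-useful articles are required")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def useful_share_in_top(scores, labels, fraction):
    """Share of all useful articles found in the top `fraction` of the ranking.
    The cutoff is rounded up to a whole article (an assumed convention)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    cutoff = max(1, math.ceil(fraction * len(ranked)))
    found = sum(label for _, label in ranked[:cutoff])
    return found / sum(labels)


# Hypothetical model scores and evaluator labels (1 = useful), invented here.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]

print("AUC:", round(auc(scores, labels), 3))
print("Useful articles in top 20%:", round(useful_share_in_top(scores, labels, 0.2), 3))
print("Useful articles in top 50%:", round(useful_share_in_top(scores, labels, 0.5), 3))
```

The pairwise AUC is quadratic in the number of articles, which is acceptable for a small illustration; a rank-based formula would be preferable for large collections.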

Summary

Introduction

Product risk management involves critical assessment of the risks and benefits of health products circulating in the market. One of the important sources of safety information is the primary literature, especially for newer products with which regulatory authorities have relatively little experience. The primary literature provides vast and diverse information, but only a small proportion of it is useful for product risk assessment work: only about 700 articles (17.5%) contain valuable information for risk assessment. It is time-consuming and inefficient to manually sieve through this large number of articles and identify those that are valuable to product risk assessment, so the ability to expedite the identification of useful literature can contribute to risk assessment efficiency. The aim of this study is therefore to explore the possibility of using text mining to automate the identification of useful articles, which would reduce the time taken for literature search and improve work efficiency.

Objectives

Methods

Results

Discussion

Conclusion