Automated Classification of Quality Defect Issues Relating to Substandard Medicines Using a Hybrid Machine Learning and Rule-Based Approach

Desmond Chun Hwee Teo, Doris Sock Tin Phuah, Huilin Huang, Sreemanee Raaj Dorajoo, Maggie Siok Hwee Tan, Pei San Ang, Dorothy Hooi Myn Tan, Jalene Wang Woon Poh, Yiting Huang, Filina Meixuan Tan, Michelle Sau Yuen Ng, Suan Tian Koh, Chih Tzer Choong

doi:10.1007/s40264-023-01339-8

Abstract

Substandard medicines can lead to serious safety issues affecting public health; however, the nature of such issues can be widely heterogeneous. Health product regulators seek to prioritise critical product quality defects for review so that prompt risk mitigation measures are taken. This study aims to classify the nature of issues for substandard medicines using machine learning to support a risk-based and timely review of cases. A machine learning algorithm combined with a keyword-based model was developed to classify quality issues using text relating to substandard medicines (CISTERM). The nature of the issues in product defect cases was classified using Medical Dictionary for Regulatory Activities-Health Sciences Authority (MedDRA-HSA) lowest-level terms. Product defect cases received from January 2010 to December 2021 were used for training (n = 11,082) and testing (n = 2771). The machine learning model achieved a recall (precision) of 92% (96%) for 'Product adulterated and/or contains prohibited substance', 86% (90%) for 'Out of specification or out of trend test result' and 90% (91%) for 'Manufacturing non-compliance'. Post-market surveillance of substandard medicines remains a key activity for drug regulatory authorities. A machine learning algorithm combined with a keyword-based model can help to prioritise the review of product quality defect issues in a timely manner.
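The abstract does not state how the machine learning output and the keyword rules are combined. The following minimal sketch assumes one simple scheme: a case receives every label predicted by the machine learning model plus every label whose keywords appear in the case text (a union of the two). Only the three label names come from the abstract; the keyword lists, the `toy_ml_predict` stand-in for a trained classifier, and the example cases are hypothetical. The sketch also shows how per-label recall and precision, the metrics reported in the abstract, are computed from true and predicted label sets.

```python
import re
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical keyword lists; the study's actual keywords are not given in the abstract.
KEYWORD_RULES: Dict[str, List[str]] = {
    "Product adulterated and/or contains prohibited substance": [
        "adulterated", "prohibited substance", "undeclared",
    ],
    "Out of specification or out of trend test result": [
        "out of specification", "oos", "out of trend", "oot",
    ],
    "Manufacturing non-compliance": [
        "gmp", "manufacturing deviation", "non-compliance",
    ],
}


def keyword_labels(text: str) -> Set[str]:
    """Return every label whose keywords occur as whole words or phrases in the text."""
    lowered = text.lower()
    return {
        label
        for label, keywords in KEYWORD_RULES.items()
        if any(re.search(r"\b" + re.escape(kw) + r"\b", lowered) for kw in keywords)
    }


def hybrid_classify(text: str, ml_predict: Callable[[str], Set[str]]) -> Set[str]:
    """Assumed combination rule: union of ML-predicted labels and keyword-rule labels."""
    return set(ml_predict(text)) | keyword_labels(text)


def precision_recall(
    y_true: List[Set[str]], y_pred: List[Set[str]], label: str
) -> Tuple[float, float]:
    """Per-label precision and recall over paired lists of true and predicted label sets."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if label in t and label in p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if label not in t and label in p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if label in t and label not in p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def toy_ml_predict(text: str) -> Set[str]:
    """Hypothetical placeholder for a trained text classifier."""
    return {"Manufacturing non-compliance"} if "batch record" in text.lower() else set()


# Hypothetical example cases with their true labels.
cases = [
    "Assay result was out of specification at 12-month stability.",
    "Batch record review found missing signatures.",
    "Product found to be adulterated with sibutramine.",
    "Dissolution failed at release testing.",
]
truth = [
    {"Out of specification or out of trend test result"},
    {"Manufacturing non-compliance"},
    {"Product adulterated and/or contains prohibited substance"},
    {"Out of specification or out of trend test result"},
]
preds = [hybrid_classify(c, toy_ml_predict) for c in cases]

print(precision_recall(truth, preds, "Out of specification or out of trend test result"))  # (1.0, 0.5)
```

In this toy run, the second case is caught only by the stand-in classifier and the other labelled cases only by keywords, which illustrates why a hybrid can improve recall over either component alone. The last case matches no keyword and is missed, halving recall for that label while precision stays at 1.0.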

Full Text