Abstract

USC's Alzheimer's Therapeutic Research Institute (ATRI) was founded in 2015 with the stated mission of accelerating the development of effective therapies to combat Alzheimer's disease, and its Clinical Data Sciences Initiative (CDSI) was created to identify opportunities for research and innovation in the conduct of clinical studies. Through this effort, ATRI has advanced a set of strategies to accelerate the completion of study milestones, improve participant safety, and raise data quality standards. The collection and analysis of adverse event (AE) data during a clinical study plays a critical role in ensuring participant safety. The unstructured diagnosis data collected must be post-processed to facilitate downstream analysis and reporting: in a manual review process, each adverse event is classified to a standard medical terminology code from the Medical Dictionary for Regulatory Activities (MedDRA). To automate this coding, we applied the well-established 'Bag of Words' approach to text categorization and tested a wide range of classifiers. K-Nearest Neighbors (KNN) was found to be simple and accurate with smaller training sets. The more resource-intensive models, such as logistic regression and neural networks, did not perform at their best with the smaller training set considered, but balancing the training set raised their accuracy to that of KNN, demonstrating consistency in prediction. We show that automated coding using KNN achieves ∼74–82% accuracy when predictions up to rank 5 are considered. Further, the approach gains ∼12–21% in accuracy over a pure verbatim-match approach. Each prediction is associated with a probability score that lets us assess its quality. We expect this method to considerably reduce the workload on medical coders and allow them to focus on evaluating the predicted results, which in turn should improve the model's accuracy.
Future work will focus on incorporating medical lexicons and creating synthetic data sets to rapidly improve model training and overcome the limitations of small training sets. We expect that, as the quality and size of our training data sets increase, the more sophisticated methods will begin to outperform simple methods such as KNN, yielding an overall improvement in predictive accuracy.
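
The bag-of-words and KNN pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the tokenization, distance metric, value of k, or how probability scores are derived, so the cosine similarity, the similarity-weighted vote shares used as scores, and every function name here are assumptions. The training pairs are invented toy examples, not study data or actual MedDRA codes.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical (verbatim text, coded term) pairs; invented for illustration only.
TRAINING = [
    ("mild headache", "Headache"),
    ("severe headache in the morning", "Headache"),
    ("nausea after dosing", "Nausea"),
    ("felt nauseous", "Nausea"),
    ("fell at home", "Fall"),
    ("fall with bruising", "Fall"),
]


def tokens(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bag_of_words(text):
    """Represent a text as token counts, ignoring word order."""
    return Counter(tokens(text))


def cosine(a, b):
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(count * b[tok] for tok, count in a.items() if tok in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_ranked(verbatim, training, k=5, top_n=5):
    """Return up to top_n (term, score) pairs voted by the k nearest neighbors.

    Assumption: each neighbor votes with its similarity, and a term's score
    is its share of the total vote, so the scores sum to 1.
    """
    query = bag_of_words(verbatim)
    scored = [(cosine(query, bag_of_words(text)), term) for text, term in training]
    neighbors = sorted((s for s in scored if s[0] > 0), reverse=True)[:k]
    votes = defaultdict(float)
    for similarity, term in neighbors:
        votes[term] += similarity
    total = sum(votes.values())
    ranked = sorted(votes.items(), key=lambda kv: kv[1], reverse=True)
    return [(term, weight / total) for term, weight in ranked[:top_n]]


def verbatim_match(verbatim, training):
    """Baseline: return a term only if the normalized text matches exactly."""
    lookup = {" ".join(tokens(text)): term for text, term in training}
    return lookup.get(" ".join(tokens(verbatim)))


def top_n_accuracy(cases, training, n=5):
    """Fraction of cases whose true term appears in the top n predictions."""
    if not cases:
        return 0.0
    hits = sum(
        1
        for verbatim, truth in cases
        if truth in [term for term, _ in predict_ranked(verbatim, training, top_n=n)]
    )
    return hits / len(cases)
```

Calling `predict_ranked("headache and nausea after dosing", TRAINING)` returns a ranked list of candidate terms with scores that a coder could review, whereas `verbatim_match` returns `None` for the same text because no training entry matches it exactly. That gap is the kind of case in which a ranked classifier can improve on verbatim matching.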
