Abstract
Virtual screening of bioassay data can help identify compounds that restrict the production of amyloid beta peptides (Aβ), observed in patients with Alzheimer's disease, by inhibiting the translation of amyloid precursor protein (APP). Machine learning classifiers can be applied to such a dataset to find those compounds. The proportion of active molecules, which inhibit APP translation, is however very small compared with their inactive counterparts. This imbalance between the two classes is handled by introducing cost-sensitivity, which reweights the training instances according to the misclassification cost allotted to each class. The paper evaluates cost-sensitive classifiers (Random Forest, Naive Bayes, and Logistic Regression) on their ability to separate the minority (active) molecules from the majority (inactive) class. Sensitivity, specificity, False Negative rate, ROC area, and accuracy are reported while the False Positive rate is held at 20.6%. The aim of the study is to identify the most reliable classifier for the bioassay data and to explore the ideal misclassification cost. Random Forest was the most robust model compared with Naive Bayes and Logistic Regression. Moreover, each classifier had a different optimal misclassification cost.
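The reweighting idea and the reported metrics can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `cost_weights` and `confusion_metrics`, the 4:1 cost ratio, and the toy labels are all assumptions chosen for illustration. It also assumes that each instance weight is proportional to the misclassification cost of its class, rescaled so the weights sum to the number of instances.

```python
def cost_weights(labels, cost):
    """Return one weight per training instance.

    Assumption: weight is proportional to the misclassification cost of the
    instance's class, rescaled so that all weights sum to len(labels).
    """
    raw = [cost[y] for y in labels]
    scale = len(labels) / sum(raw)
    return [w * scale for w in raw]


def confusion_metrics(y_true, y_pred, positive="active"):
    """Compute the metrics named in the abstract from binary predictions."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    return {
        "sensitivity": tp / (tp + fn),          # true positive rate
        "specificity": tn / (tn + fp),          # true negative rate
        "false_negative_rate": fn / (tp + fn),
        "false_positive_rate": fp / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


# Toy, made-up data: 2 active and 8 inactive molecules.
labels = ["active"] * 2 + ["inactive"] * 8
# Hypothetical costs: missing an active molecule is 4x as costly.
weights = cost_weights(labels, {"active": 4.0, "inactive": 1.0})

# Hypothetical predictions from some classifier trained with those weights.
predicted = ["active", "inactive"] + ["active"] * 2 + ["inactive"] * 6
scores = confusion_metrics(labels, predicted)
```

With these illustrative costs, the two active instances carry the same total weight as the eight inactive ones, so a weight-aware learner is no longer rewarded for predicting "inactive" everywhere. Sweeping the cost ratio and comparing the resulting metrics at a fixed False Positive rate is one way to look for the optimal misclassification cost for each classifier.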