Abstract

Machine learning has made impressive advances in many applications that resemble human cognitive discernment. However, success has been limited for relational datasets, particularly those with low volume, imbalanced groups, and mislabeled cases, and the outputs typically lack transparency and interpretability. These difficulties arise from the subtle overlapping and entanglement of functional and statistical relations at the source level. We have therefore developed the Pattern Discovery and Disentanglement System (PDD), which discovers explicit patterns from data of various sizes and with imbalanced groups, and screens out anomalies. We present four case studies on biomedical datasets to substantiate the efficacy of PDD. PDD improves prediction accuracy and facilitates transparent interpretation of the discovered knowledge through an explicit representation framework, the PDD Knowledge Base, which links the sources, the patterns, and individual patients. PDD therefore promises broad and ground-breaking applications in genomic and biomedical machine learning.

Highlights

  • Machine learning has made impressive advances in many applications that resemble human cognitive discernment

  • Supplement 2 provides the complete set of experimental results, with further details, to exemplify the efficacy of the Pattern Discovery and Disentanglement System (PDD)

  • Analysis III shows the significance of anomaly detection, especially in clinical practice, and demonstrates PDD's capability to detect anomalies


Summary

Result

To exemplify PDD’s data-analytic capability, we conducted a synthetic experiment and four analysis tasks with specific objectives, using synthetic, bioinformatics, and healthcare data with verifiable ground truth. These validate PDD’s ability to handle small or imbalanced classes and rare patterns without relying on prior knowledge. Using the Heart Disease dataset[23] (Fig. 5b), we demonstrate how the classification results improve when the anomalies identified by PDD are removed from the dataset R before training. This indicates that PDD can rectify both the input and the throughput of the ML process. The classification results of PDD were compared with those of a Support Vector Machine (SVM) and an Artificial Neural Network (ANN)[29] before and after removing all detected outliers and mislabeled entities (Fig. 5c). After the outliers and mislabeled cases were removed, the classification results of all the algorithms improved, by approximately 10% or more.
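The before/after evaluation protocol described above can be illustrated with a minimal sketch. It does not implement PDD: the indices of the anomalous records (`flagged`) are simply given, as a hypothetical stand-in for PDD's outlier and mislabel detection; a 1-nearest-neighbour classifier stands in for SVM/ANN; and the toy data and helper names (`nearest_neighbour_predict`, `accuracy`, `drop_flagged`) are invented for illustration only.

```python
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, float]


def nearest_neighbour_predict(train_x: Sequence[Point], train_y: Sequence[int], query: Point) -> int:
    """Return the label of the closest training point (1-NN, squared Euclidean distance)."""
    best = min(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )
    return train_y[best]


def accuracy(train_x, train_y, test_x, test_y) -> float:
    """Fraction of test points whose 1-NN prediction matches the true label."""
    correct = sum(
        1 for x, y in zip(test_x, test_y)
        if nearest_neighbour_predict(train_x, train_y, x) == y
    )
    return correct / len(test_y)


def drop_flagged(xs, ys, flagged: Set[int]) -> Tuple[List[Point], List[int]]:
    """Remove the records whose indices were flagged as anomalies (hypothetical input)."""
    kept = [i for i in range(len(xs)) if i not in flagged]
    return [xs[i] for i in kept], [ys[i] for i in kept]


# Toy training data: class 0 near (0, 0), class 1 near (10, 10).
# Records 3 and 7 are deliberately mislabeled.
train_x = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10), (11, 10), (10, 11), (11, 11)]
train_y = [0, 0, 0, 1, 1, 1, 1, 0]
test_x = [(1.2, 1.2), (0.2, 0.1), (10.1, 10.2), (10.9, 11.1)]
test_y = [0, 0, 1, 1]

# Hypothetical anomaly flags, standing in for the output of a detector such as PDD.
flagged = {3, 7}

before = accuracy(train_x, train_y, test_x, test_y)
clean_x, clean_y = drop_flagged(train_x, train_y, flagged)
after = accuracy(clean_x, clean_y, test_x, test_y)
print(f"accuracy before removal: {before:.2f}, after removal: {after:.2f}")
# prints "accuracy before removal: 0.50, after removal: 1.00"
```

A 1-NN classifier was chosen here only because it is short and copies the label of the single closest training record, so each mislabeled record visibly corrupts predictions in its neighbourhood; the same protocol (train on all records, train again on the records left after removal, evaluate both on the same held-out set) would wrap whichever classifiers are being compared.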
