Abstract

Machine learning methods that capture multivariate interaction effects have become mainstream in feature selection. However, the feature importance scores generated by these methods are not statistically interpretable, which hampers their practical application in areas such as medical diagnosis. In this study, a framework of Algorithmic Randomness based Feature Selection (ARFS) is proposed that measures feature importance using the p-value derived from combining the algorithmic randomness test with machine learning methods. In ARFS, a machine learning algorithm, such as random forest (RF), support vector machine (SVM) or naïve Bayes classifier (NB), is used to compute the nonconformity score of each example with respect to the data distribution, and the p-value of the algorithmic randomness test is then obtained from these nonconformity scores. ARFS evaluates the importance of each feature by the reduction in p-value between the dataset before and after random permutation of that feature, which makes the score statistically interpretable. To demonstrate its efficiency, three ARFS models (ARFS-RF, ARFS-SVM and ARFS-NB) were compared with several feature selection approaches: RF-ACC, RF-Gini, KNNpermute, SMFS, ANOVA and SNR. The results showed that ARFS-RF obtained better performance on both the synthetic and benchmark datasets. A further study on a chronic gastritis dataset in Traditional Chinese Medicine (TCM) showed that the symptom sets given by ARFS-RF perform substantially better than sets of the same size chosen by TCM experts. The symptom ranking list generated by ARFS-RF can offer guidance to physicians in designing, selecting, and interpreting symptoms in chronic gastritis diagnosis.
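The permutation-based p-value idea described above can be illustrated with a minimal Python sketch. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact nonconformity measure or how p-values are aggregated, so the sketch assumes a nearest-centroid model as a stand-in for RF, SVM or NB, a split-conformal style p-value computed against a held-out calibration set, and the mean p-value as the dataset-level summary. All function names (`fit_centroids`, `nonconformity`, `p_values`, `permutation_importance`, `make_data`) and the synthetic data are hypothetical.

```python
import random


def fit_centroids(X, y):
    """Fit per-class mean vectors (assumed stand-in for RF, SVM or NB)."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for k, value in enumerate(row):
            acc[k] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in acc] for c, acc in sums.items()}


def nonconformity(centroids, row, label):
    """Assumed score: distance to own centroid minus distance to the nearest other one."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(row, centroids[c])) ** 0.5
    return dist(label) - min(dist(c) for c in centroids if c != label)


def p_values(centroids, X_cal, y_cal, X_eval, y_eval):
    """p-value of each evaluation example: share of calibration scores at least as large."""
    cal = [nonconformity(centroids, r, l) for r, l in zip(X_cal, y_cal)]
    result = []
    for r, l in zip(X_eval, y_eval):
        score = nonconformity(centroids, r, l)
        at_least = sum(1 for s in cal if s >= score)
        result.append((at_least + 1) / (len(cal) + 1))
    return result


def mean(values):
    return sum(values) / len(values)


def permutation_importance(centroids, X_cal, y_cal, X_eval, y_eval, rng):
    """Importance of feature j = mean p-value before minus after permuting column j."""
    base = mean(p_values(centroids, X_cal, y_cal, X_eval, y_eval))
    scores = []
    for j in range(len(X_eval[0])):
        column = [row[j] for row in X_eval]
        rng.shuffle(column)  # break the link between feature j and the label
        X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X_eval, column)]
        permuted = mean(p_values(centroids, X_cal, y_cal, X_perm, y_eval))
        scores.append(base - permuted)
    return scores


def make_data(n, rng):
    """Synthetic data: feature 0 separates the classes, feature 1 is pure noise."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(4.0 * label, 1.0), rng.gauss(0.0, 1.0)])
        y.append(label)
    return X, y


rng = random.Random(0)
X_train, y_train = make_data(200, rng)
X_cal, y_cal = make_data(200, rng)
X_eval, y_eval = make_data(200, rng)

centroids = fit_centroids(X_train, y_train)
importances = permutation_importance(centroids, X_cal, y_cal, X_eval, y_eval, rng)
```

In this sketch, permuting the informative feature makes many examples look nonconforming, so their p-values drop and the importance score is clearly positive, while permuting the noise feature leaves the p-values roughly unchanged. Swapping the centroid model for a random forest or another classifier would only require a different `nonconformity` function.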
