Abstract
As machine learning is applied across an ever wider range of domains, interpreting the internal mechanisms and predictions of complex models is crucial for their further adoption. However, interpreting complex machine learning models on biased data remains a difficult problem. When the important explanatory features of the data of interest are strongly affected by contaminated distributions, particularly in risk-sensitive fields such as self-driving vehicles and healthcare, it is essential to provide users with a robust interpretation of complex models. Interpreting a complex model is often approached by analyzing its features through measures of feature importance. This article therefore proposes a novel method, derived from high-dimensional model representation (HDMR), for measuring feature importance. The proposed method yields robust estimates when the input features follow contaminated distributions. Moreover, the method is model-agnostic, and this generalizability makes it easier to compare interpretations across different models. Experimental evaluations on artificial models and machine learning models show that the proposed method is more robust than the traditional HDMR-based method.
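As a concrete illustration (not taken from the article's experiments) of what a contaminated distribution means here, consider a Huber-style epsilon-contamination model that mixes a nominal distribution with a small fraction of a heavy-tailed one: plain variance-based summaries inflate under such contamination, while robust scale estimates stay close to the clean value. The sketch below is a minimal Python example; the function name, the mixture parameters, and the choice of the MAD as the robust estimate are illustrative assumptions.

```python
import numpy as np

def contaminated_samples(n, eps=0.05, rng=None):
    """Huber eps-contamination: (1 - eps) * N(0, 1) + eps * N(0, 10^2).
    Most points follow the nominal distribution; a fraction eps comes
    from a heavy-tailed outlier distribution (illustrative choice)."""
    rng = np.random.default_rng(rng)
    is_outlier = rng.random(n) < eps
    clean = rng.normal(0.0, 1.0, size=n)
    outliers = rng.normal(0.0, 10.0, size=n)
    return np.where(is_outlier, outliers, clean)

x = contaminated_samples(100_000, eps=0.05, rng=0)
# The sample variance is pulled far from the clean value of 1.0,
# while a robust scale estimate (MAD, scaled for consistency) is not.
print("sample variance  :", x.var())
print("MAD-based scale^2:", (1.4826 * np.median(np.abs(x - np.median(x)))) ** 2)
```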
Highlights
Machine learning has been widely used in various fields
When variance-based indices are estimated, there is a high probability of producing an uncertain feature importance ranking, because the estimates for the inputs are unstable from sample to sample [32]; we address this problem in this study
The index R_i^SD is used to represent the Sobol method, a variance-based method obtained via analysis of variance (ANOVA)-high-dimensional model representation (HDMR), instead of the first-order effect index S_i in (3); a standard estimator of S_i is sketched after these highlights
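For reference, the first-order effect index mentioned above is defined in the ANOVA-HDMR decomposition as S_i = Var(E[Y | X_i]) / Var(Y). The sketch below is a minimal, generic pick-freeze Monte Carlo estimator of S_i (a Saltelli-style estimator), not the article's proposed robust index R_i^SD; the function name, the uniform independent input sampler, and the toy test model are illustrative assumptions.

```python
import numpy as np

def first_order_sobol(f, d, n=100_000, rng=None):
    """Pick-freeze Monte Carlo estimate of the first-order Sobol indices
    S_i = Var(E[Y | X_i]) / Var(Y) for a model y = f(X) with d inputs,
    here assumed independent and uniform on [0, 1]."""
    rng = np.random.default_rng(rng)
    A = rng.uniform(size=(n, d))          # two independent input matrices
    B = rng.uniform(size=(n, d))
    fA, fB = f(A), f(B)
    var_y = np.var(np.concatenate([fA, fB]), ddof=1)
    S = np.empty(d)
    for i in range(d):
        ABi = A.copy()
        ABi[:, i] = B[:, i]               # replace only column i by B's
        # Saltelli-style estimator of V_i = Var(E[Y | X_i])
        S[i] = np.mean(fB * (f(ABi) - fA)) / var_y
    return S

# Toy additive model with known answer: S is roughly (0.80, 0.20, 0.002)
def model(X):
    return 2.0 * X[:, 0] + 1.0 * X[:, 1] + 0.1 * X[:, 2]

print(first_order_sobol(model, d=3, rng=0))
```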
Summary
Machine learning has been widely used in various fields. In practice, machine learning models are often evaluated by their accuracy, and the pursuit of predictive accuracy leads to the use of increasingly complex models. Simple, interpretable models often do not achieve the best predictive accuracy [2]. Complex machine learning models, such as deep neural networks, are commonly referred to as ‘‘black boxes’’ because their internal working mechanisms and decision-making processes are difficult for humans to understand. Such a lack of transparency can raise severe issues and hinder further applications of machine learning. The reason why certain simple models, such as logistic regression and decision tree models, are widely used is partly attributable to their interpretability.