Abstract

The distribution of data points is a key component in machine learning. In most cases, one uses min-max normalization to obtain nodes in $[0, 1]$ or Z-score normalization for standard normally distributed data. In this paper, we apply transformation ideas to design a complete orthonormal system in the $L_2$ space of functions with the standard normal distribution as integration weight. Subsequently, we are able to apply the explainable ANOVA approximation to this basis and use Z-score transformed data in the method. We demonstrate the applicability of this procedure on the well-known forest fires dataset from the UCI machine learning repository. The attribute ranking obtained from the ANOVA approximation provides us with crucial information about which variables in the dataset are the most important for the detection of fires.

Highlights

  • The scale of our features is a key component in building models

  • Since ω is the density of the standard normal distribution, this function space is of high relevance

  • For the approximation of functions that belong to a space $H^s(\mathbb{R}^d, \omega) \subseteq L_2(\mathbb{R}^d, \omega)$ that characterizes the smoothness $s > 0$ by the decay of the basis coefficients $c_k(f)$, we can show upper bounds on the superposition dimension $d^{(\mathrm{sp})}$ for $\alpha \in [0, 1]$, see, e.g., [2]

Summary

INTRODUCTION

The scale of our features is a key component in building models. Consider, for example, the time a customer spends in a shop: it is less suitable for min-max normalization, since the values may have a wide range and we will probably have very few people with a significantly small or large time. In this case, the Z-score normalization makes much more sense.

This allows us to transform the half-period cosine basis into a complete orthonormal system on $L_2(\mathbb{R}^d, \omega)$ with the help of a lemma.

We have constructed a complete orthonormal system $(\varphi_k^{\mathrm{trafo}})_{k \in \mathbb{N}_0^d}$ on the weighted space $L_2(\mathbb{R}^d, \omega)$ using transformation ideas from [13] and the well-known half-period cosine basis $(\varphi_k^{\cos})_{k \in \mathbb{N}_0^d}$ on $L_2([0, 1]^d)$.
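To make the construction summarized above more concrete, the following is a minimal one-dimensional sketch, not the paper's implementation. It assumes that the transformation maps the real line to [0, 1] via the standard normal cumulative distribution function Φ, so that the transformed functions are φ_k^cos(Φ(x)); this summary does not state the exact transformation, so treat that choice as an assumption. Under it, the substitution y = Φ(x), dy = ω(x) dx carries orthonormality from L2([0, 1]) over to L2(R, ω). All function names (zscore, phi_cos, phi_trafo, gram_matrix) are hypothetical and chosen for illustration only.

```python
import math
import numpy as np

# Vectorized error function built from the standard library (no SciPy needed).
_erf = np.vectorize(math.erf)


def zscore(data):
    """Z-score normalization per column: zero mean, unit standard deviation."""
    data = np.asarray(data, dtype=float)
    return (data - data.mean(axis=0)) / data.std(axis=0)


def normal_cdf(x):
    """Standard normal CDF; ASSUMED here to be the transformation R -> [0, 1]."""
    return 0.5 * (1.0 + _erf(np.asarray(x, dtype=float) / math.sqrt(2.0)))


def normal_pdf(x):
    """Density omega of the standard normal distribution."""
    x = np.asarray(x, dtype=float)
    return np.exp(-0.5 * x**2) / math.sqrt(2.0 * math.pi)


def phi_cos(k, y):
    """Half-period cosine basis on [0, 1]: 1 for k = 0, sqrt(2) cos(pi k y) otherwise."""
    y = np.asarray(y, dtype=float)
    if k == 0:
        return np.ones_like(y)
    return math.sqrt(2.0) * np.cos(math.pi * k * y)


def phi_trafo(k, x):
    """Transformed basis function on R (under the CDF assumption above)."""
    return phi_cos(k, normal_cdf(x))


def gram_matrix(n_basis, half_width=10.0, n_points=20001):
    """Approximate the inner products <phi_j, phi_k> in L2(R, omega).

    Uses a uniform Riemann sum on [-half_width, half_width]; the density
    makes the contribution outside this interval negligible.
    """
    x = np.linspace(-half_width, half_width, n_points)
    h = x[1] - x[0]
    weights = normal_pdf(x) * h
    basis = np.stack([phi_trafo(k, x) for k in range(n_basis)])
    return (basis * weights) @ basis.T


# The Gram matrix of the first five functions should be close to the identity.
G = gram_matrix(5)
print(np.round(G, 6))
```

In use, one would first apply zscore to each attribute of the dataset and then evaluate phi_trafo on the normalized values. For d > 1, a natural extension is the tensor product of the one-dimensional functions, which is again an assumption of this sketch rather than something stated in the summary.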

INTERPRETABLE ANOVA APPROXIMATION
Approximation Procedure
Active Set
FOREST FIRE PREVENTION
Findings
DATA AVAILABILITY STATEMENT