Abstract

Expectiles have gained considerable attention in recent years due to their wide applications in many areas. In this study, the k-nearest neighbours approach, combined with the asymmetric least squares loss function and called ex-kNN, is proposed for computing expectiles. Firstly, the effect of various distance measures on ex-kNN is evaluated in terms of test error and computational time. It is found that the Canberra, Lorentzian, and Soergel distance measures lead to minimum test error, whereas Euclidean, Canberra, and Average of (L1,L∞) lead to a low computational cost. Secondly, the performance of ex-kNN is compared with the existing packages er-boost and ex-svm for computing expectiles on nine real-life examples. Depending on the nature of the data, ex-kNN showed two to ten times lower test error than er-boost and comparable test error to ex-svm. Computationally, ex-kNN is found to be two to five times faster than ex-svm and much faster than er-boost, particularly in the case of high-dimensional data.
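To make the idea concrete, a minimal sketch of an ex-kNN-style prediction is given below. It is not the authors' released code: the function names, the fixed-point solver for the asymmetric least squares problem, and the choice of Euclidean distance for the neighbourhood are illustrative assumptions.

```python
import numpy as np

def sample_expectile(y, tau=0.5, tol=1e-10, max_iter=100):
    """tau-expectile of a sample: minimizer of the asymmetric least
    squares loss, found by fixed-point iteration on the weighted mean."""
    e = np.mean(y)  # the 0.5-expectile is the ordinary mean
    for _ in range(max_iter):
        w = np.where(y > e, tau, 1.0 - tau)  # asymmetric weights
        e_new = np.sum(w * y) / np.sum(w)    # stationarity condition
        if abs(e_new - e) < tol:
            break
        e = e_new
    return e

def ex_knn_predict(X_train, y_train, x_query, k=5, tau=0.5):
    """Estimate the conditional tau-expectile at x_query from the
    sample expectile of the k nearest training responses
    (Euclidean neighbourhood, an assumption for this sketch)."""
    d = np.linalg.norm(X_train - x_query, axis=1)
    idx = np.argsort(d)[:k]
    return sample_expectile(y_train[idx], tau=tau)
```

For tau = 0.5 the iteration returns the sample mean in one step; for tau > 0.5 the weights favour large responses and the expectile moves above the mean.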

Highlights

Accepted: 8 April 2021

  • Given independent data Dn := ((x1, y1), . . . , (xn, yn)) drawn from an unknown probability distribution P on X × Y, where X ⊂ Rd and Y ⊂ R, symmetric loss functions, such as least absolute deviation loss or least squares loss, lead to studying the center of the conditional distribution P(Y | X = x) by estimating the conditional median med(Y | X = x) or the conditional mean E(Y | X = x), respectively

  • Because the performance of ex-kNN depends on the distance measure used to determine the neighbourhood of the query point, various distance measures are considered and their impact is evaluated in terms of test error and computational time

  • It is observed that there exists no single distance measure that, when paired with ex-kNN, achieves high performance across different kinds of datasets
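The distance measures highlighted above have simple closed forms. The sketch below implements them from their standard textbook definitions; it is not taken from the paper's code, and the Soergel form assumes non-negative feature vectors.

```python
import numpy as np

def canberra(x, y):
    """Canberra distance: sum of |x_i - y_i| / (|x_i| + |y_i|),
    with zero contribution where both coordinates are zero."""
    num = np.abs(x - y)
    den = np.abs(x) + np.abs(y)
    return np.sum(np.divide(num, den, out=np.zeros_like(num), where=den != 0))

def lorentzian(x, y):
    """Lorentzian distance: sum of ln(1 + |x_i - y_i|)."""
    return np.sum(np.log1p(np.abs(x - y)))

def soergel(x, y):
    """Soergel distance (non-negative vectors):
    sum |x_i - y_i| / sum max(x_i, y_i)."""
    return np.sum(np.abs(x - y)) / np.sum(np.maximum(x, y))
```

All three are bounded contributions per coordinate (Canberra) or grow sub-linearly (Lorentzian), which makes them less sensitive to outlying features than the Euclidean distance.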



Introduction

Given independent data Dn := ((x1, y1), . . . , (xn, yn)) that were drawn from an unknown probability distribution P on X × Y, where X ⊂ Rd and Y ⊂ R, symmetric loss functions, such as least absolute deviation loss or least squares loss, lead to studying the center of the conditional distribution P(Y | X = x) by estimating the conditional median med(Y | X = x) or the conditional mean E(Y | X = x), respectively. To investigate P(· | x) beyond the center, one well-known approach is computing quantiles, as proposed by Koenker and Bassett Jr. If P(· | x) has a strictly positive Lebesgue density, the conditional τ-quantile qτ, τ ∈ (0, 1), of Y given x ∈ X is the unique solution of P(Y ≤ qτ | X = x) = τ.

Published: 11 April 2021. Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
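The conditional τ-quantile can equivalently be characterised as the minimizer of the expected pinball (check) loss. The small unconditional illustration below recovers the sample median by a grid search over that loss; the helper name and the grid are assumptions for the sketch.

```python
import numpy as np

def pinball_loss(q, y, tau):
    """Average pinball (check) loss of candidate quantile q on sample y:
    tau * (y - q) for underestimates, (1 - tau) * (q - y) for overestimates."""
    r = y - q
    return np.mean(np.where(r >= 0, tau * r, (tau - 1.0) * r))

# The tau-quantile minimizes this loss; for tau = 0.5 it is the median.
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
grid = np.linspace(0.0, 6.0, 601)
q_hat = grid[np.argmin([pinball_loss(q, y, 0.5) for q in grid])]
```

Replacing the absolute-value-shaped pinball loss with its squared (asymmetric least squares) counterpart yields the expectile instead of the quantile, which is the substitution the paper builds on.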

