Abstract
Recently, it has been shown that under a broad set of conditions, the commonly used distance functions will become unstable in high-dimensional data space; i.e., the distance to the farthest data point approaches the distance to the nearest data point of a given query point with increasing dimensionality. It has been shown that if dimensions are independently distributed, and normalized to have zero mean and unit variance, instability happens. In this paper, it is shown that the normalization condition is not necessary, but all appropriate moments must be finite. Furthermore, a new distance function, namely multiplicative distance, is introduced. It is theoretically proved that this function is stable for data with independent dimensions (with identical or nonidentical distribution). In contrast to usual distance functions which are based on the summation of distances over all dimensions (distance components), the multiplicative distance is based on the multiplication of distance components. Experimental results show the stability of the multiplicative distance for data with independent and correlated dimensions in the high-dimensional space and the superiority of the multiplicative distance over the norm distances for the high-dimensional data.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.