Abstract

Multidimensional Scaling (MDS) is one of the most versatile tools used for exploratory data mining. It allows a first glimpse of possible structure in the data, which can inform the choice of analyses used. Its uses are multiple. It can give the user an idea as to the clusterability or linear separability of the data. It can help spot outliers, or can hint at the intrinsic dimensionality of the data. Moreover, it can sometimes reveal unexpected latent dimensions in the data. With all these uses, MDS is increasingly used in areas as diverse as marketing, medicine, genetics, music and linguistics. One of the strengths of MDS is that it is essentially agnostic to data type, as we can use any distance measure to create the distance matrix, which is the only required input to the MDS algorithm. In spite of this generality, we make the following claim. MDS is not (well) defined for an increasingly important data type, time series subsequences. In this work we explain why this is the case, and we propose a scalable solution. We demonstrate the utility of our ideas on several diverse real-world datasets. At the core of our approach is a novel Minimum Description Length (MDL) subsequence extraction algorithm. Beyond MDS visualization, this subsequence extraction subroutine may be a useful tool in its own right.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.