Abstract
Edit distances are an established way to capture structural features of data, and the distance between two data objects represents their dissimilarity. Kernels, in contrast, are a class of similarity functions, and a positive definite kernel lets us apply the many existing techniques of multivariate analysis. This paper aims to bridge the gap between distances and kernels. The literature offers several formulas that convert a negative definite distance function into a positive definite kernel. Edit distance functions, however, are not necessarily negative definite, and our first contribution is an alternative method for deriving positive definite kernels from such edit distance functions. The method comes with an easy-to-check and strong sufficient condition for positive definiteness, and the condition turns out to be closely related to the triangle inequality. In fact, to our knowledge, all of the edit distance functions in the literature that satisfy the triangle inequality meet the condition for positive definiteness. Secondly, we apply this method to four well-known edit distance functions for trees to introduce four novel kernels, and we show that three of them are positive definite. Thirdly, we develop a theory of subtree matching to study these kernels. Our kernels count matchings between subtrees of the input trees, with a weight determined by each individual matching. Although the number of such matchings is exponential in the size of the input trees (the number of vertices), our theory enables us to develop dynamic-programming-based algorithms whose asymptotic computational complexities fall between a quadratic function and a cubic function of the size.
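To make the gap between distances and kernels concrete, the following minimal sketch illustrates the standard conversion from the literature, k(x, y) = exp(-γ d(x, y)), not the method introduced in this paper. It assumes unit-cost string edit distance (Levenshtein) as a stand-in for a tree edit distance, and the names `levenshtein`, `gram_matrix`, `is_positive_definite`, and the parameter `gamma` are all hypothetical choices made for illustration. Because edit distances are not necessarily negative definite, the resulting Gram matrix is not guaranteed to be positive definite, so the sketch tests it numerically by attempting a Cholesky factorization.

```python
import math


def levenshtein(a: str, b: str) -> int:
    """Unit-cost string edit distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at zero cost
            ))
        prev = curr
    return prev[-1]


def gram_matrix(items, gamma=1.0):
    """Gram matrix of the candidate kernel k(x, y) = exp(-gamma * d(x, y))."""
    return [[math.exp(-gamma * levenshtein(x, y)) for y in items] for x in items]


def is_positive_definite(matrix, tol=1e-12):
    """Return True if a Cholesky factorization of the symmetric matrix succeeds."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - s
                if pivot <= tol:
                    return False  # non-positive pivot: not positive definite
                lower[i][j] = math.sqrt(pivot)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return True


words = ["kitten", "sitting", "mitten"]
K = gram_matrix(words, gamma=5.0)
print(is_positive_definite(K))
```

A successful check on one sample says nothing about positive definiteness of the kernel in general; it only shows that this particular Gram matrix is positive definite. A sufficient condition of the kind described above is what removes the need for such case-by-case numerical checks.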