Abstract

Identification of similar objects in a large collection is a fundamental technique in several areas of computer science, e.g., case-based reasoning and machine discovery. Strings are the most basic representations of objects inside computers, and thus string similarity is one of the most important topics in computer science.

A similarity measure must be sensitive to the kind of differences we wish to quantify. The weighted edit distance is one such framework, in which the measure can be varied by altering the weight assigned to each edit operation depending on the symbols involved. However, it does not suffice to solve ‘real problems’ (see, e.g., [2]). Two objects that seem similar are considered to necessarily share a common structure, and the degree of similarity depends on how valuable that common structure is. Based on this intuition, we present a unifying framework, named string resemblance system (SRS, for short). In this framework, the similarity of two strings can be viewed as the maximum score of a pattern that matches both of them. The differences among measures therefore come down to the choices of (1) the pattern set to which common patterns belong, and (2) the pattern score function, which assigns a score to each pattern.

For example, if we choose the set of patterns with variable-length don’t cares and define the score of a pattern to be the number of symbols in it, then the obtained measure is the length of the longest common subsequence (LCS) of the two strings. Indeed, the strings acdeba and abdac have a common pattern a⋆d⋆a⋆, which contains three symbols. With this framework one can easily design and modify one's own measures. In this paper we briefly describe SRSs and then report successful results of applications to literature and music.

Keywords: Common Pattern, Unifying Framework, Edit Operation, Longest Common Subsequence, Empty String. These keywords were added by machine and not by the authors.
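The LCS example in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation: the function names (`is_subsequence`, `srs_similarity`, `lcs_length`) are hypothetical, and the pattern set is simplified to patterns of the form ⋆c1⋆c2⋆...⋆ck⋆, i.e. a don't care around every symbol, so that a pattern matches a string exactly when c1...ck is a subsequence of it. Under that assumption, the brute-force maximum over common patterns should agree with the usual dynamic-programming LCS length.

```python
from itertools import combinations


def is_subsequence(symbols, text):
    """True if text matches *c1*c2*...*ck*, i.e. symbols is a subsequence of text."""
    it = iter(text)
    # `ch in it` advances the iterator, so symbols must appear in order.
    return all(ch in it for ch in symbols)


def srs_similarity(x, y, score=len):
    """Maximum score over all patterns (assumed form above) matching both x and y.

    Brute force: every such pattern matching x is a subsequence of x,
    so enumerating subsequences of x covers all common patterns.
    Exponential in len(x); intended for illustration only.
    """
    best = score("")  # the empty pattern matches every string
    for k in range(1, len(x) + 1):
        for idx in combinations(range(len(x)), k):
            pattern = "".join(x[i] for i in idx)
            if is_subsequence(pattern, y):
                best = max(best, score(pattern))
    return best


def lcs_length(x, y):
    """Standard dynamic-programming LCS length, for comparison."""
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0]
        for j, b in enumerate(y, 1):
            cur.append(prev[j - 1] + 1 if a == b else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


print(srs_similarity("acdeba", "abdac"))  # 3, e.g. via the symbols a, d, a
print(lcs_length("acdeba", "abdac"))      # 3
```

Passing a different `score` function (an assumption of this sketch, mirroring choice (2) in the abstract) changes the resulting measure without touching the pattern set.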
