Abstract

Heterographic pun identification is an important branch of humor research and has gradually developed into a research area of its own. This paper presents a heterographic pun identification mechanism based on feature sets in four dimensions: semantic transparency, semantic relevance, phonetic expansibility, and syntax. The semantic transparency feature set consists of lexical item statistics and character length; the syntax feature set includes names, capitalization, tense, part of speech, and location. Nine features from these four dimensions are fed into a binary decision tree to generate a threshold, and pun identification is completed with the help of K-means clustering. On the SemEval-2017 Task 7 corpus, the proposed method achieves an F1 score higher than that of the top-ranked participating team. The experiment shows that a binary decision tree classifier built on the four dimensions is effective in identifying heterographic puns. Of the four dimensions, the phonetic expansibility and syntax feature sets are the most effective, which is consistent with our presumption that phonetic features play a larger role in identifying heterographic puns.
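To make the idea of a decision tree "generating a threshold" concrete, the following is a minimal sketch of how a single binary tree node can pick a cut-off on one numeric feature by minimizing Gini impurity. It is an illustration under stated assumptions, not the implementation used in the paper: the function names, the single hypothetical "phonetic score" feature, and the toy scores and labels are all invented for the example, and the K-means step is left out.

```python
from typing import List, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity of a list of 0/1 labels (0 = not a pun, 1 = pun)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_threshold(scores: List[float], labels: List[int]) -> Tuple[float, float]:
    """Return (threshold, weighted_impurity) for the best single split.

    Candidate thresholds are the midpoints between consecutive distinct
    feature values, which is how a binary decision tree node is usually fit.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    # Start from "no useful split": the impurity of the unsplit node.
    best = (pairs[0][0], gini([y for _, y in pairs]))
    for i in range(1, n):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold can separate identical values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2.0, impurity)
    return best


def is_pun(score: float, threshold: float) -> bool:
    """Classify a sentence as a pun if its feature score exceeds the threshold."""
    return score > threshold


# Toy data, invented for illustration: a hypothetical phonetic score per
# sentence and a gold label saying whether the sentence contains a pun.
toy_scores = [0.0, 0.125, 0.25, 0.75, 0.875, 1.0]
toy_labels = [0, 0, 0, 1, 1, 1]

threshold, impurity = best_threshold(toy_scores, toy_labels)
print(threshold, impurity)
```

A full tree would repeat this search over all nine features and recurse on each side of the chosen split; the sketch shows only the one-feature, one-node case.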
