Abstract

Instant messages are abundant in today’s world: around 55 billion text messages are exchanged over instant messaging platforms every day. They represent a huge source of information from which useful knowledge can be mined. Instant messages closely reflect each user’s characteristics and interests. From waking up in the morning to going to bed at night, people share everything with their close ones via instant messaging platforms. These messages therefore give profound insights into a person’s interests and their preferences for certain entities over others. In addition to being unstructured, instant messages present new challenges in the form of shortenings, contractions and letter/number homophones, all of which require dedicated pre-processing steps. First, the characteristics of these instant messages are discussed, and approaches to deal with these challenges are reviewed. Second, the pre-processed data is used to cluster people with similar interests. These clusters have to be named by a domain expert before insights can be drawn from them, and naming becomes difficult as the dimensionality of the data points increases. This problem is addressed next: useful features are extracted from each cluster in order to output a unique, meaningful name for it. In addition, these steps are elaborated using an unknown dataset, and the working of the algorithm is demonstrated on a known dataset for validation.
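The abstract names the pre-processing and cluster-naming steps without specifying them. The Python below is a minimal sketch of how the two steps could fit together. It makes three assumptions: shortenings, contractions and letter/number homophones are expanded through a small hand-built lookup table; cluster assignments are already available from some clustering algorithm (for example k-means); and a cluster is named by the terms whose relative frequency inside the cluster most exceeds their overall relative frequency. The lookup table, `STOPWORDS`, and the functions `normalize`, `content_terms` and `name_clusters` are hypothetical illustrations, not the method described in the paper.

```python
import re
from collections import Counter

# Hypothetical lookup table for shortenings, contractions and
# letter/number homophones; a real system would need a much larger one.
NORMALIZATION_TABLE = {
    "u": "you", "r": "are", "gr8": "great", "2day": "today",
    "2moro": "tomorrow", "b4": "before", "4": "for", "pls": "please",
    "can't": "cannot", "won't": "will not",
}

# Hypothetical stopword list applied after normalization.
STOPWORDS = {
    "a", "and", "are", "before", "cannot", "for", "i", "is", "it", "my",
    "not", "of", "please", "the", "to", "today", "tomorrow", "was", "we",
    "will", "you",
}


def normalize(message):
    """Lower-case, tokenize, and expand tokens found in the lookup table."""
    tokens = re.findall(r"[a-z0-9']+", message.lower())
    expanded = []
    for tok in tokens:
        # A table entry may expand to several words, e.g. "won't" -> "will not".
        expanded.extend(NORMALIZATION_TABLE.get(tok, tok).split())
    return expanded


def content_terms(message):
    """Normalized tokens with stopwords removed."""
    return [t for t in normalize(message) if t not in STOPWORDS]


def name_clusters(messages_by_user, cluster_of_user, top_k=2):
    """Name each cluster by its top_k most distinctive terms.

    A term's score is its relative frequency inside the cluster minus its
    relative frequency over all users. Terms already used in an earlier
    cluster's name are skipped, so every name is unique.
    """
    overall = Counter()
    per_cluster = {}
    for user, messages in messages_by_user.items():
        counts = Counter(t for m in messages for t in content_terms(m))
        overall.update(counts)
        per_cluster.setdefault(cluster_of_user[user], Counter()).update(counts)

    total_all = sum(overall.values())
    names, used = {}, set()
    for cluster in sorted(per_cluster):
        counts = per_cluster[cluster]
        total_c = sum(counts.values())
        # Highest score first; ties are broken alphabetically.
        ranked = sorted(
            counts,
            key=lambda t: (-(counts[t] / total_c - overall[t] / total_all), t),
        )
        chosen = [t for t in ranked if t not in used][:top_k]
        used.update(chosen)
        names[cluster] = "/".join(chosen) or f"cluster-{cluster}"
    return names


if __name__ == "__main__":
    messages = {
        "alice": ["gr8 match 2day!! football b4 dinner", "football practice 2moro"],
        "bob": ["r u watching the football match?"],
        "carol": ["new recipe 4 pasta, pls try", "baking bread 2moro"],
        "dave": ["pasta recipe was gr8"],
    }
    clusters = {"alice": 0, "bob": 0, "carol": 1, "dave": 1}
    print(name_clusters(messages, clusters))
    # {0: 'football/match', 1: 'pasta/recipe'}
```

The contrast score stops terms that are common everywhere, such as "great" in the toy data, from naming a cluster. Skipping terms that an earlier cluster already used is one simple way to keep names unique. A real system working on high-dimensional features would likely need a more robust weighting, for example TF-IDF.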
