Abstract
Objectives: This study aims to develop a text preprocessing technique for short message service (SMS) messages written in a mix of Bisaya and English. The technique extracts significant keywords for SMS message clustering, which serves as the basis for automated SMS responses to enrollment-related inquiries at a Higher Education Institution (HEI).
Methods/statistical analysis: A text clustering preprocessing technique is introduced and developed for mixed Bisaya and English SMS messages containing HEI enrollment-related inquiries. The technique is a relatively new approach to extracting significant keywords while addressing the morphological complexity of mixed Bisaya and English SMS messages. It has seven stages: tokenization, language tagging, stop-word removal, stemming, Soundex, final tagging, and language translation. A term frequency co-occurrence clustering approach is applied to evaluate the precision and effectiveness of the preprocessing technique.
Findings: Test results show that the technique achieves an accuracy of approximately 73%–83% in text preprocessing and 87%–90% when the preprocessed text is used for clustering.
Application/improvements: The results may help academic institutions handle more enrollment-related inquiries effectively through SMS, an alternative channel for reaching their target market. This also promotes technological advancement, as the institution adopts an ICT-enhanced marketing approach through mobile technology.
Keywords: Text Preprocessing, Text Clustering, SMS Messaging, Stemming Algorithm, Enrollment-related Inquiries.
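The seven preprocessing stages named in the abstract can be sketched as a small Python pipeline. This is a minimal illustration under stated assumptions, not the authors' implementation: the lexicons (`BISAYA_STOP`, `BISAYA_TO_ENGLISH`, `ENGLISH_STOP`, `ENGLISH_KEYWORDS`), the Bisaya affix lists, and the rule that a stripped word is accepted only if it is known or Soundex-matches a known keyword are all hypothetical. The `soundex` function follows standard American Soundex coding (for example, "Robert" becomes R163), which lets SMS misspellings such as "enrolment" map to the keyword "enrollment".

```python
import re

# Hypothetical toy lexicons for illustration; not the study's actual word lists.
BISAYA_STOP = {"ang", "sa", "ba", "nga", "ko", "mo"}
BISAYA_TO_ENGLISH = {"unsa": "what", "kanus-a": "when", "pila": "how much",
                     "asa": "where", "bayad": "payment"}
ENGLISH_STOP = {"the", "is", "a", "an", "for", "of", "in", "to", "i", "my"}
ENGLISH_KEYWORDS = {"enrollment", "tuition", "college", "schedule",
                    "requirements", "fee", "when", "what", "where", "how"}
VOCAB = BISAYA_STOP | set(BISAYA_TO_ENGLISH) | ENGLISH_STOP | ENGLISH_KEYWORDS
STOP = {"BIS": BISAYA_STOP, "ENG": ENGLISH_STOP}
# Hypothetical Bisaya affixes used by the simplified stemmer below.
PREFIXES = ("magpa", "mag", "nag", "mo", "ni", "pag")
SUFFIXES = ("on", "an")

# American Soundex digit groups; vowels, h, w, and y get no digit.
_CODES = {ch: d for d, group in [("1", "bfpv"), ("2", "cgjkqsxz"), ("3", "dt"),
                                 ("4", "l"), ("5", "mn"), ("6", "r")]
          for ch in group}


def soundex(word: str) -> str:
    """American Soundex: first letter plus three digits, e.g. 'Robert' -> 'R163'."""
    letters = [ch for ch in word.lower() if "a" <= ch <= "z"]
    if not letters:
        return ""
    code, prev = letters[0].upper(), _CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        if ch not in "hw":  # h and w do not separate letters with the same digit
            prev = digit
        if len(code) == 4:
            break
    return code.ljust(4, "0")


# Soundex code -> canonical keyword (stop words excluded; first entry wins).
SOUNDEX_INDEX: dict[str, str] = {}
for kw in sorted(ENGLISH_KEYWORDS | set(BISAYA_TO_ENGLISH)):
    SOUNDEX_INDEX.setdefault(soundex(kw), kw)


def tag(token: str) -> str:
    """Dictionary-lookup language tag: BIS, ENG, or UNK."""
    if token in BISAYA_STOP or token in BISAYA_TO_ENGLISH:
        return "BIS"
    if token in ENGLISH_STOP or token in ENGLISH_KEYWORDS:
        return "ENG"
    return "UNK"


def is_known(token: str) -> bool:
    return token in VOCAB or soundex(token) in SOUNDEX_INDEX


def stem(token: str) -> str:
    """Strip one hypothetical Bisaya affix, keeping the result only if it is known."""
    if is_known(token):
        return token
    for p in PREFIXES:
        rest = token[len(p):].lstrip("-")
        if token.startswith(p) and is_known(rest):
            return rest
    for s in SUFFIXES:
        if token.endswith(s) and is_known(token[:-len(s)]):
            return token[:-len(s)]
    return token


def preprocess(sms: str) -> list[str]:
    # 1. Tokenization (keeps hyphenated words such as "kanus-a").
    tokens = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", sms.lower())
    # 2. Language tagging.
    tagged = [(t, tag(t)) for t in tokens]
    # 3. Stop-word removal using the stop list of each token's language.
    kept = [t for t, lang in tagged if t not in STOP.get(lang, set())]
    # 4. Stemming.
    stems = [stem(t) for t in kept]
    # 5. Soundex: replace an unknown spelling with the keyword sharing its code.
    normalized = [t if t in VOCAB else SOUNDEX_INDEX.get(soundex(t), t) for t in stems]
    # 6. Final tagging of the normalized tokens.
    final = [(t, tag(t)) for t in normalized]
    # 7. Translation of Bisaya keywords into English.
    return [BISAYA_TO_ENGLISH.get(t, t) if lang == "BIS" else t for t, lang in final]


if __name__ == "__main__":
    print(preprocess("Kanus-a ang enrolment sa college?"))  # ['when', 'enrollment', 'college']
    print(preprocess("Asa ko magbayad sa fee?"))            # ['where', 'payment', 'fee']
```

Guarding the stemmer with a lookup keeps it from over-stripping English words that happen to start with a Bisaya prefix, at the cost of missing stems absent from the lexicon. Soundex matching can also produce false positives when an unrelated word shares a code with a keyword, so the lexicon size directly affects accuracy.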
Highlights
Document or text clustering is an unsupervised classification of text collections into distinct groups of similar documents, where similarity is defined as some function on documents.
To overcome the shortcomings of existing preprocessing techniques for short messages while providing a suitable approach for the Bisaya dialect, this study developed a text preprocessing technique for mixed Bisaya and English short message service (SMS) messages.
The experimental results show that using a database lookup as a part-of-speech (POS) tagger does not reduce processing time; instead, it makes processing slower.
Summary
Document or text clustering is an unsupervised classification of text collections into distinct groups of similar documents, where similarity is defined as some function on documents. A text clustering algorithm partitions a document collection based on topic similarity, so that documents discussing the same topic are assigned to a single cluster [1]. Recent developments in Internet and mobile technologies have led to an overwhelming growth of multilingual documents on the web and of short message service (SMS) messages. These documents are written in many different languages and cover diverse topics, and organizing them has become a critical problem. Because methods are needed that handle text collections in several languages simultaneously, demand for robust multilingual document clustering algorithms has increased.
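As a rough illustration of assigning same-topic messages to a single cluster, the sketch below groups preprocessed keyword lists by term co-occurrence. It is a simple stand-in, not the study's term frequency co-occurrence algorithm: it assumes two messages belong together when they share at least `min_shared` keywords (a hypothetical threshold), takes connected components with union-find (single link), and labels each cluster by its highest term-frequency keywords. The function names and sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cluster_by_cooccurrence(docs: list[list[str]], min_shared: int = 2) -> list[list[int]]:
    """Single-link grouping: documents sharing >= min_shared keywords end up together."""
    parent = list(range(len(docs)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    term_sets = [set(d) for d in docs]
    for i, j in combinations(range(len(docs)), 2):
        if len(term_sets[i] & term_sets[j]) >= min_shared:
            parent[find(i)] = find(j)  # merge the two groups

    groups: dict[int, list[int]] = {}
    for i in range(len(docs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def cluster_labels(docs: list[list[str]], clusters: list[list[int]],
                   top_n: int = 2) -> list[list[str]]:
    """Describe each cluster by its highest term-frequency keywords."""
    labels = []
    for members in clusters:
        tf = Counter(term for i in members for term in docs[i])
        labels.append([term for term, _ in tf.most_common(top_n)])
    return labels


if __name__ == "__main__":
    # Hypothetical keyword lists, e.g. the output of a preprocessing step.
    docs = [
        ["when", "enrollment", "college"],
        ["enrollment", "schedule", "college"],
        ["how much", "payment", "tuition"],
        ["tuition", "fee", "payment"],
        ["where", "payment", "fee"],
    ]
    clusters = cluster_by_cooccurrence(docs)
    print(clusters)                        # [[0, 1], [2, 3, 4]]
    print(cluster_labels(docs, clusters))  # [['enrollment', 'college'], ['payment', 'tuition']]
```

Single-link grouping can chain loosely related messages together (messages 2 and 4 above share only "payment" but join through message 3), so requiring more than one shared keyword helps keep generic words such as question words from merging unrelated inquiries.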