Abstract

Topic detection analyzes the words in a collection of text to determine the topics it contains, how they relate to each other, and how they change over time. Fuzzy C-Means (FCM) and Kernel-based Fuzzy C-Means (KFCM) are clustering methods often used for topic detection. Both can group a low-dimensional dataset into multiple clusters, but they fail on high-dimensional datasets. To overcome this problem, the dimension of the dataset is reduced before topic detection is carried out with FCM or KFCM. In this study, a dataset of tweets from national news accounts on Twitter was used for topic detection with the Randomspace-based Fuzzy C-Means (RFCM) and Kernelized Randomspace-based Fuzzy C-Means (KRFCM) methods. Both methods consist of two steps: reducing the dataset to a lower dimension using random projection, and then running FCM (for RFCM) or KFCM (for KRFCM) on the reduced dataset. The resulting topics are evaluated by their coherence, measured here with Pointwise Mutual Information (PMI). The average PMI values of RFCM and KRFCM were compared with those of Eigenspace-based Fuzzy C-Means (EFCM) and Kernelized Eigenspace-based Fuzzy C-Means (KEFCM). On the national news accounts' tweets, RFCM and KRFCM offered faster running time for dimension reduction but produced smaller average PMI values than EFCM and KEFCM.
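The two-step structure of RFCM described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the projection matrix, the fuzzifier, or the stopping rule, so everything below is an assumption: a Gaussian random projection, a fuzzifier m = 2, Euclidean distances, and synthetic data standing in for a document-term matrix. The function names `random_projection` and `fuzzy_c_means` are hypothetical and are not taken from the study.

```python
import numpy as np


def random_projection(X, k, rng):
    """Project the rows of X (n x d) to k dimensions.

    Assumption: a dense Gaussian matrix with entries N(0, 1/k);
    the study may use a different random projection scheme.
    """
    d = X.shape[1]
    R = rng.normal(0.0, 1.0 / np.sqrt(k), size=(d, k))
    return X @ R


def fuzzy_c_means(X, c, m=2.0, max_iter=100, tol=1e-5, rng=None):
    """Standard FCM with Euclidean distance; returns (centers, memberships)."""
    rng = np.random.default_rng() if rng is None else rng
    n = X.shape[0]
    # Random initial memberships, each row normalized to sum to 1.
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        W = U ** m
        # Cluster centers are membership-weighted means of the data.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Pairwise distances between points and centers, shape (n, c).
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # avoid division by zero
        # Membership update: u_ij proportional to d_ij^(-2/(m-1)).
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


# Synthetic stand-in for a high-dimensional document-term matrix.
rng = np.random.default_rng(0)
X = rng.random((200, 1000))

Z = random_projection(X, k=20, rng=rng)        # step 1: reduce dimension
centers, U = fuzzy_c_means(Z, c=5, rng=rng)    # step 2: cluster in the reduced space
```

Under the same assumptions, a KRFCM-style variant would keep step 1 and replace the Euclidean distance in step 2 with a kernel-induced distance; that variant is not shown here.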
