Abstract
Information is one of the most important things in our lives, yet people are naturally impatient when searching for it on the internet. Users want the right answer immediately and with minimal effort. News headlines can be used to categorize news by type, and an appropriate categorization makes it easier to pick the topic of interest. Because similar titles tend to indicate related news, title similarity can be used to cluster news items. For this reason, the dataset in this research consists of titles collected from online news sites. TF-IDF is used for document preprocessing, K-Means for clustering, and the elbow method for optimizing the number of clusters. The SSE (sum of squared errors) is calculated for each candidate number of clusters and compared in the elbow method, and the resulting clusterings are evaluated with purity, which measures the conformity between the obtained clusters and the ideal clusters. According to the elbow method, the optimal number of clusters is 8: the SSE values for 7 and 8 clusters differ by 0.228, which forms the elbow. Purity reaches 0.514 with 8 clusters, the highest value among the numbers of clusters tested and the closest to 1, which indicates the most ideal clustering. The conclusion is that the elbow method can be used to optimize the number of clusters for the K-Means clustering method.
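The two quantities the abstract relies on, SSE and purity, can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the function names `sse` and `purity`, the toy points, and the toy topic labels are hypothetical and do not come from the paper's dataset or implementation.

```python
from collections import Counter


def sse(points, labels):
    """Sum of squared distances from each point to its cluster centroid."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        # Centroid = coordinate-wise mean of the cluster members.
        centroid = [sum(m[i] for m in members) / len(members) for i in range(dim)]
        total += sum((m[i] - centroid[i]) ** 2 for m in members for i in range(dim))
    return total


def purity(cluster_labels, true_labels):
    """Fraction of items that belong to the majority class of their cluster."""
    by_cluster = {}
    for cluster, truth in zip(cluster_labels, true_labels):
        by_cluster.setdefault(cluster, Counter())[truth] += 1
    majority_total = sum(c.most_common(1)[0][1] for c in by_cluster.values())
    return majority_total / len(true_labels)


# Toy data (assumed for illustration only, not taken from the paper).
toy_points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
toy_assignment = [0, 0, 1, 1]
print(sse(toy_points, toy_assignment))  # 4.0

toy_clusters = [0, 0, 0, 1, 1, 1]
toy_topics = ["sport", "sport", "politics", "politics", "politics", "sport"]
print(round(purity(toy_clusters, toy_topics), 3))  # 0.667
```

In an elbow analysis, `sse` would be evaluated on the K-Means result for each candidate number of clusters, and the point where the decrease levels off is taken as the elbow. Purity lies between 0 and 1, and values closer to 1 mean the clusters agree more closely with the ideal grouping.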