Abstract

News articles are published on the internet in large numbers over short periods, so they need to be grouped into categories for easy access. One way to group news articles is classification. One classification method is random forest, which is built on decision trees. This research applies random forest to classify news articles into six categories: business, entertainment, health, politics, sport, and news. The data are Cable News Network (CNN) articles from 2011 to 2022. Because the data are textual and large in volume, careful handling is needed to avoid overfitting and underfitting. Random forest suits these data because the algorithm works well on large datasets. However, a random forest is difficult to interpret when its combination of parameters is not appropriate for the data being processed. Hyperparameter optimization is therefore needed to find the best combination of parameters for the random forest. This research uses the search cross-validation (SearchCV) method to optimize the random forest hyperparameters by testing the parameter combinations one by one and validating each. The resulting model classifies news articles into the six categories with an accuracy of 0.81 on the training data and 0.76 on the testing data.
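The pipeline described above (text features, a random forest, and a cross-validated search over parameter combinations) can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes scikit-learn, assumes TF-IDF features, and assumes that "SearchCV" refers to an exhaustive grid search (`GridSearchCV`), which matches "testing the combinations one by one". `RandomizedSearchCV` would be the sampling alternative. The tiny corpus, the parameter grid, and the fold count are hypothetical placeholders, so this sketch does not reproduce the reported 0.81/0.76 accuracies.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus standing in for the CNN articles (NOT the paper's data):
# two short headlines per category, covering the six categories from the abstract.
texts = [
    "stocks rally as markets close higher", "company profits beat forecasts",
    "new film tops the box office", "singer announces world tour",
    "vaccine trial shows promising results", "doctors warn about flu season",
    "senate passes new budget bill", "president meets foreign leaders",
    "team wins championship final", "striker scores twice in derby",
    "storm causes flooding downtown", "police investigate local fire",
]
labels = [
    "business", "business", "entertainment", "entertainment",
    "health", "health", "politics", "politics",
    "sport", "sport", "news", "news",
]

# Text -> TF-IDF vectors -> random forest (feature choice is an assumption).
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("rf", RandomForestClassifier(random_state=0)),
])

# Hypothetical grid: every combination is fitted and scored on each CV fold.
param_grid = {
    "rf__n_estimators": [10, 50],
    "rf__max_depth": [None, 5],
}

# cv=2 because each toy class has only two examples (stratified folds by default).
search = GridSearchCV(pipeline, param_grid=param_grid, cv=2, scoring="accuracy")
search.fit(texts, labels)

print("best parameters:", search.best_params_)
print("mean CV accuracy of best combination:", search.best_score_)
print("prediction:", search.predict(["markets and stocks rise"])[0])
```

With real data, a held-out test split (for example via `train_test_split`) would be kept outside the search, so that the training and testing accuracies can be reported separately, as in the abstract.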
