Abstract

For the 2014-2019 period, the Indonesian government defined infrastructure development as a priority. Accordingly, both online and offline media routinely report infrastructure development news to the public. However, it is difficult for people to obtain from the internet a summary of the infrastructure development that has been carried out by the government. This study aims to provide a brief summary of infrastructure development in Indonesia by applying topic modeling with Latent Dirichlet Allocation. We found that using a bigram language model helps identify phrases in the corpus, so the keywords contained in the topics are more interpretable and acceptable. Moreover, a coherence score measurement was applied to find the best number of topics for our dataset. Based on the experiments, we obtained 40 topics from our dataset. From those 40 topics, we inferred several topic labels, such as oil and gas infrastructure; power plant infrastructure; information technology infrastructure and internet networks; road infrastructure; reservoir infrastructure, irrigation networks, and water resources; railway infrastructure; and airport infrastructure.
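
As a rough illustration of how bigram phrase detection can make topic keywords more interpretable, the sketch below merges frequently co-occurring adjacent word pairs into single tokens before any topic modeling is run. It is not the procedure used in this study: the function names `find_phrases` and `merge_phrases`, the simple scoring rule, the `min_count` and `threshold` values, and the English toy corpus are all assumptions made for the example.

```python
from collections import Counter


def find_phrases(docs, min_count=2, threshold=0.5):
    """Return adjacent word pairs that co-occur often enough to be treated as phrases.

    Assumed scoring rule (illustrative only): the pair count divided by the
    count of the rarer of its two words, which lies between 0 and 1.
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in docs:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))

    phrases = set()
    for (first, second), pair_count in bigrams.items():
        if pair_count < min_count:
            continue  # ignore pairs seen too rarely to judge
        score = pair_count / min(unigrams[first], unigrams[second])
        if score >= threshold:
            phrases.add((first, second))
    return phrases


def merge_phrases(tokens, phrases):
    """Join each detected pair into one token, scanning left to right."""
    merged = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in phrases:
            merged.append(tokens[i] + "_" + tokens[i + 1])
            i += 2  # skip both words of the merged pair
        else:
            merged.append(tokens[i])
            i += 1
    return merged


# Hypothetical toy corpus of tokenized headlines (not data from the study).
docs = [
    ["government", "builds", "power", "plant", "in", "java"],
    ["new", "power", "plant", "project", "starts"],
    ["toll", "road", "project", "in", "sumatra"],
    ["toll", "road", "construction", "starts"],
]

phrases = find_phrases(docs)
merged_docs = [merge_phrases(tokens, phrases) for tokens in docs]
print(merged_docs[0])
```

With the pairs merged, a topic model sees `power_plant` and `toll_road` as single vocabulary items, so a topic's top keywords can read as phrases rather than as isolated words such as `power` or `road`.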
