Abstract

Retrieving important information from vast amounts of unstructured data is a topic of considerable interest. Web data is abundant and easy to access; however, finding specific, structured information in unstructured text remains a challenging task. Information extraction is a framework for retrieving structured insights from text. Among extraction paradigms, event extraction answers questions such as who did what, where, and when, and these answers can support decision-making. This research focuses on extracting the frequency and place of events from news headlines published by online news channels. More precisely, it extracts crime events and natural disaster events, such as earthquakes, together with their temporal frequency. For this purpose, data was collected by scraping two renowned news websites. The proposed event extraction approach uses natural language processing techniques available in libraries such as NLTK, including part-of-speech (POS) tagging, chunking, and named entity recognition (NER), to detect events in the scraped headlines. Results present the frequency and place of crime and natural disaster headlines in visual form. The second objective of this research is to measure how similar the headlines posted daily by different news channels are. We calculated the cosine similarity of headlines over a six-month dataset and found an average similarity of 0.510878, or about 51%.
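
The headline comparison can be illustrated with a minimal sketch of cosine similarity over bag-of-words term counts. The abstract does not state how headlines were vectorized, so the tokenizer, the raw term-frequency weighting, and the function names `tokenize`, `cosine_similarity`, and `average_similarity` are assumptions made for illustration, and the sample headlines are hypothetical rather than taken from the dataset.

```python
import math
import re
from collections import Counter


def tokenize(headline):
    """Lowercase a headline and split it into alphanumeric tokens (assumed scheme)."""
    return re.findall(r"[a-z0-9]+", headline.lower())


def cosine_similarity(headline_a, headline_b):
    """Cosine similarity between the term-count vectors of two headlines."""
    vec_a = Counter(tokenize(headline_a))
    vec_b = Counter(tokenize(headline_b))
    # Dot product over the terms the two headlines share.
    dot = sum(vec_a[term] * vec_b[term] for term in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(count * count for count in vec_a.values()))
    norm_b = math.sqrt(sum(count * count for count in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty headline has no direction to compare
    return dot / (norm_a * norm_b)


def average_similarity(headline_pairs):
    """Mean cosine similarity over (headline_a, headline_b) pairs."""
    scores = [cosine_similarity(a, b) for a, b in headline_pairs]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical headlines from two channels on the same day.
sample_pairs = [
    ("Earthquake hits city", "Earthquake hits town"),
    ("Police arrest robbery suspect", "Floods damage crops"),
]
```

Under these assumptions, a score of 1 means two headlines use the same words in the same proportions and 0 means they share no words, so an average near 0.51 would indicate moderate lexical overlap between channels.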
