Information Extraction from Natural Language Using Universal Networking Language

Aloke Kumar Saha, M F Mridha, Jugal K Das, Jahir Ibna Rafiq

doi:10.1007/978-981-13-6861-5_24

Abstract

Recent research indicates that most data on the Internet are unstructured: the parties that enter, process, collect, and store data rarely keep them in a format that conforms to a defined structure, and this has a knock-on effect on information retrieval whenever a query is made. Data extraction is an integral part of the semantic web and is crucial for linking questions to answers on the web. When a question is posed, answering it requires semantic analysis of both structured and unstructured data, mapping each part of a candidate answer to its relevance to the question. Information extraction is a crucial area of natural language processing; without proper data acquisition from very large data sets, for instance billions of alphanumeric words, the required data rarely reach the user. Practical applications, however, need answers that are succinct, correct, and to the point; readers often skim each answer because they must decide for themselves which one best fits their question. This poses a distinctive challenge: the question may be incomplete, the answer is hidden under layers of data, and the variety of available languages makes the query more complex still. For English, a great deal of research has been conducted, and owing to its exceptionally wide usage, English has moved past the initial difficulties and now yields nearly ninety-nine percent accurate results. That is not the case for Bengali, where semantic analysis and the derivation of meaningful information remain a challenge. This paper proposes an algorithm to acquire meaningful and relevant data from unstructured data. The accuracy and efficiency of target data extraction depend on reasoning about and analysis of the unstructured data.
Here, Universal Networking Language (UNL) is applied in the proposed method to produce the desired output. In this method, exceptionally large unstructured data sets are categorized into prespecified relations with the help of UNL, and on the basis of these relations, the words of each sentence are compared with one another as binary relations. Finally, the proposed method extracts information from these binary relations.
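The idea of extracting information from binary relations can be illustrated with a minimal Python sketch. It is not the algorithm proposed in this paper: the example sentence, the `Relation` type, the `QUESTION_TO_LABEL` mapping, and the `extract` function are all hypothetical, and the relations are written by hand rather than produced by a UNL converter. Only the relation labels `agt` (agent), `obj` (object), and `plc` (place) are taken from UNL itself.

```python
from typing import List, NamedTuple


class Relation(NamedTuple):
    """One binary relation between two words of a sentence."""
    label: str  # UNL relation label, e.g. "agt", "obj", "plc"
    head: str   # the word the relation starts from (here, the verb)
    tail: str   # the word the relation points to


# Hand-written relations for "Rahim reads a book in the library".
# Illustrative only; a real system would obtain these from a UNL converter.
RELATIONS = [
    Relation("agt", "read", "Rahim"),
    Relation("obj", "read", "book"),
    Relation("plc", "read", "library"),
]

# Assumed mapping from question words to relation labels (not from the paper).
QUESTION_TO_LABEL = {"who": "agt", "what": "obj", "where": "plc"}


def extract(relations: List[Relation], question_word: str, verb: str) -> List[str]:
    """Return the words linked to `verb` by the relation the question word asks for."""
    label = QUESTION_TO_LABEL.get(question_word.lower())
    return [r.tail for r in relations if r.label == label and r.head == verb]


print(extract(RELATIONS, "where", "read"))
```

In this sketch, a question such as "Where does Rahim read?" reduces to looking up the `plc` relation attached to the verb, which is the sense in which the answer is read off from the binary relations rather than from the raw text.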

Full Text