Intelligent Bilingual Data Extraction and Rebuilding Using Data Mining for Big Data

Shashi Pal Singh, Ajai Kumar, Shikha Jain, Neetu Yadav, Rachna Awasthi

doi:10.1166/jctn.2020.8699

Abstract

Today, data is available from many sources in many file formats, with different structures and types, and much of it forms a huge unstructured collection on the internet and social media. This gives rise to the categorization of data as unstructured, semi-structured, or structured. Data that exists in an irregular form without any particular schema is referred to as unstructured data; it is very difficult to process because it contains irregularities and ambiguities. We therefore focus on an Intelligent Processing Unit that converts unstructured big data into meaningful information. Intelligent text extraction is a technique that automatically identifies and extracts text from a given file format. The system consists of several stages, including pre-processing, keyphrase extraction, and transformation, which together extract text and retrieve structured data from unstructured data; combining multiple methods and approaches gives better results. We currently work with various file formats and convert each of them into DOCX, which yields the content in unstructured form; intelligent pre-processing then produces the file in structured form. The pre-processing stages turn the unstructured data or corpus into structured, meaningful data. In the initial stage, the system removes stop words, unwanted symbols, noisy data, and extra line spacing. The second stage extracts data from various sources and file types into proper plain text. In the third stage, the data is transformed from one format to another so that the user can understand it. The final step rebuilds the file in its original format while preserving the file's tags. Large files are divided into smaller sub-files so that parallel processing algorithms can process large files and data quickly.
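The extraction and tag-preserving rebuilding stages can be illustrated on DOCX, the format the system converts files into. The following is a minimal sketch using only Python's standard library, not the authors' implementation. It assumes the document body lives in `word/document.xml` (as in the Office Open XML format), and the names `extract_paragraphs`, `rebuild_docx`, and the `transform` callback are hypothetical. Extraction joins the `<w:t>` text nodes of each paragraph; rebuilding applies a transform to each text node and repackages the file, leaving the surrounding tags and the other package parts in place.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the main part of a DOCX package.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)  # keep the usual "w:" prefix on output
P_TAG, T_TAG = f"{{{W_NS}}}p", f"{{{W_NS}}}t"


def extract_paragraphs(docx_bytes: bytes) -> list[str]:
    """Extraction: return the plain text of each <w:p> paragraph."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    # A paragraph's text may be split across several runs (<w:r><w:t>...).
    return ["".join(t.text or "" for t in p.iter(T_TAG)) for p in root.iter(P_TAG)]


def rebuild_docx(docx_bytes: bytes, transform) -> bytes:
    """Rebuilding: apply `transform` (a hypothetical str -> str callback) to
    every <w:t> text node, keep all other tags, and repackage the DOCX."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as src, \
            zipfile.ZipFile(out, "w") as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "word/document.xml":
                root = ET.fromstring(data)
                for t in root.iter(T_TAG):
                    if t.text:
                        t.text = transform(t.text)
                data = ET.tostring(root, encoding="UTF-8", xml_declaration=True)
            # Reusing the original ZipInfo keeps each part's name and compression.
            dst.writestr(item, data)
    return out.getvalue()
```

For example, `rebuild_docx(data, str.upper)` returns a DOCX whose text is upper-cased while its paragraph and run tags are unchanged. Because the transform sees one `<w:t>` node at a time, an operation that spans runs (such as translating a whole sentence) would need to work at paragraph level instead. Note also that `xml.etree.ElementTree` may rename unregistered namespace prefixes and drop declarations it considers unused, which some real Word documents rely on; a production tool would more likely use a library such as lxml or python-docx.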
Parallel processing is a very important concept for text extraction: with its help, a large file is broken into smaller files, which improves the result. Data extraction is bilingual, and the extracted data represents the most relevant information contained in the document. Keyphrase extraction is an important problem in data mining, knowledge retrieval, and natural language processing. A keyword extraction technique is used to extract keywords that uniquely identify a document. Rebuilding is an important part of this project: all of these stages are applied within the file's format, and the final output must be in the same format as the original file. This concept is widely used, but little work has been done on bringing many functionalities together under one tool, which motivates the need for a tool that can easily and efficiently convert unstructured files into structured ones.
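Splitting a large file into sub-files for parallel processing, together with stop-word removal and keyword extraction, can be sketched as follows. This is a hedged illustration, not the paper's method: keywords are ranked by plain term frequency, the stop-word list is a small English sample, and the names `clean_tokens`, `split_into_chunks`, `count_terms`, and `extract_keywords` and the `chunk_size`, `workers`, and `top_k` parameters are hypothetical. Each chunk is counted independently and the partial counts are merged in chunk order, so the result does not depend on how the text was split.

```python
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Small illustrative English stop-word list (an assumption, not from the paper).
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "the", "to", "with"}


def clean_tokens(text: str) -> list[str]:
    """Pre-processing: lowercase, keep only letter runs (dropping symbols,
    digits and line breaks), and remove stop words."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def split_into_chunks(text: str, chunk_size: int) -> list[str]:
    """Divide a large text into sub-texts of at most chunk_size lines each."""
    lines = text.splitlines()
    return ["\n".join(lines[i:i + chunk_size]) for i in range(0, len(lines), chunk_size)]


def count_terms(chunk: str) -> Counter:
    """Term frequencies for one sub-text."""
    return Counter(clean_tokens(chunk))


def extract_keywords(text: str, top_k: int = 5,
                     chunk_size: int = 1000, workers: int = 4) -> list[str]:
    """Count each chunk in parallel, merge the counts, return the top_k terms."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in chunk order, so ties are broken by first occurrence.
        for counts in pool.map(count_terms, split_into_chunks(text, chunk_size)):
            total.update(counts)
    return [term for term, _ in total.most_common(top_k)]
```

For instance, `extract_keywords(text, top_k=3)` returns the three most frequent non-stop-words of `text`. Threads keep the sketch simple; because of CPython's global interpreter lock, CPU-bound counting would more likely use `ProcessPoolExecutor`, which works here because `count_terms` is a module-level function and can be pickled. A bilingual pipeline would also need a stop-word list and tokenizer for each language.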

Published in: Journal of Computational and Theoretical Nanoscience

Similar Papers

Forecasts of the Amount Purchase Pork Meat by Using Structured and Unstructured Big Data
Ga-Ae Ryu ... Aziz Nasridinov
Agriculture | VOL. 10 | 18 Jan 2020

Natural Language Processing and the Promise of Big Data: Small Step Forward, but Many Miles to Go.
Thomas M Maddox ... Michael A Matheny
Circulation. Cardiovascular quality and outcomes | VOL. 8 | 18 Aug 2015

Author response: Building bridges between cellular and molecular structural biology
Ardan Patwardhan ...
12 Jun 2017

An in vitro evaluation of the accuracy of Dentaport ZX apex locator in enlarged root canals
Ak Ebrahim ... H Suda
Australian Dental Journal | VOL. 52 | 01 Sep 2007
