Abstract

The amount of information on the web is tremendous. A large part of it is presented as articles, descriptions, posts and comments, i.e. free text in natural language, and it is hard to make use of it while it remains in this format. In structured form, by contrast, it could be used for many purposes. This paper therefore proposes an approach for extracting data given as free text in natural language into structured data, for example a table. Structured information is easy to search and analyze. Structured data is quantitative, while unstructured data is qualitative. Overall, a tool that converts text into structured data will not only provide an automatic mechanism for data extraction but will also save considerable resources for processing and storing the extracted data. Extracting data from text will also automate the process of deriving useful insights from data that is usually processed by people. The efficiency and accuracy of the process will increase, the probability of human error will be minimized, and the amount of data that can be processed will no longer be limited by human resources.
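The abstract does not say how the extraction itself is performed. As a minimal sketch of the general idea of turning free text into a table, the Python code below uses hand-written regular expressions and writes the result as CSV. The field names (`building`, `year`, `floors`), the patterns, and the functions `extract_record` and `to_csv` are hypothetical illustrations and are not taken from the paper, whose own pipeline is likely different (its outline mentions network configurations).

```python
import csv
import io
import re

# Hypothetical field patterns (assumption, not the paper's method).
# Each regex's first capture group holds the value for that column.
FIELD_PATTERNS = {
    "building": re.compile(r"^(?:The\s+)?([A-Z][\w ]*?)\s+(?:was|is)\b"),
    "year": re.compile(r"\b(?:built|constructed|completed)\s+in\s+(\d{4})\b"),
    "floors": re.compile(r"\b(\d+)\s+floors\b"),
}


def extract_record(text):
    """Map each field to its first match in the text, or None if absent."""
    record = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        record[field] = match.group(1) if match else None
    return record


def to_csv(texts):
    """Convert free-text descriptions into a CSV table, one row per text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(FIELD_PATTERNS))
    writer.writeheader()
    for text in texts:
        # DictWriter writes None values as empty cells.
        writer.writerow(extract_record(text))
    return buffer.getvalue()


if __name__ == "__main__":
    descriptions = [
        "The Riverside Tower was built in 1998 and has 24 floors.",
        "Old Mill House is a brick building constructed in 1875.",
    ]
    print(to_csv(descriptions))
```

Running it prints a header row followed by one row per description; a field the patterns cannot find is left as an empty cell rather than guessed.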

Highlights

  • Statistics show that each person generates 1.7 megabytes of data every second, and 80-90% of the generated data is in unstructured format [3, 6]

  • Unstructured data can be any information in text form [5]: emails, social media data, mobile data such as text messages and locations, MS Office documents, and more [4]

  • This paper proposes an approach for converting unstructured data into structured data

Summary

Introduction

In 2020, the total amount of data reached 64.2 zettabytes [1], and 80-90% of the generated data is in unstructured format [3, 6]. Because most of the data produced is unstructured, many opportunities to use it are missed. This paper proposes an approach for converting unstructured data into structured data. This extraction will enable the data to be used for analysis and for training artificial intelligence networks. The suggested approach lets the user customize the key information that will be collected and transformed into records. Customizing the system means extracting only the information that is applicable to the specified context.
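The introduction describes letting the user choose which key information is collected for a given context. One way to sketch that idea, assuming a fixed library of regular-expression extractors, is a configuration object that lists the fields relevant to a context, so that only those fields end up in the record. `EXTRACTORS`, `ExtractionConfig`, and `extract` are hypothetical names used for illustration, not the paper's API.

```python
import re
from dataclasses import dataclass

# Hypothetical library of field extractors (assumption for illustration).
# Each regex's first capture group holds the extracted value.
EXTRACTORS = {
    "year": re.compile(r"\b(?:in|since)\s+(\d{4})\b"),
    "price": re.compile(r"\$\s?(\d+(?:,\d{3})*(?:\.\d+)?)"),
    "area_m2": re.compile(r"\b(\d+(?:\.\d+)?)\s*(?:m2|square meters)\b"),
    "email": re.compile(r"\b([\w.+-]+@[\w-]+\.[\w.]+)\b"),
}


@dataclass
class ExtractionConfig:
    """A user-defined context naming the key information to collect."""
    context: str
    fields: tuple

    def __post_init__(self):
        # Reject fields that have no extractor, so typos fail early.
        unknown = [f for f in self.fields if f not in EXTRACTORS]
        if unknown:
            raise ValueError(f"No extractor for fields: {unknown}")


def extract(text, config):
    """Build a record containing only the configured fields (None if missing)."""
    record = {"context": config.context}
    for field in config.fields:
        match = EXTRACTORS[field].search(text)
        record[field] = match.group(1) if match else None
    return record


if __name__ == "__main__":
    real_estate = ExtractionConfig("real_estate", ("year", "price", "area_m2"))
    ad = "Flat for sale, built in 2004, 85 m2, asking $120,000. Contact: agent@example.com"
    # The e-mail address is ignored because this context does not ask for it.
    print(extract(ad, real_estate))
```

Validating the field names when a configuration is created means a mistake in a context definition surfaces immediately instead of silently producing empty columns.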

Related Work
Approach
Python
STEP 1
B YEAR construction year construction building
STEP 2
STEP 3
STEP 4
STEP 5 Information Saving
Results
Data Extraction
Network Configurations
Conclusion