Abstract

Spreadsheets contain critical information on various topics and are widely used across many domains, with a huge number of spreadsheet users around the world. Spreadsheets provide considerable flexibility in how data is organized: they give their creators a great deal of freedom to encode their data, since they are easy to use and make it simple to store data in table format. Because of this flexibility, tables with very complex and hierarchical data structures can be created. Such complexity, in turn, makes processing these tables and reusing their data a difficult task. At the same time, the growing volume and complexity of these tables has created a need to preserve and reuse this data. This paper therefore implements a novel algorithm-based heuristic technique together with a cell classification strategy to automate relational data extraction from hierarchical spreadsheet tables, without requiring any programming experience from the user. Finally, the paper reports experiments on two different real-world public datasets, on which the proposed approach achieves an average accuracy of 95% and 94.2%, respectively.

Highlights

  • A spreadsheet is an interactive application tool for organizing, charting, storing, and analyzing data

  • This paper develops an automatic approach based on heuristic rules and cell classification features

  • Spreadsheet table discovery is the task of identifying all tables on a given sheet and determining their ranges
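The highlights do not say how table discovery is carried out. As a point of reference, the sketch below shows a common baseline, which is not necessarily the paper's method. It treats each group of horizontally or vertically adjacent non-empty cells as one table and reports that group's bounding range. The function name `find_table_ranges` and the list-of-rows sheet representation (with `None` or `""` for empty cells) are assumptions for illustration.

```python
from collections import deque
from typing import Any, List, Tuple

# (top_row, left_col, bottom_row, right_col), all 0-based and inclusive
Range = Tuple[int, int, int, int]

def find_table_ranges(grid: List[List[Any]]) -> List[Range]:
    """Hypothetical baseline: bounding range of every 4-connected group of filled cells."""
    n_rows = len(grid)
    n_cols = max((len(row) for row in grid), default=0)

    def filled(r: int, c: int) -> bool:
        # Ragged rows are allowed: cells past the end of a row count as empty.
        return c < len(grid[r]) and grid[r][c] not in (None, "")

    seen = set()
    ranges: List[Range] = []
    for r in range(n_rows):
        for c in range(n_cols):
            if (r, c) in seen or not filled(r, c):
                continue
            # Breadth-first search over horizontally/vertically adjacent filled cells.
            seen.add((r, c))
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                cr, cc = queue.popleft()
                top, bottom = min(top, cr), max(bottom, cr)
                left, right = min(left, cc), max(right, cc)
                for nr, nc in ((cr - 1, cc), (cr + 1, cc), (cr, cc - 1), (cr, cc + 1)):
                    if (0 <= nr < n_rows and 0 <= nc < n_cols
                            and (nr, nc) not in seen and filled(nr, nc)):
                        seen.add((nr, nc))
                        queue.append((nr, nc))
            ranges.append((top, left, bottom, right))
    return ranges

# Two tables on one sheet, separated by blank cells.
sheet = [
    ["Item", "Qty", None, None],
    ["Pen", 3, None, None],
    [None, None, None, None],
    [None, None, "City", "Pop"],
    [None, None, "Oslo", 700],
]
print(find_table_ranges(sheet))  # [(0, 0, 1, 1), (3, 2, 4, 3)]
```

Because this baseline relies only on blank separators, it splits a single table that contains a fully blank row or column and merges two tables that touch each other. Those layouts are where more elaborate heuristics become necessary.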

Summary

INTRODUCTION

A spreadsheet is an interactive application tool for organizing, charting, storing, and analyzing data. The paper uses an automatic approach based on heuristic algorithms and a cell classification strategy to accurately extract relational data from hierarchical and complex spreadsheet tables. It extends the extraction of implicit and relational data from simple table structures to complex, hierarchical ones by proposing an algorithm, based on heuristic rules and cell classification features, that selects complex and hierarchical section header lines and data values. This methodology provides a way to extract more organized, structured data from spreadsheets.
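The summary does not spell out the heuristic rules. The minimal sketch below shows how rule-based cell classification can flatten a hierarchical table into relational records. It assumes the following layout:

- The first row of a table holds the column headers.
- A section header line is a row in which only the first cell is filled and the remaining cells are empty.
- Data rows start with a row label.

These rules and the names `classify_row` and `extract_relational` are hypothetical. They illustrate the general idea and are not the paper's algorithm.

```python
from typing import Any, List, Optional, Tuple

# (section, row label, column header, value)
Record = Tuple[Optional[str], str, str, Any]

def is_empty(cell: Any) -> bool:
    return cell is None or (isinstance(cell, str) and cell.strip() == "")

def as_label(cell: Any) -> str:
    return "" if is_empty(cell) else str(cell).strip()

def classify_row(row: List[Any]) -> str:
    """Hypothetical rule set: label a row 'empty', 'section', or 'data'."""
    filled = [cell for cell in row if not is_empty(cell)]
    if not filled:
        return "empty"
    # Only the first cell is filled and every other cell in the row is empty:
    # treat the row as a section header line that scopes the rows below it.
    if len(filled) == 1 and not is_empty(row[0]):
        return "section"
    return "data"

def extract_relational(table: List[List[Any]]) -> List[Record]:
    """Flatten a table whose first row holds column headers into records."""
    if not table:
        return []
    header, *body = table
    columns = [as_label(h) for h in header[1:]]
    section: Optional[str] = None
    records: List[Record] = []
    for row in body:
        kind = classify_row(row)
        if kind == "section":
            section = as_label(row[0])
        elif kind == "data":
            label = as_label(row[0])
            # Pair each value with its column header; skip empty value cells.
            for column, value in zip(columns, row[1:]):
                if not is_empty(value):
                    records.append((section, label, column, value))
    return records

sheet = [
    [None, "2019", "2020"],
    ["North", None, None],
    ["Sales", 120, 135],
    ["Costs", 80, 90],
    ["South", None, None],
    ["Sales", 95, 110],
]
for record in extract_relational(sheet):
    print(record)
# first line printed: ('North', 'Sales', '2019', 120)
```

Rows that appear before any section header get `None` as their section. A row holding only a label and no values (for example, a caption line) would be misread as a section header under these rules. This is why a practical classifier would likely need additional cell features beyond emptiness.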

RELATED WORK
There is an empty cell in the same row
Extract more than one table from one datasheet
ALGORITHMS USED
RESULTS AND DISCUSSION
VIII. CONCLUSION AND FUTURE WORK