Abstract

Extensible Markup Language (XML) is emerging as the primary standard for representing and exchanging data. Accounting for more than 60% of the total, XML is considered the most dominant document type on the web; nevertheless, the quality of XML documents is not as expected. XML integrity constraints, especially XML functional dependencies (XFDs), play an important role in keeping XML datasets as consistent as possible, but their ability to solve data quality issues remains limited. The main reason is that traditional data dependencies were introduced mainly to maintain the consistency of the schema rather than that of the data. The purpose of this study is to introduce a method for discovering pattern tableaus for XML conditional dependencies, to be used for enhancing XML document consistency as part of the data quality improvement phases. Conditional dependencies are new rules designed mainly for improving data instances; they extend traditional XML dependencies by enforcing pattern tableaus of semantically related constants. A set of minimal approximate conditional dependencies (XCFD, XCIND) is then discovered and learned from the XML tree using a set of mining algorithms. The discovered patterns can be used as master data to detect inconsistencies, that is, values that do not conform to the majority of the dataset.
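
The idea of learning constant patterns from the majority of the data and then flagging values that disagree with them can be illustrated with a small sketch. This is not the paper's mining algorithm: the sample document, the element names (`customer`, `areaCode`, `city`), the function names, and the support and confidence thresholds are all hypothetical, chosen only to show what an approximate conditional functional dependency with a pattern tableau might look like in practice.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Hypothetical sample data; the third record disagrees with the majority.
SAMPLE = """
<customers>
  <customer><areaCode>212</areaCode><city>New York</city></customer>
  <customer><areaCode>212</areaCode><city>New York</city></customer>
  <customer><areaCode>212</areaCode><city>Newark</city></customer>
  <customer><areaCode>312</areaCode><city>Chicago</city></customer>
  <customer><areaCode>312</areaCode><city>Chicago</city></customer>
</customers>
"""


def discover_tableau(root, record, lhs, rhs, min_support=2, min_confidence=0.6):
    """Return {lhs constant: rhs constant} pairs that hold for most records.

    A pair is kept when the lhs value occurs at least `min_support` times and
    its most frequent rhs value covers at least `min_confidence` of them.
    """
    groups = defaultdict(Counter)
    for node in root.iter(record):
        left, right = node.findtext(lhs), node.findtext(rhs)
        if left is not None and right is not None:
            groups[left][right] += 1

    tableau = {}
    for left, counts in groups.items():
        value, hits = counts.most_common(1)[0]
        total = sum(counts.values())
        if total >= min_support and hits / total >= min_confidence:
            tableau[left] = value
    return tableau


def find_violations(root, record, lhs, rhs, tableau):
    """List (lhs, rhs) pairs of records that contradict a tableau pattern."""
    violations = []
    for node in root.iter(record):
        left, right = node.findtext(lhs), node.findtext(rhs)
        if left in tableau and right != tableau[left]:
            violations.append((left, right))
    return violations


root = ET.fromstring(SAMPLE.strip())
tableau = discover_tableau(root, "customer", "areaCode", "city")
violations = find_violations(root, "customer", "areaCode", "city", tableau)
```

Here the learned tableau plays the role of master data: each row pairs a constant on the left-hand side with the constant the majority of records agree on, and any record matching the left-hand side but carrying a different right-hand value is reported as an inconsistency.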

Highlights

  • Today, data have become the lifeblood of businesses as database applications such as Decision Support Systems, Customer Relationship Management, Data Warehouses, Web Services, and eLearning Systems come into use; useful information and knowledge can be gained from considerable amounts of data

  • Investigations demonstrate that many such applications fail to run successfully and efficiently because of issues such as poor system design or weak query performance, yet nothing is more certain to cause application failure than neglect of data quality issues [1]

  • Grijzenhout & Marx provide an in-depth analysis to answer the question “Is the quality of XML documents found on the web sufficient to apply XML technologies like XQuery, XPath, and XSLT?” The results show that 58% of the documents on the web are in XML format, and one-third of these documents are accompanied by a valid XML Schema Definition (XSD) or Document Type Definition (DTD)

Summary

INTRODUCTION

Data have become the lifeblood of businesses as database applications such as Decision Support Systems, Customer Relationship Management, Data Warehouses, Web Services, and eLearning Systems come into use; useful information and knowledge can be gained from considerable amounts of data. About 44% of businesses and organizations reported that missing or imperfect data is the most frequent problem, alongside obsolete information [2]. Extensible Markup Language (XML) has rapidly become one of the essential data file formats. It has been used for scientific data such as DNA sequences, to annotate extensive documents such as the DrugBank database, and for exchanging data over the web for e-commerce.

Motivational Example
LITERATURE REVIEW
XML Conditional Dependencies
Discovering Patterns Tableaus for Conditional Dependencies
Discovering XCFD Pattern Tableaus
Discovering XCIND Pattern Tableaus
Pattern Tableaus Table
IMPLEMENTATION AND DISCUSSION
Findings
CONCLUSION