XML Documents Research Articles

The article discusses a situation-oriented approach to the programmatic processing of word-processing documents. The documents under consideration are prepared by the user in Microsoft Word or a comparable word processor and are later used as data sources. Because Office Open XML and the Open Document Format are open, the concept of virtual documents mapped to ZIP archives can be applied to give programs access to the XML components of such documents in a situational environment. The article substantiates the importance of agreeing in advance where information is placed in the document, for example by using pre-prepared templates, so that it can later be found and retrieved. For the DOCX and ODT formats, it discusses the use of key phrases, bookmarks, content controls, and custom XML components to organize the extraction of entered data. For each option, tree-like models of access to the extracted data are built, together with the corresponding XPath expressions. The choice of option depends on the functionality and limitations of the word processor, and the options differ in how difficult it is to develop a blank template, to enter data as a user, and to program the data extraction. The applied solution is based on entering article metadata through content controls that are placed in a stub template and bound to elements of a custom XML component. The developed hierarchical situational model (HSM) extracts the XML component, loads it into a DOM object, and applies XSLT transformations to obtain the resulting data: an error report and JavaScript code for subsequent use of the extracted metadata.
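The idea of treating a DOCX file as a ZIP archive and reading content controls from its main XML component can be illustrated with a minimal Python sketch. This is not the article's HSM implementation. The helper names `make_sample_docx` and `extract_content_controls`, the tag names `title` and `author`, and the sample text are all hypothetical. The sample archive contains only `word/document.xml`, which is enough for this demonstration but is not a complete document that a word processor would open.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by word/document.xml in DOCX files.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def make_sample_docx():
    """Build an in-memory ZIP holding only word/document.xml (hypothetical demo data)."""
    xml = (
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        '<w:sdt><w:sdtPr><w:tag w:val="title"/></w:sdtPr><w:sdtContent>'
        "<w:p><w:r><w:t>XML in </w:t></w:r><w:r><w:t>practice</w:t></w:r></w:p>"
        "</w:sdtContent></w:sdt>"
        '<w:sdt><w:sdtPr><w:tag w:val="author"/></w:sdtPr><w:sdtContent>'
        "<w:p><w:r><w:t>A. Author</w:t></w:r></w:p>"
        "</w:sdtContent></w:sdt>"
        "</w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", xml)
    return buffer.getvalue()


def extract_content_controls(docx_bytes):
    """Return {tag value: text} for each content control (w:sdt) that carries a w:tag."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    result = {}
    for sdt in root.iter(f"{{{W_NS}}}sdt"):
        tag = sdt.find("w:sdtPr/w:tag", NS)
        if tag is None:
            continue  # untagged controls cannot be addressed by name
        content = sdt.find("w:sdtContent", NS)
        runs = content.iter(f"{{{W_NS}}}t") if content is not None else []
        # Text may be split across several runs, so join every w:t element.
        result[tag.get(f"{{{W_NS}}}val")] = "".join(t.text or "" for t in runs)
    return result


metadata = extract_content_controls(make_sample_docx())
print(metadata)
```

In XPath terms, the lookup above corresponds roughly to `//w:sdt[w:sdtPr/w:tag/@w:val='title']/w:sdtContent//w:t`, with the `w` prefix bound to the WordprocessingML namespace.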

The distributed nature of the Web, as a decentralized system exchanging information between heterogeneous sources, has underlined the need to manage interoperability, i.e., the ability to automatically interpret information in Web documents exchanged between different sources, which is necessary for efficient information management and search applications. In this context, XML was introduced as a data representation standard that simplifies interoperation and integration among heterogeneous data sources by representing data in (semi-)structured documents consisting of hierarchically nested elements and atomic attributes. However, while XML has proven effective for exchanging data, i.e., for syntactic interoperability, it is limited when it comes to handling semantics, i.e., semantic interoperability, since it only specifies the syntactic and structural properties of the data without any further semantic meaning. As a result, semantic-aware XML processing has become a motivating challenge in Web data management, requiring dedicated semantic analysis and disambiguation methods to assign well-defined meaning to XML elements and attributes. Most existing approaches: (i) ignore the problem of identifying ambiguous XML elements/nodes, (ii) only partially consider their structural relationships/context, (iii) use syntactic information in processing XML data regardless of the semantics involved, and (iv) are static in adopting fixed disambiguation constraints, thus limiting user involvement. In this paper, we provide a new XML Semantic Disambiguation Framework, titled XSDF, designed to address each of the above limitations. It takes an XML document as input and produces as output a semantically augmented XML tree made of unambiguous semantic concepts extracted from a reference machine-readable semantic network. XSDF consists of four main modules for: (i) linguistic pre-processing of simple/compound XML node labels and values, (ii) selecting ambiguous XML nodes as targets for disambiguation, (iii) representing target nodes as special sphere neighborhood vectors including all XML structural relationships within a (user-chosen) range, and (iv) running context vectors through a hybrid disambiguation process that combines two approaches, concept-based and context-based disambiguation, and allows the user to tune disambiguation parameters to her needs. Experiments demonstrate the effectiveness and efficiency of our approach in comparison with alternative methods. We also discuss some practical applications of our method, including semantic-aware query rewriting, semantic document clustering and classification, mobile and Web services search and discovery, and blog analysis and event detection in social networks and tweets.
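The notion of a sphere neighborhood, i.e., the nodes lying within a chosen range of a target node in the XML tree, can be sketched as follows. This is a simplified illustration and not the XSDF algorithm: it assumes that distance is the number of parent/child edges, it only counts element labels, and it applies no weighting. The function name `sphere_neighborhood` and the sample document are hypothetical.

```python
from collections import Counter, deque
import xml.etree.ElementTree as ET

# Hypothetical sample document used only for this illustration.
SAMPLE = (
    "<films><picture><title/><director/>"
    "<cast><star/><star/></cast>"
    "</picture></films>"
)


def sphere_neighborhood(root, target, radius):
    """Count labels of nodes within `radius` parent/child edges of `target`.

    Assumption: every edge has length 1 and the target itself is excluded.
    """
    # ElementTree keeps no parent pointers, so build a child -> parent map.
    parent = {child: node for node in root.iter() for child in node}
    seen = {target}
    queue = deque([(target, 0)])
    labels = Counter()
    while queue:  # breadth-first walk outward from the target
        node, dist = queue.popleft()
        if dist > 0:
            labels[node.tag] += 1
        if dist == radius:
            continue  # do not expand beyond the chosen range
        neighbors = list(node)
        if node in parent:
            neighbors.append(parent[node])
        for neighbor in neighbors:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return labels


root = ET.fromstring(SAMPLE)
cast = root.find("picture/cast")
print(sphere_neighborhood(root, cast, 1))
print(sphere_neighborhood(root, cast, 2))
```

With a radius of 1 the context of `cast` contains its two `star` children and its `picture` parent; widening the radius to 2 also brings in `title`, `director`, and `films`, which shows how a user-chosen range changes the context available for disambiguation.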

Articles published on XML Documents

Structural Information Retrieval in XML Documents: A Graph-based Approach

XAPP: An Implementation of SAX-Based Method for Mapping XML Document to and from a Relational Database

Programmatic Extraction of Data from Word Documents Based on a Situation-Oriented Approach

Research on Ocean Government Data Extraction and Clustering Based on XML Document Similarity Technology

A Narrative Review of Storing and Querying XML Documents Using Relational Database

Optimization Algorithms Study and Implementation on Graph Drawing Based on XML Document

Exploiting Links to Improve Search in XML Documents

CBSL − A Compressed Binary String Labeling Scheme for Dynamic Update of XML Documents

Direct Update of XML Documents with Data Values Compressed by Tree Grammars

Mapping of XML Document and Relational Database (Using Structural Queries)

Investigation of Mining Association Rules on XML Document

An Efficient prefix based labeling scheme for Dynamic update of XML Documents

Evaluating Queries and Updates on Big XML Documents

Research on the Similarity of Fuzzy XML Documents and Fuzzy DTD

Relevant XML Documents - Approach Based on Vectors and Weight Calculation of Terms

A Systematic Approach for Changing XML Namespaces in XML Schemas and Managing their Effects on Associated XML Documents under Schema Versioning

Building Semantic Trees from XML Documents

BIM and Thermographic Sensing: Reflecting the As-is Building Condition in Energy Analysis

A Comparative Study: Change Detection and Querying Dynamic XML Documents

Clustering XML Documents using Structure and Content Based in a Proposal Similarity Function (OverallSimSUX)
