Chapter 6 - The XML Information Set (Infoset) and Beyond

Jim Melton, Stephen Buxton

doi:10.1016/b978-155860711-8/50007-1

Abstract

This chapter focuses on the XML Information Set (Infoset) and discusses the Document Object Model (DOM), an API for accessing and manipulating XML. On a page or a computer screen, an Extensible Markup Language (XML) document appears as a sequence of tags and values. When a program performs operations on XML (query, update, extract), it does not need or want to deal with bits in memory or even with tags and values; it operates on the information itself. The Infoset is an abstract representation of the core information in an XML document. That is, the Infoset captures the meaning of a document, so an XML processor need not be concerned with variations in syntax. The W3C XML Information Set Recommendation defines the Infoset representation of a document as a set of information items. There are 11 kinds of information items, and each information item has a set of properties. The Post-Schema-Validation Infoset (PSVI), defined by the XML Schema Working Group, extends the Infoset with type and validation information. XQuery defines its own data model, the XQuery Data Model, based on the Infoset, with additional type information and sequences. The DOM, though strictly speaking an API, has its own underlying data model (the DOM Structure Model), which is closely related to the Infoset. A comparison between the Infoset and the document and a detailed description of the XPath 1.0 data model are also presented.
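The claim that a processor need not be concerned with variations in syntax can be illustrated with a small sketch. It uses Python's standard `xml.etree.ElementTree` module, whose element tree is only a rough stand-in for the Infoset, not an implementation of it. The two sample documents and the `summarize` helper are hypothetical and introduced here purely for illustration.

```python
import xml.etree.ElementTree as ET

# Two serializations that differ only in syntax: quote style, attribute
# order, whitespace inside the start tag, a character reference, and an
# empty-element tag versus a start/end tag pair.
doc_a = '<book id="b1" lang="en"><title>Infoset</title><note/></book>'
doc_b = "<book  lang='en'\n id='b1' ><title>&#73;nfoset</title><note></note></book>"


def summarize(elem):
    """Reduce an element to the information it carries (hypothetical helper).

    Returns the element name, its attributes in a canonical order, its
    character content, and the same summary for each child element.
    """
    return (
        elem.tag,
        sorted(elem.attrib.items()),
        elem.text or "",
        [summarize(child) for child in elem],
    )


summary_a = summarize(ET.fromstring(doc_a))
summary_b = summarize(ET.fromstring(doc_b))
print(summary_a == summary_b)
```

Once parsed, both documents reduce to the same names, attributes, and character content, which is the kind of syntax-independent view the Infoset formalizes as information items and their properties.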

Full Text