Schema Mapping Research Articles

For a number of years, librarians have heard that natural language processing (NLP) will revolutionize information management and retrieval in health care settings. The goal of the two editors, professors at the University Montpellier 2 in France, is to compile research that librarians and health information systems administrators and developers will find useful in incorporating data management systems solutions into their organizations. This book provides relevant theoretical frameworks and empirical research findings in NLP according to linguistic granularity and presents original applications. Both editors demonstrate expertise in the field of computer science. Prince headed the French National University Council for Computer Science and leads the NLP research team at the Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier. She specializes in NLP and cognitive science. Roche's research interests include text mining, information retrieval (IR), terminology, and NLP for schema mapping. NLP is a subfield of computer science that addresses the operation and management of texts, as input or output of computational devices. The scientific community is interested in NLP for the following uses: IR and knowledge extraction, knowledge integration into existing devices, and use and application of existing knowledge structures for IR services. Scientific literature encompasses so much knowledge that only computer-based systems can browse and filter it. A regular search engine lacks the sophistication to handle complex scientific queries. Knowledge classifications, taxonomies with hierarchical ties between knowledge items, provide ontologies at a high cost in human labor and involvement. NLP tools reorganize artificial intelligence (AI) techniques to focus on linguistic-conceptual relationships, rather than primarily textual analysis.
Knowledge integration translates synonymous terms using data and text mining to complete or correct existing knowledge structures. Medical literature contains a substantial number of acronyms. In NLP, unlike mathematically originated formalisms, a concept can be addressed through a variety of words and phrases that are not exactly equivalent. Retrieving the relevant set of texts for a complex query cannot rely on words alone; it must also grasp ideas expressed by distinct strings of words or phrases, which necessitates topical classification. More than half of the scientific literature is written in non-English languages. Key BioNLP domain resources include PubMed, ontologies and thesauri (e.g., GeneOntology), and the Medical Subject Headings (MeSH) Thesaurus. The extraction process for IR and knowledge management (KM) is the same. However, IR results in raw data, and KM results in machine-operable data. The two properties that define this book are emphasizing NLP as the main methodological issue and studying the interaction between NLP and its application domains of science and medicine. NLP goes far beyond the concept of word computing to include the sentence level and the discourse, segment, or text level. Sentence meaning results from word interactions as well as word meanings. Paragraph positions and grammar demonstrate the intentions of the human authors and are generally ignored by most computational techniques. NLP theories attempt to address how text organization shows the dependence of language on its nonlinguistic environment. Such theories include discourse relations theory, discourse rhetorical structures theory, and speech acts theory. BioNLP has developed its own characteristics to process the domain language of biology and medicine. The target audience includes graduate school professors and students, NLP researchers, AI researchers, terminologists, linguists, health information systems specialists, and the BioNLP community.
Librarians in academic settings serving the aforementioned may find this book useful in their collections, particularly if they serve those interested in European research, because the majority of the twenty-two chapter authors are international. The book is divided into five sections: “Works at a Lexical Level: Crossroads between NLP and Ontological Knowledge Management”; “Going Beyond Words: NLP Approaches Involving the Sentence Level”; “Pragmatics, Discourse Structures and Segment Level as the Last Stage in the NLP Offer to Biomedicine”; “NLP Software for IR in Biomedicine”; and “Conclusion and Perspectives.” The book begins with a chapter on text mining for biomedicine that sets the stage. Following the preface and chapters are a compilation of resources, a section about the contributors, and the index. References are predominantly from the last ten years, with some as new as 2008, though most are older. Some are needlessly cited twice. Contributors represent ongoing research from across nations and disciplines. The index does not include cross-references, which is odd considering the topic of the book. There are some typos or misspellings dispersed throughout the text. Graphs and illustrations are relevant and enhance the reader's understanding of concepts. The numerous mathematical formulas will appeal to the target audience. Prince and Roche have succeeded in compiling an original work in the field of BioNLP. There appears to be no advantage to the purchase of perpetual access, although potential certainly exists for an enhanced electronic version.

Provenance metadata has become increasingly important to support scientific discovery reproducibility, result interpretation, and problem diagnosis in scientific workflow environments. The provenance management problem concerns the efficiency and effectiveness of the modeling, recording, representation, integration, storage, and querying of provenance metadata. Our approach to provenance management seamlessly integrates the interoperability, extensibility, and inference advantages of Semantic Web technologies with the storage and querying power of an RDBMS to meet the emerging requirements of scientific workflow provenance management. In this paper, we elaborate on the design of a relational RDF store, called RDFProv, which is optimized for scientific workflow provenance querying and management. Specifically, we propose: i) two schema mapping algorithms to map an OWL provenance ontology to a relational database schema that is optimized for common provenance queries; ii) three efficient data mapping algorithms to map provenance RDF metadata to relational data according to the generated relational database schema; and iii) a schema-independent SPARQL-to-SQL translation algorithm that is optimized on-the-fly by using the type information of an instance available from the input provenance ontology and the statistics of the sizes of the tables in the database. Experimental results are presented to show that our algorithms are efficient and scalable. The comparison with two popular relational RDF stores, Jena and Sesame, and two commercial native RDF stores, AllegroGraph and BigOWLIM, showed that our optimizations result in improved performance and scalability for provenance metadata management. Finally, our case study for provenance management in a real-life biological simulation workflow showed the production quality and capability of the RDFProv system.
Although presented in the context of scientific workflow provenance management, many of our proposed techniques apply to general RDF data management as well.
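
To give a rough feel for what storing provenance RDF metadata in relational tables and translating a triple pattern into SQL can look like, here is a minimal sketch. It does not reproduce the RDFProv algorithms: the one-table-per-property layout, the sample triples, and the helper names `load_triples` and `pattern_to_sql` are all assumptions made for illustration, and the "query" is a single triple pattern, not full SPARQL.

```python
import sqlite3

# Hypothetical provenance triples (subject, property, object); not from the paper.
TRIPLES = [
    ("run1", "usedInput", "fileA"),
    ("run1", "producedOutput", "fileB"),
    ("run2", "usedInput", "fileB"),
    ("run2", "producedOutput", "fileC"),
]


def load_triples(conn, triples):
    """Store each property in its own two-column table (assumed layout)."""
    for subj, prop, obj in triples:
        # Property names come from the fixed sample data above; real input
        # would need validation before being used as a table name.
        conn.execute(
            f'CREATE TABLE IF NOT EXISTS "{prop}" (subject TEXT, object TEXT)'
        )
        conn.execute(f'INSERT INTO "{prop}" VALUES (?, ?)', (subj, obj))


def pattern_to_sql(prop, subject=None, obj=None):
    """Translate one triple pattern into SQL; None marks a variable position."""
    clauses, params = [], []
    if subject is not None:
        clauses.append("subject = ?")
        params.append(subject)
    if obj is not None:
        clauses.append("object = ?")
        params.append(obj)
    where = f" WHERE {' AND '.join(clauses)}" if clauses else ""
    return f'SELECT subject, object FROM "{prop}"{where}', params


conn = sqlite3.connect(":memory:")
load_triples(conn, TRIPLES)

# Relational analogue of the triple pattern:  ?run usedInput "fileB"
sql, params = pattern_to_sql("usedInput", obj="fileB")
rows = conn.execute(sql, params).fetchall()
print(rows)
```

Because each property lives in its own table in this sketch, a pattern with a known property touches only that table. This is one simple way a schema tailored to the data can narrow a query, although the optimizations described in the abstract go well beyond it.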

Articles published on Schema Mapping

Autonomous mapping of HL7 RIM and relational database schema

Bridging the data integration gap

Reverse data exchange

Normalization and optimization of schema mappings

Harvesting models from web 2.0 databases

Composition and inversion of schema mappings

Knowledge Representation and Information Management for Financial Risk Management: Report of a Workshop

The structure of inverses in schema mappings

Identification of Carbonic Anhydrase I Immunodominant Epitopes Recognized by Specific Autoantibodies Which Indicate an Improved Prognosis in Patients with Malignancy after Autologous Stem Cell Transplantation

MapMerge

TRAMP

Scalable data exchange with functional dependencies

Product Data Integration and Management Research Based on XML

Data exchange and schema mappings in open and closed worlds

Information Retrieval in Biomedicine: Natural Language Processing for Knowledge Integration

Using Interval Cycles to Model Krumhansl's Tonal Hierarchies

Dealing with Uncertainty in Lexical Annotation

RDFProv: A relational RDF store for querying and managing scientific workflow provenance

Research on relational-algebra-based schema mapping of data integration

An integrated translation of design data of a nuclear power plant from a specification-driven plant design system to neutral model data
