Abstract

This paper proposes a novel XML-based system for retrieval of presentation slides to address the growing data mining needs in presentation archives for educational and scholarly settings. In particular, contextual information, such as structural and formatting features, is extracted from the open format XML representation of presentation slides. In response to a textual user query, each extracted feature is used to compute a fuzzy relevance score for each slide in the database. The fuzzy scores from the various features are then combined through a hierarchical scheme to generate a single relevance score per slide. Various fusion operators and their properties are examined with respect to their effect on retrieval performance. Experimental results indicate a significant increase in retrieval performance measured in terms of precision-recall. The improvements are attributed to both the incorporation of the contextual features and the hierarchical feature combination scheme.

Highlights

  • Retrieval tools have proven to be indispensable for searching and locating relevant information in large repositories

  • The existence of large slide presentation repositories in education and scholarly settings has necessitated the development of effective search and retrieval tools

  • This paper has examined the unique characteristics of slide presentations, as compared to traditional text and multimedia documents, and has proposed a retrieval tool geared toward such repositories

Summary

INTRODUCTION

Retrieval tools have proven to be indispensable for searching and locating relevant information in large repositories. A plethora of solutions has been proposed and successfully applied to document, image, video, and audio collections. Despite this success, bridging the so-called semantic gap remains a key challenge in developing retrieval techniques. The relative positioning of text within a slide's structure can provide hints about the degree of relevance of each term as perceived by the author. Such information can be used in combination with traditional keyword matching to improve retrieval performance [3, 4]. The proposed system uses structural and text formatting attributes, such as indentation level, font size, and typeface, to calculate a relevance score for occurrences of the query term on each slide. The proposed score combination scheme provides a flexible framework to model the subjective nature of the concept of term relevance in varying slide authoring styles.
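
The scoring idea described above can be illustrated with a minimal sketch. It is not the paper's implementation: the linear membership functions, their thresholds, the bold/regular typeface values, and the two-level hierarchy (features per occurrence, then occurrences per slide) are all assumptions chosen for illustration, and every function name is hypothetical. The three operator families (conjunctive, mean, disjunctive) are represented here by their simplest members: minimum, arithmetic mean, and maximum.

```python
from statistics import mean

# One occurrence of the query term: (indent_level, font_size_pt, is_bold).
# All membership functions below are assumed shapes, not taken from the paper.

def indent_score(level, max_level=4):
    """Shallower indentation maps to higher relevance (assumed linear)."""
    level = min(max(level, 0), max_level)
    return 1.0 - level / max_level

def font_score(size_pt, lo=12.0, hi=44.0):
    """Larger font maps to higher relevance (assumed linear ramp, clipped to [0, 1])."""
    return min(max((size_pt - lo) / (hi - lo), 0.0), 1.0)

def typeface_score(is_bold):
    """Bold text is assumed fully relevant, regular text half relevant."""
    return 1.0 if is_bold else 0.5

# Simplest representative of each operator family.
AGGREGATORS = {"conjunctive": min, "mean": mean, "disjunctive": max}

def occurrence_score(occurrence, op="mean"):
    """Lower level of the hierarchy: fuse the feature scores of one occurrence."""
    level, size_pt, is_bold = occurrence
    scores = [indent_score(level), font_score(size_pt), typeface_score(is_bold)]
    return AGGREGATORS[op](scores)

def slide_score(occurrences, feature_op="mean", occurrence_op="disjunctive"):
    """Upper level of the hierarchy: fuse occurrence scores into one slide score."""
    if not occurrences:
        return 0.0  # query term absent from the slide
    per_occurrence = [occurrence_score(o, feature_op) for o in occurrences]
    return AGGREGATORS[occurrence_op](per_occurrence)

if __name__ == "__main__":
    title_hit = (0, 44.0, True)   # term appears in a large, bold title
    body_hit = (2, 20.0, False)   # term appears in a nested bullet
    print(slide_score([title_hit, body_hit]))
    print(slide_score([body_hit], feature_op="conjunctive"))
```

A conjunctive operator rewards an occurrence only when every feature agrees that it is prominent, while a disjunctive operator lets a single strong feature (or a single strong occurrence) dominate. Swapping the operator at each level is one way to model different authoring styles.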

OVERVIEW OF CONTRIBUTIONS AND RELATED WORK
RETRIEVAL FEATURES
Feature hierarchy
Word level features
Line level features
Slide level features
RELEVANCE CALCULATION
Word-level scores
Indentation
Slide-level scores
RELEVANCE AGGREGATION
Aggregation operators: overview
Conjunctive operators
Mean operators
Disjunctive operators
Aggregation of word-level scores
Aggregation of line-level scores
Aggregation of slide-level scores
Dataset
Figure of merit
Comparison to other methods
Choice of features
Choice of membership functions
Choice of aggregation operators
CONCLUSION