Abstract

The use of a text mining approach for fully automatic taxonomy creation for content management has proven to have serious limitations. The high-level semantics that indicate relevant associations of entities among the documents are often not explored. This study introduces a feasible method that incorporates high-level semantics into text mining procedures while providing appropriate levels of document description to support access and discoverability. Because the effectiveness of categorization and the adequacy of the resulting structure can be better determined by humans who are familiar with the documents, qualitative inquiry rather than a purely experimental design was applied. The study collected the data and ran the text mining analysis using text analysis, clustering, and topic extraction. Two examples show how to use the method to develop a faceted classification structure that supports digital collection access and navigation. The study indicates that the text-mining method supports taxonomy creation with greater efficiency and accuracy when human domain and application knowledge are captured during data collection and text mining processing. The proposed method of taxonomy creation would support the creation of new knowledge.
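
As a rough illustration of the clustering and topic extraction steps named in the abstract, the following minimal sketch groups short document descriptions by TF-IDF cosine similarity and lists the highest-weighted terms of each group as candidate facet labels for human review. It is not the study's actual tooling: the stopword list, the greedy single-pass clustering, the similarity threshold of 0.3, and all function names are assumptions made for this example.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list, kept tiny for illustration.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        # Smoothed inverse document frequency (an assumed weighting choice).
        vectors.append({
            term: count * (math.log((1 + n) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def cluster(vectors, threshold=0.3):
    """Greedy single pass: join the first cluster whose seed document is
    similar enough, otherwise start a new cluster."""
    clusters = []
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def top_terms(vectors, members, k=3):
    """Sum TF-IDF weights over a cluster and return its k heaviest terms
    (ties broken alphabetically) as candidate facet labels."""
    totals = Counter()
    for i in members:
        for term, weight in vectors[i].items():
            totals[term] += weight
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


if __name__ == "__main__":
    # Made-up descriptions standing in for user-created document summaries.
    sample = [
        "Map of the river survey",
        "River survey map collection",
        "Portrait photograph of a family",
        "Family portrait photograph album",
    ]
    vecs = tfidf_vectors(sample)
    for group in cluster(vecs):
        print(group, top_terms(vecs, group))
```

In a workflow like the one the abstract describes, the term lists produced this way would only be a starting point: people familiar with the documents would still decide which groups and labels make sense as facets.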

Summary

Introduction

Taxonomies or controlled word lists have been applied to the structural design of online content to support user browsing and searching of information, leading to a better user experience. The resulting taxonomy or controlled word list becomes a knowledge base for the domain represented by the digital collection or digital library content, and so forms a valuable resource for the online collection being designed. User-generated semantics provide user perspectives that can be used in content structural design to support this unique group of users' search and site navigation. This is consistent with the principle of user-centred design. Using user-created descriptions that summarize or categorize the document content can greatly increase the quality of the dataset if the context of data collection is correctly identified and human cognition is captured.
