Abstract
Taxonomies are crucial for Business Intelligence (BI) applications because they prevent users from being overwhelmed with information. Business applications require knowledge of the key concepts and their organization to classify news articles. To ensure and maintain reliable classification quality, the definitions and organization of these concepts must be kept consistent. This underlines the significance of BI taxonomies in organizations. However, in most cases their development in business information systems follows an ad hoc process. Compared to many other domains, e.g. environmental and life sciences research, no mature and up-to-date BI taxonomies are available in the literature. Existing studies cover BI taxonomies, but these are either excessively generic or too domain-specific. As a result, the BI domain suffers from many immature, incorrect, and incomplete notions of concepts. New BI-related concepts emerge rapidly, making it essential to add them to existing taxonomies through an enrichment process. The contribution of our research is an exploration of the possibilities of taxonomy enrichment using existing datasets. We expand an existing business taxonomy with new concepts captured from multifaceted data sources: 1) lexical datasets, 2) pre-trained word embeddings, 3) linked open data vocabulary, and 4) corpus-based extraction of relevant thematic features from news articles using Natural Language Processing (NLP) techniques. The highest semantic enrichment rate of the taxonomy was obtained with a combination of these four methods. Ultimately, the enriched business taxonomy will contribute to improved classification of news articles.
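As a rough illustration of why combining sources can raise the enrichment rate, the following minimal sketch merges candidate concepts from four sources and compares per-source and combined rates. Everything in it is an assumption made for illustration: the function names `new_concepts` and `enrichment_rate`, the toy concept sets, and the definition of the rate as new concepts divided by original taxonomy size are hypothetical and are not taken from the paper.

```python
from typing import Dict, Set


def new_concepts(taxonomy: Set[str], sources: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """For each source, keep only candidates not already in the taxonomy."""
    known = {c.lower() for c in taxonomy}
    return {
        name: {c.lower() for c in candidates} - known
        for name, candidates in sources.items()
    }


def enrichment_rate(taxonomy: Set[str], added: Set[str]) -> float:
    """Assumed definition: new concepts relative to the original taxonomy size."""
    return len(added) / len(taxonomy)


# Toy data, invented for illustration only.
taxonomy = {"merger", "acquisition", "bankruptcy"}
sources = {
    "lexical": {"takeover", "merger"},
    "embeddings": {"buyout", "takeover"},
    "linked_open_data": {"insolvency"},
    "news_corpus": {"spin-off", "buyout"},
}

per_source = new_concepts(taxonomy, sources)
# The combined candidate set is the union over all sources.
combined = set().union(*per_source.values())

for name, added in per_source.items():
    print(name, round(enrichment_rate(taxonomy, added), 2))
print("combined", round(enrichment_rate(taxonomy, combined), 2))
```

Because the combined set is a union, its rate under this assumed definition can never be lower than that of any single source. A real pipeline would also need to filter candidates for relevance before adding them.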