An Efficient Method for Noisy Annotation Data Modeling

Sushama Shinde, Shyam Gupta

doi:10.9790/0661-16512024

Abstract

Probabilistic topic models can be used to analyze noisy annotated discrete data, such as web pages stored in social bookmarking services, and to extract the content-related annotations from it. Because users of these services can attach annotations freely, some annotations do not describe the semantics of the content; such annotations are noisy, that is, not content-related. Extracting content-related annotations can serve as a preprocessing step for machine learning tasks such as text classification and image recognition, and can improve information retrieval performance. The proposed model is a generative model for content and annotations, in which each annotation is assumed to originate either from the topics that generated the content or from a general distribution unrelated to the content. We demonstrate the effectiveness of the proposed method on synthetic data and on real social annotation data for text and images.
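The generative assumption stated in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Dirichlet prior on topic proportions, the per-annotation relevance switch, and every name and parameter below (`generate_document`, `relevance_prob`, `word_dists`, `annot_dists`, `general_annot_dist`, `alpha`) are assumptions chosen for illustration. Each annotation is drawn either from a topic that was actually used by the document's words or from a general, content-unrelated distribution.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw topic proportions by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_index(weights, rng):
    """Draw one index with probability proportional to its weight."""
    return rng.choices(range(len(weights)), weights=weights)[0]


def generate_document(n_words, n_annotations, word_dists, annot_dists,
                      general_annot_dist, relevance_prob, alpha, rng):
    """Generate one document's words and annotations (hypothetical sketch).

    word_dists[z] and annot_dists[z] are assumed per-topic distributions
    over the word and annotation vocabularies; general_annot_dist is the
    assumed content-unrelated annotation distribution. n_words must be > 0.
    """
    # Content: each word picks a topic, then a word from that topic.
    theta = sample_dirichlet(alpha, rng)
    word_topics = [sample_index(theta, rng) for _ in range(n_words)]
    words = [sample_index(word_dists[z], rng) for z in word_topics]

    # Annotations: a switch decides between content topics and noise.
    annotations = []
    for _ in range(n_annotations):
        if rng.random() < relevance_prob:
            # Content-related: reuse a topic that generated some word.
            z = rng.choice(word_topics)
            annotations.append((sample_index(annot_dists[z], rng), True))
        else:
            # Noisy: drawn from the general distribution.
            annotations.append((sample_index(general_annot_dist, rng), False))
    return words, annotations
```

In this sketch the boolean stored with each annotation records whether it was content-related. In real data that flag is unobserved, and recovering it is the extraction problem the abstract describes.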

Full Text