Abstract

Effective indexing of social media data is key to searching for information on the social Web. However, the characteristics of social media data make it a challenging task. The large-scale and streaming nature is the first challenge, which requires the indexing algorithm to be able to efficiently update the indexing structure when receiving data streams. The second challenge is utilizing the rich meta-information of social media data for a better evaluation of the similarity between data objects and for a more semantically meaningful indexing of the data, which may allow the users to search for them using the different types of queries they like. Existing approaches based on either matrix operations or hashing usually cannot perform an online update of the indexing base to encode upcoming data streams, and they have difficulty handling noisy data. This chapter presents a study on using the Online Multimodal Co-indexing Adaptive Resonance Theory (OMC-ART) for an effective and efficient indexing and retrieval of social media data. More specifically, two types of social media data are considered: (1) the weakly supervised image data, which is associated with captions, tags and descriptions given by the users; and (2) the e-commerce product data, which includes product images, titles, descriptions and user comments. These scenarios make this study related to multimodal web image indexing and retrieval. Compared with existing studies, OMC-ART has several distinct characteristics. First, OMC-ART is able to perform online learning of sequential data. Second, instead of a plain indexing structure, OMC-ART builds a two-layer one, in which the first layer co-indexes the images by the key visual and textual features based on the generalized distributions of the clusters they belong to; while in the second layer, the data objects are co-indexed by their own feature distributions. Third, OMC-ART enables flexible multimodal searching by using either visual features, keywords, or a combination of both. Fourth, OMC-ART employs a ranking algorithm that does not need to go through the whole indexing system when only a limited number of images need to be retrieved. Experiments on two publicly accessible image datasets and a real-world e-commerce dataset demonstrate the efficiency and effectiveness of OMC-ART. The content of this chapter is summarized and extended from [13] ( https://doi.org/10.1145/2671188.2749362), and the Python codes of OMC-ART with examples on building an e-commerce product search engine are available at https://github.com/Lei-Meng/OMC-ART-Build-a-toy-online-search-engine-.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call