Search Result Diversification Research Articles

Sequence similarity tools, such as BLAST, seek sequences most similar to a query from a database of sequences. They return results that are significantly similar to the query sequence and typically highly similar to each other. Most sequence analysis tasks in bioinformatics require an exploratory approach, where the initial results guide the user to new searches. However, diversity has not yet been considered an integral component of sequence search tools for this discipline. Some redundancy can be avoided by introducing non-redundancy during database construction, but it is not feasible to dynamically set a level of non-redundancy tailored to a query sequence. We introduce the problem of diverse search and browsing in sequence databases that produce non-redundant results optimized for any given query. We define diversity measures for sequences and propose methods to obtain diverse results extracted from current sequence similarity search tools. We also propose a new measure to evaluate the diversity of a set of sequences that is returned as a result of a sequence similarity query. We evaluate the effectiveness of the proposed methods in post-processing BLAST and PSI-BLAST results. We also assess the functional diversity of the returned results based on available Gene Ontology annotations. Additionally, we include a comparison with a current redundancy elimination tool, CD-HIT. Our experiments show that the proposed methods are able to achieve more diverse yet significant result sets compared to static non-redundancy approaches. In both sequence-based and functional diversity evaluation, the proposed diversification methods significantly outperform original BLAST results and other baselines. A web-based tool implementing the proposed methods, Div-BLAST, can be accessed at cedar.cs.bilkent.edu.tr/Div-BLAST
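
To make the idea of post-processing a ranked hit list for diversity concrete, here is a minimal Python sketch of a greedy, maximal-marginal-relevance-style selection. It is not the Div-BLAST method, which the abstract does not spell out. The function `diversify_hits`, the normalized query scores, the pairwise similarity function, and the trade-off weight `lam` are all hypothetical names and inputs chosen for illustration.

```python
from typing import Callable, Dict, List


def diversify_hits(
    scores: Dict[str, float],
    similarity: Callable[[str, str], float],
    k: int,
    lam: float = 0.5,
) -> List[str]:
    """Greedily pick k hits, trading query similarity against redundancy.

    scores: hit id -> normalized similarity to the query in [0, 1] (assumed).
    similarity: pairwise similarity between two hits in [0, 1] (assumed).
    lam: weight on query similarity; 1.0 reproduces the original ranking.
    """
    remaining = set(scores)
    selected: List[str] = []
    while remaining and len(selected) < k:
        def marginal(hit: str) -> float:
            # Redundancy is the highest similarity to any hit already chosen.
            redundancy = max((similarity(hit, s) for s in selected), default=0.0)
            return lam * scores[hit] - (1.0 - lam) * redundancy

        # Sorting the ids first makes tie-breaking deterministic.
        best = max(sorted(remaining), key=marginal)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (invented): hits "a" and "b" are near-duplicates, "c" is distinct.
toy_scores = {"a": 0.95, "b": 0.94, "c": 0.80}
toy_pairs = {
    frozenset(("a", "b")): 0.98,
    frozenset(("a", "c")): 0.20,
    frozenset(("b", "c")): 0.25,
}


def toy_similarity(x: str, y: str) -> float:
    return toy_pairs[frozenset((x, y))]


top2 = diversify_hits(toy_scores, toy_similarity, k=2, lam=0.5)
```

With the toy data, the second pick skips the near-duplicate "b" in favor of the less similar but distinct "c". Setting `lam=1.0` falls back to ranking by query similarity alone.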


User queries to the Web tend to have more than one interpretation due to their ambiguity and other characteristics. How to diversify the ranking results to meet users' various potential information needs has attracted considerable attention recently. This paper aims to mine the subtopics of a query, either indirectly from the returned results of retrieval systems or directly from the query itself, to diversify the search results. For the indirect subtopic mining approach, clustering the retrieval results and summarizing the content of clusters is investigated. In addition, labeling topic categories and concept tags on each returned document is explored. For the direct subtopic mining approach, several external resources, such as Wikipedia, Open Directory Project, search query logs, and the related search services of search engines, are consulted. Furthermore, we propose a diversified retrieval model to rank documents with respect to the mined subtopics for balancing relevance and diversity. Experiments are conducted on the ClueWeb09 dataset with the topics of the TREC09 and TREC10 Web Track diversity tasks. Experimental results show that the proposed subtopic-based diversification algorithm significantly outperforms the state-of-the-art models in the TREC09 and TREC10 Web Track diversity tasks. The best performance our proposed algorithm achieves is α-nDCG@5 0.307, IA-P@5 0.121, and α#-nDCG@5 0.214 on the TREC09, as well as α-nDCG@10 0.421, IA-P@10 0.201, and α#-nDCG@10 0.311 on the TREC10. The results indicate that the subtopic mining technique with up-to-date user search query logs is the most effective way to generate the subtopics of a query, and that the proposed subtopic-based diversification algorithm can select documents covering various subtopics.
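
For readers unfamiliar with the α-nDCG metric reported above, the following Python sketch shows one common way to compute it: each document's gain for a subtopic is discounted by a factor of (1 - α) for every earlier document that already covered that subtopic, and the score is normalized by a greedily built ideal ranking. This is a sketch of the general metric, not the paper's evaluation code. The function names, the toy judgments, and the greedy approximation of the ideal ranking are assumptions made for illustration.

```python
import math
from typing import Dict, List, Set


def alpha_dcg(ranking: List[str], subtopics: Dict[str, Set[str]],
              alpha: float, depth: int) -> float:
    """Discounted cumulative gain with a novelty penalty per subtopic."""
    seen: Dict[str, int] = {}  # subtopic -> times covered at earlier ranks
    total = 0.0
    for rank, doc in enumerate(ranking[:depth], start=1):
        covered = subtopics.get(doc, set())
        # Each repeat coverage of a subtopic is worth (1 - alpha) times less.
        gain = sum((1.0 - alpha) ** seen.get(t, 0) for t in covered)
        for t in covered:
            seen[t] = seen.get(t, 0) + 1
        total += gain / math.log2(rank + 1)
    return total


def greedy_ideal(subtopics: Dict[str, Set[str]], alpha: float,
                 depth: int) -> List[str]:
    """Approximate the ideal ranking by always taking the highest-gain doc."""
    remaining = sorted(subtopics)
    seen: Dict[str, int] = {}
    ideal: List[str] = []
    while remaining and len(ideal) < depth:
        def gain(doc: str) -> float:
            return sum((1.0 - alpha) ** seen.get(t, 0) for t in subtopics[doc])

        best = max(remaining, key=gain)
        ideal.append(best)
        remaining.remove(best)
        for t in subtopics[best]:
            seen[t] = seen.get(t, 0) + 1
    return ideal


def alpha_ndcg(ranking: List[str], subtopics: Dict[str, Set[str]],
               alpha: float = 0.5, depth: int = 10) -> float:
    ideal = greedy_ideal(subtopics, alpha, depth)
    denom = alpha_dcg(ideal, subtopics, alpha, depth)
    return alpha_dcg(ranking, subtopics, alpha, depth) / denom if denom > 0 else 0.0


# Toy judgments (invented): d1 and d2 cover the same subtopic, d3 a new one.
judgments = {"d1": {"s1"}, "d2": {"s1"}, "d3": {"s2"}}
redundant_score = alpha_ndcg(["d1", "d2", "d3"], judgments, alpha=0.5, depth=3)
diverse_score = alpha_ndcg(["d1", "d3", "d2"], judgments, alpha=0.5, depth=3)
```

On the toy judgments, the ranking that surfaces the second subtopic earlier scores higher than the one that repeats the first subtopic.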


Articles published on Search Result Diversification

Div-BLAST: diversification of sequence search results.

Diversifying Semantic Entity Search: Independent Component Analysis Approach

Adaptive diversification for tag-based social image retrieval

Introduction to the special issue on search intents and diversification

Mining subtopics from different aspects for diversifying search results

Diversifying Search Results through Pattern-Based Subtopic Modeling

Leveraging Social Bookmarks from Partially Tagged Corpus for Improved Web Page Clustering

The 1st international workshop on diversity in document retrieval

Results selection diversity for web image retrieval

Evaluating subtopic retrieval methods: Clustering versus diversification of search results

A weighted-graph-based approach for diversifying search results

Towards a Relevant and Diverse Search of Social Images

Automatic Home Medical Product Recommendation
