Efficient, robust and effective rank aggregation for massive biological datasets

Pierre Andrieu, Bryan Brancotte, Laurent Bulteau, Sarah Cohen-Boulakia, Alain Denise, Adeline Pierrot, Stéphane Vialette

doi:10.1016/j.future.2021.06.013

Abstract

Massive biological datasets are available from various sources. To answer a biological question (e.g., “which are the genes involved in a given disease?”), life scientists query and mine such datasets using various techniques. Each technique provides a list of results, usually ranked by importance (e.g., a list of ranked genes). Combining the results obtained by various techniques, that is, combining ranked lists of elements into one list of elements, is of paramount importance to help life scientists make the most of various results and prioritize further investigations. Rank aggregation techniques are particularly well suited to this context, as they take in a set of rankings and provide a consensus, that is, a single ranking which is the “closest” to the input rankings. However, (i) the problem of rank aggregation is NP-hard in most cases (using an exact algorithm is currently not possible for more than a few dozen elements) and (ii) several (possibly very different) exact solutions can be obtained. In answer to (i), many heuristics and approximation algorithms have been proposed. However, heuristics cannot guarantee how far from an exact solution the consensus ranking will be, and the approximation ratio of approximation algorithms dedicated to the problem is fairly high (not less than 3/2). No solution has yet been proposed to help end users deal with the problem encountered in point (ii).

In this paper we present a complete system able to perform rank aggregation of massive biological datasets. Our solution is efficient, as it is based on an original partitioning method that makes it possible to compute a high-quality consensus using an exact algorithm in a large number of cases. Our solution is robust, as it clearly identifies for the user groups of elements whose relative order is the same in any optimal solution. These features provide answers to points (i) and (ii) and rest on mathematical foundations offering guarantees on the computed result. Also, our solution is effective, as it has been implemented in a real tool, ConquR-BioV2, which is used by the life science community, and evaluated at large scale using a very large number of datasets.
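
To make points (i) and (ii) concrete, here is a minimal sketch of exact rank aggregation by exhaustive search. It is not the partitioning method of the paper. It assumes that "closest" is measured by the Kendall-tau distance (the number of element pairs ordered differently), which the abstract does not state; the function names and the four elements A to D are hypothetical. The search enumerates every permutation, which illustrates why an exact approach becomes impractical beyond small inputs, and it returns all optimal rankings rather than one.

```python
from itertools import combinations, permutations


def kendall_tau(r1, r2):
    """Count the pairs of elements that the two rankings order differently."""
    pos1 = {e: i for i, e in enumerate(r1)}
    pos2 = {e: i for i, e in enumerate(r2)}
    return sum(
        1
        for a, b in combinations(r1, 2)
        if (pos1[a] - pos1[b]) * (pos2[a] - pos2[b]) < 0
    )


def optimal_consensuses(rankings):
    """Return (best score, every ranking reaching it) by exhaustive search.

    Assumes all input rankings are complete and cover the same elements.
    The loop visits n! candidates, so this is only usable for tiny inputs.
    """
    elements = sorted(rankings[0])
    best_score, best = None, []
    for cand in permutations(elements):
        score = sum(kendall_tau(cand, r) for r in rankings)
        if best_score is None or score < best_score:
            best_score, best = score, [cand]
        elif score == best_score:
            best.append(cand)
    return best_score, best


# Hypothetical input: two techniques that disagree only on B versus C.
rankings = [
    ["A", "B", "C", "D"],
    ["A", "C", "B", "D"],
]
score, optima = optimal_consensuses(rankings)
print(score, optima)
```

On this toy input two rankings tie for the optimum, and they differ only in the order of B and C. A comes first and D comes last in both, which is the kind of stable relative order the abstract describes as being shared by every optimal solution.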


Journal: Future Generation Computer Systems	Publication Date: Jun 9, 2021
Citations: 4	License type: other-oa

Similar Papers

Reliability-Aware and Graph-Based Approach for Rank Aggregation of Biological Data
Pierre Andrieu ... Laurent Bulteau
01 Sep 2019

Fuzzy Logic and Rank Aggregation for the World Wide Web
M. M. Sufyan Beg ... Nesar Ahmad
01 Jan 2004

Multi-objective Evolutionary Rank Aggregation for Recommender Systems
Samuel Oliveira ... Gisele L Pappa
01 Jul 2018

Rank Aggregation: Together We're Strong
Frans Schalekamp ... Anke Van Zuylen
03 Jan 2009
