eScience 2019
Thursday, September 26 • 11:30am - 12:00pm
Reliability-Aware and Graph-Based Approach for Rank Aggregation of Biological Data

Pierre Andrieu (Université Paris-Sud, CNRS, Université Paris-Saclay), Bryan Brancotte (Institut Pasteur), Laurent Bulteau (Université Paris-Est Marne-la-Vallée, CNRS), Sarah Cohen-Boulakia (Université Paris-Sud, CNRS, Université Paris-Saclay), Alain Denise (Université Paris-Sud, CNRS, Université Paris-Saclay), Adeline Pierrot (Université Paris-Sud, CNRS, Université Paris-Saclay), and Stéphane Vialette (Université Paris-Est Marne-la-Vallée, CNRS)

Huge amounts of biological data are available in public databases and can be queried through portals using keywords, which return ranked lists of answers. However, querying such portals properly remains difficult, since the same query can be formulated in several ways (e.g., using synonyms of the initial keyword).

Consequently, users have to manually combine several lists of hundreds of answers into a single list.
Rank aggregation techniques are particularly well suited to this context: they take as input a set of ranked elements (rankings) and produce a consensus, that is, a single ranking that is the "closest" to the input rankings.
However, rank aggregation is NP-hard in most cases, and exact algorithms are currently not practical for more than a few dozen elements. Many heuristics have therefore been proposed, but their behaviour is inherently difficult to anticipate: given a set of input rankings, one cannot guarantee how far the consensus produced by a heuristic will be from an exact solution.
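To make "closest" and "exact" concrete, here is a minimal sketch assuming the common Kemeny formulation: complete rankings without ties over the same elements, with distance measured as the Kendall tau distance (number of pairs ordered differently). The abstract does not name the distance, so this choice and the function names (`kendall_tau`, `kemeny_score`, `exact_consensus`) are illustrative assumptions. The brute-force search tries all n! orderings, which shows why exact computation stops being feasible beyond very small inputs.

```python
from itertools import combinations, permutations


def kendall_tau(r1, r2):
    """Kendall tau distance: number of element pairs that two complete
    rankings (same elements, no ties) order differently."""
    pos1 = {e: i for i, e in enumerate(r1)}
    pos2 = {e: i for i, e in enumerate(r2)}
    return sum(
        1
        for a, b in combinations(r1, 2)
        if (pos1[a] - pos1[b]) * (pos2[a] - pos2[b]) < 0
    )


def kemeny_score(consensus, rankings):
    """Total disagreement between a candidate consensus and the inputs."""
    return sum(kendall_tau(consensus, r) for r in rankings)


def exact_consensus(rankings):
    """Brute-force Kemeny consensus over all n! orderings (tiny inputs only)."""
    return min(permutations(rankings[0]), key=lambda c: kemeny_score(c, rankings))


rankings = [["A", "B", "C"], ["A", "C", "B"], ["B", "A", "C"]]
best = exact_consensus(rankings)
print(best, kemeny_score(best, rankings))  # ('A', 'B', 'C') 2
```

With 20 elements the loop above would already have to score about 2.4 × 10^18 orderings, which is why heuristics or decomposition become necessary.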

The two challenges we tackle in this paper are: (i) providing a pre-processing step that decomposes large data sets into smaller ones on which high-quality algorithms can be run, and (ii) informing users about the reliability of the positions of elements in the resulting consensus ranking.
Our approach not only rests on mathematical foundations that provide guarantees on the computed result, but has also been implemented in a real system available to the life science community and tested on various real use cases.
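The two challenges can be illustrated with a minimal sketch. It is not presented as the authors' algorithm, and it relies on several assumptions: complete rankings without ties over the same elements, the Kemeny/Kendall tau objective, and hypothetical helper names. For (i), it builds the pairwise majority graph and splits it into strongly connected components (Tarjan's algorithm). Under the Kemeny objective, every element of an earlier component beats every element of a later one by a strict majority, so each component can be solved exactly on its own and the results concatenated. For (ii), it reports each element's (min, max) position across all optimal orderings of its component. This is one hypothetical reliability indicator: an element with a single possible position is placed unambiguously.

```python
from itertools import combinations, permutations


def majority_graph(rankings):
    """Edge a -> b when at least as many rankings put a before b as b before a.
    A tie yields edges both ways, so tied elements share a component."""
    pos = [{e: i for i, e in enumerate(r)} for r in rankings]
    graph = {e: set() for e in rankings[0]}
    for a, b in combinations(rankings[0], 2):
        a_first = sum(1 for p in pos if p[a] < p[b])
        b_first = len(rankings) - a_first
        if a_first >= b_first:
            graph[a].add(b)
        if b_first >= a_first:
            graph[b].add(a)
    return graph


def strongly_connected_components(graph):
    """Tarjan's algorithm (recursive, fine for small graphs). Tarjan emits SCCs
    in reverse topological order, so the list is reversed before returning."""
    index, low, on_stack, stack, sccs = {}, {}, set(), [], []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in graph[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            component = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.append(w)
                if w == v:
                    break
            sccs.append(component)

    for v in graph:
        if v not in index:
            visit(v)
    return sccs[::-1]


def kemeny_cost(order, rankings):
    """Count input rankings that disagree with each pair ordered in `order`."""
    pos = [{e: i for i, e in enumerate(r)} for r in rankings]
    return sum(1 for p in pos for a, b in combinations(order, 2) if p[a] > p[b])


def optimal_orders(component, rankings):
    """All Kemeny-optimal orderings of one (small) component, by brute force."""
    scored = [(kemeny_cost(o, rankings), o) for o in permutations(component)]
    best = min(cost for cost, _ in scored)
    return [o for cost, o in scored if cost == best]


def decompose_and_aggregate(rankings):
    """Solve each component exactly, concatenate the results, and record each
    element's (min, max) position over all optimal orderings of its component."""
    consensus, position_range = [], {}
    for component in strongly_connected_components(majority_graph(rankings)):
        orders = optimal_orders(component, rankings)
        offset = len(consensus)
        for e in component:
            spots = [offset + o.index(e) for o in orders]
            position_range[e] = (min(spots), max(spots))
        consensus.extend(orders[0])
    return consensus, position_range


rankings = [list("ABCDE"), list("BCADE"), list("CABED")]
consensus, ranges = decompose_and_aggregate(rankings)
print(consensus)
print(ranges)
```

In this example A, B and C form a majority cycle (A beats B, B beats C, C beats A), so they end up in one component with three equally good orderings and each gets the range (0, 2). D and E are resolved unambiguously at positions 3 and 4. Only the three-element component needs exact search instead of all 5! orderings.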


Pierre Andrieu

Université Paris-Sud, CNRS, Université Paris-Saclay

Thursday September 26, 2019 11:30am - 12:00pm PDT
Kon Tiki Room