Whatever your (philosophical) stance on what science is, there is wide agreement that experimentation is at the core of the modern scientific method. Carrying out the same experiments more than once is central to science, as this is how findings are validated and confirmed.

Moreover, the new paradigm of data-intensive science, known as the fourth paradigm, has opened a completely new way of making scientific discoveries, in which experimentation carried out through computation is becoming increasingly prominent. At the same time, science is facing the so-called reproducibility crisis: across all areas, researchers fail to reproduce and confirm previous experimental findings.

In developing and validating its own AI algorithms for disease progression prediction, BRAINTEASER will address these challenges by adopting an open science approach consisting of:

Organization of open, annual evaluation challenges in which participants from industry and academia experiment with and compare their systems and solutions, including those developed by BRAINTEASER.

These evaluation challenges are not biased toward any specific solution and will use open datasets to ensure the comparability and reproducibility of the results.

Sharing of produced and collected data as linked (open) data according to FAIR (Findable, Accessible, Interoperable, Reusable) principles.

Adoption of a citizen science perspective, e.g. by involving patients and their associations in the collection, analysis, and description of the research data needed for the open evaluation challenges.

The open evaluation challenges organized by BRAINTEASER follow the workflow shown in the figure, which guarantees their scientific soundness and thorough organization.


The first stage regards the creation of the experimental collections and consists of the acquisition and preparation of the datasets.


The participants test their systems and submit the output of their experiments.


The campaign organizers use the gathered experiments to create the ground truth, typically adopting an appropriate sampling technique to select a subset of the dataset for each topic. The datasets and ground truth are then used to compute performance measures for each experiment.


The measurements are used to produce descriptive statistics and to perform statistical tests that compare and explain the behaviour of different approaches and systems, so that they can be improved. These measurements and analyses then serve to prepare reports on the experiments, the techniques they used, and their findings.
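As an illustration of the comparison step, the following sketch computes descriptive statistics and a two-sided paired randomization (sign-flip) test on the per-topic scores of two systems. It is a minimal sketch under stated assumptions, not the procedure used in the BRAINTEASER challenges: the per-topic score lists, the function names describe and paired_randomization_test, and the choice of this particular test are all hypothetical.

```python
import random
import statistics


def describe(scores):
    """Return the mean and sample standard deviation of per-topic scores."""
    return statistics.mean(scores), statistics.stdev(scores)


def paired_randomization_test(a, b, trials=10000, seed=0):
    """Two-sided paired randomization test on the mean per-topic difference.

    Hypothetical helper: a and b are the per-topic scores of two systems,
    aligned by topic. Returns an estimated p-value.
    """
    if len(a) != len(b):
        raise ValueError("score lists must be aligned by topic")
    rng = random.Random(seed)
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(statistics.mean(diffs))
    extreme = 0
    for _ in range(trials):
        # Under the null hypothesis the two systems are exchangeable on each
        # topic, so the sign of each per-topic difference is flipped at random.
        permuted = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(statistics.mean(permuted)) >= observed:
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (extreme + 1) / (trials + 1)


if __name__ == "__main__":
    # Hypothetical per-topic scores, for illustration only.
    system_a = [0.62, 0.55, 0.71, 0.48, 0.66, 0.59]
    system_b = [0.58, 0.50, 0.69, 0.45, 0.60, 0.57]
    print("A mean/stdev:", describe(system_a))
    print("B mean/stdev:", describe(system_b))
    print("p-value:", paired_randomization_test(system_a, system_b))
```

With these hypothetical scores, system A beats system B on every topic, so only 2 of the 64 possible sign patterns are as extreme as the observed one and the estimated p-value is close to 2/64. A randomization test is used here because it makes no normality assumption about the score differences; a paired t-test or a Wilcoxon signed-rank test would be common alternatives.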


To maximize knowledge transfer and impact, the experimental data and findings are published together with their reports and made available for further exploitation and reuse. In addition, a public event is organized where the proposed approaches and their performance are presented, discussed, and compared in a live setting, which facilitates the transfer of competencies and know-how among participants.

In the context of BRAINTEASER, this evaluation workflow acts as a catalyst for innovation and exploitation for three main reasons:

participants in evaluation activities are challenged to improve their own systems by addressing the evaluation tasks;

the large, publicly available experimental collections and results are a durable asset that can be continuously exploited to improve systems, also beyond BRAINTEASER itself;

the open access publication of the results and their analyses, together with the public event, allows for rapid cross-fertilization, in which best-of-breed approaches can be picked up and applied to one's own solutions and systems.

BRAINTEASER organizes its open evaluation challenges under the umbrella of the CLEF Initiative (Conference and Labs of the Evaluation Forum), the internationally renowned campaign series whose main mission is to promote research, innovation, and development of information access systems, with an emphasis on multilingual and multimodal information with various levels of structure.
