SynDL: A Large-Scale Synthetic Test Collection for Passage Retrieval

Hossein A. Rahmani¹, Xi Wang², Emine Yilmaz¹, Nick Craswell³, Bhaskar Mitra³, Paul Thomas³
¹University College London, ²University of Sheffield, ³Microsoft

SynDL: Synthetic Test Collection for Passage Retrieval

An image to be added.

Summary

Large-scale test collections play a crucial role in Information Retrieval (IR) research. However, existing IR studies are commonly developed on small-scale test collections built under the Cranfield paradigm, which rely on human assessors for relevance judgments, a time-intensive and expensive process. Recent studies have shown that Large Language Models (LLMs) can produce relevance judgments that match human accuracy at a greatly reduced cost. In this paper, to address the lack of a large-scale ad-hoc retrieval test collection, we extend the TREC Deep Learning Track (DL) test collections with synthetic relevance labels generated by an LLM, enabling researchers to test and evaluate their search systems at scale. The resulting collection includes more than 1,900 test queries from previous years of the track. We compare system evaluation under our synthetic labels with evaluation under the official human labels from past years, and find that our synthetically created large-scale test collection leads to highly correlated system rankings.
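For context, below is a minimal sketch of how SynDL's synthetic judgments could be used to evaluate a TREC-format run. The file names are hypothetical placeholders, and the ir_measures toolkit is one common choice for this kind of evaluation, not something prescribed by SynDL.

# Minimal sketch: scoring a TREC-style run against synthetic qrels.
# File names are hypothetical placeholders.
import ir_measures
from ir_measures import nDCG

# Load synthetic relevance judgments and a system's ranked run,
# both in standard TREC format.
qrels = list(ir_measures.read_trec_qrels("syndl.qrels"))
run = list(ir_measures.read_trec_run("my_system.run"))

# Aggregate nDCG@10 over all queries in the collection.
results = ir_measures.calc_aggregate([nDCG @ 10], qrels, run)
print(results)  # e.g. {nDCG@10: 0.51}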

Relevance Judgment Prompt

(Figure: the relevance judgment prompt used to generate synthetic labels.)
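As a rough illustration of the idea behind the prompt above, the sketch below requests a graded relevance judgment from an LLM. The prompt wording, the four-point grading scale, and the model choice are assumptions made for illustration, not the authors' exact setup, which is shown in the figure.

# Minimal sketch: asking an LLM for a graded relevance judgment.
# Prompt text and model name are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def judge(query: str, passage: str) -> str:
    prompt = (
        "Given a query and a passage, judge the passage's relevance "
        "to the query on a 0-3 scale:\n"
        "3 = Perfectly relevant, 2 = Highly relevant, "
        "1 = Related, 0 = Irrelevant.\n\n"
        f"Query: {query}\nPassage: {passage}\n\n"
        "Answer with a single digit."
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # hypothetical choice of judge model
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content.strip()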

Results and Analysis


Main Results

(Figure: main evaluation results.)
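To show how the "highly correlated system rankings" claim is typically quantified, the sketch below computes Kendall's tau between per-system scores under human labels and under synthetic labels. The scores here are made-up placeholders, not results from the paper.

# Minimal sketch: rank-correlation between two leaderboards.
# The per-system nDCG@10 scores below are hypothetical.
from scipy.stats import kendalltau

# Same system order in both lists: human labels vs. synthetic labels.
human_scores     = [0.71, 0.65, 0.59, 0.52, 0.48]
synthetic_scores = [0.69, 0.66, 0.57, 0.50, 0.49]

tau, p_value = kendalltau(human_scores, synthetic_scores)
print(f"Kendall's tau = {tau:.3f} (p = {p_value:.3g})")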

DL 2019

(Figure: analysis results for DL 2019.)

DL 2020

(Figure: analysis results for DL 2020.)

DL 2021

(Figure: analysis results for DL 2021.)

DL 2022

(Figure: analysis results for DL 2022.)

DL 2023

(Figure: analysis results for DL 2023.)

Bias Analysis

(Figure: bias analysis results.)

BibTeX

@article{rahmani2024SynDL,
  author  = {Rahmani, Hossein A. and Wang, Xi and Yilmaz, Emine and Craswell, Nick and Mitra, Bhaskar and Thomas, Paul},
  title   = {SynDL: A Large-Scale Synthetic Test Collection for Passage Retrieval},
  year    = {2024},
  journal = {#},
  url     = {#}
}