Hello there, I have some questions regarding MS Marco cross-encoder evaluation.

In the docs: https://www.sbert.net/docs/pretrained-models/msmarco-v3.html
In the training/ms_marco section: https://github.com/UKPLab/sentence-transformers/tree/master/examples/training/ms_marco

train_cross-encoder_scratch.py shows how to train a cross-encoder from scratch, but the evaluation there uses https://sbert.net/datasets/msmarco-qidpidtriples.rnd-shuf.train-eval.tsv.gz, which I take to be a small train/dev dataset of 500 queries, each with 1-3 positive and 500 negative examples. From it, a train_dev set of 200 queries is constructed, each with 1-3 positive and 200 negative passages.

Question 1
Do the MRR@10 (MS Marco Dev) scores listed on https://www.sbert.net/docs/pretrained-models/msmarco-v3.html use the same train_dev set as the training code above, or do you use some other dev dataset? Where can I find the actual evaluation dataset and evaluation code to reproduce your results?

More specifically, I want to see how your models compare to other alternatives on MS Marco, but I am not sure how to evaluate them formally. E.g., which exact dev dataset should be used, and should each query have 200 negatives as in your training code, or something else?

Question 2
There are some extra evaluation scripts in the repo, namely eval_msmarco.py and eval_cross-encoder-trec-dl.py. Am I right that the first, eval_msmarco.py, is only for bi-encoders, and that the second is for evaluating cross-encoders on TREC DL?

Thanks in advance.
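To make Question 1 concrete, this is the MRR@10 computation I assume is behind those scores: rank each query's candidate passages by cross-encoder score, take the reciprocal rank of the first positive within the top 10, and average over queries. A minimal sketch with hypothetical passage IDs and scores, not the repo's actual evaluation code:

```python
def mrr_at_k(ranked_pids, positive_pids, k=10):
    """Reciprocal rank of the first relevant passage within the top k, else 0."""
    for rank, pid in enumerate(ranked_pids[:k], start=1):
        if pid in positive_pids:
            return 1.0 / rank
    return 0.0

def mean_mrr(queries, k=10):
    """queries: list of (scores, positive_pids) pairs, where scores maps
    pid -> cross-encoder score. Returns mean MRR@k over all queries."""
    total = 0.0
    for scores, positives in queries:
        # Rank candidate passages by descending model score.
        ranked = sorted(scores, key=scores.get, reverse=True)
        total += mrr_at_k(ranked, positives, k)
    return total / len(queries)

# Hypothetical example: one query whose single positive ("p1") is ranked second.
queries = [({"p1": 0.8, "p2": 0.9, "p3": 0.1}, {"p1"})]
print(mean_mrr(queries))  # 0.5
```

My uncertainty is not about this metric but about which candidate pool it is averaged over (the 200-negative train_dev sample vs. the official MS Marco dev qrels).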
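For reference, this is my current reading of how the train_dev set gets built from the qidpidtriples file. It is a sketch under my own assumptions about the TSV layout (one tab-separated `qid`, `positive pid`, `negative pid` triple per line) and about the sampling (keep all positives, subsample 200 negatives per query); it is not the actual script:

```python
import random

def build_dev_set(triple_lines, num_queries=200, num_negs=200, seed=42):
    """Group (qid, pos_pid, neg_pid) triples by query, then sample
    num_queries queries, keeping all positives and num_negs negatives each."""
    positives, negatives = {}, {}
    for line in triple_lines:
        qid, pos_pid, neg_pid = line.strip().split("\t")
        positives.setdefault(qid, set()).add(pos_pid)
        negatives.setdefault(qid, set()).add(neg_pid)

    rng = random.Random(seed)
    qids = rng.sample(sorted(positives), min(num_queries, len(positives)))
    dev = {}
    for qid in qids:
        negs = sorted(negatives[qid])
        dev[qid] = {
            "positive": sorted(positives[qid]),
            # Subsample negatives; keep them all if fewer than num_negs exist.
            "negative": rng.sample(negs, min(num_negs, len(negs))),
        }
    return dev

# Tiny synthetic example: two queries, a handful of triples each.
lines = ["q1\tp1\tn1", "q1\tp1\tn2", "q2\tp2\tn3"]
dev = build_dev_set(lines, num_queries=2, num_negs=1)
```

If the published MRR@10 numbers use a different pool than this, that is exactly the discrepancy I would like to pin down.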