Hello there, I have some questions regarding MS Marco cross-encoder evaluation.

In the docs: https://www.sbert.net/docs/pretrained-models/msmarco-v3.html
In the training/ms_marco section: https://github.com/UKPLab/sentence-transformers/tree/master/examples/training/ms_marco

train_cross-encoder_scratch.py shows how to train a cross-encoder from scratch, but the evaluation there uses https://sbert.net/datasets/msmarco-qidpidtriples.rnd-shuf.train-eval.tsv.gz, which I take to be a small train/dev dataset of 500 queries, each with 1-3 positive and 500 negative examples. From it, a train_dev set of 200 queries is constructed, each with 1-3 positive and 200 negative passages.

Question 1
Do the MRR@10 (MS Marco Dev) scores listed on https://www.sbert.net/docs/pretrained-models/msmarco-v3.html use the same train_dev set as the training code above, or do you use some other dev dataset? Where can I find the actual evaluation dataset and evaluation code to reproduce your results?

More specifically, I want to see how your models compare to other alternatives on MS Marco, but I am not sure how to evaluate them formally. E.g., which exact dev dataset should be used, and should each query have 200 negatives as in your training code, or something else?

Question 2
There are some extra evaluation scripts in the repo, namely eval_msmarco.py and eval_cross-encoder-trec-dl.py. Am I right that the first, eval_msmarco.py, is only for bi-encoders, and that the second is for evaluating cross-encoders on TREC DL?

Thanks in advance.
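To make Question 1 concrete, this is the MRR@10 computation I assume is behind those scores: rank each query's candidate passages by cross-encoder score, take the reciprocal rank of the first positive within the top 10, and average over queries. A minimal sketch with hypothetical passage IDs and scores, not the repo's actual evaluation code:

```python
def mrr_at_k(ranked_pids, positive_pids, k=10):
    """Reciprocal rank of the first relevant passage within the top k, else 0."""
    for rank, pid in enumerate(ranked_pids[:k], start=1):
        if pid in positive_pids:
            return 1.0 / rank
    return 0.0

def mean_mrr(queries, k=10):
    """queries: list of (scores, positive_pids) pairs, where scores maps
    pid -> cross-encoder score. Returns mean MRR@k over all queries."""
    total = 0.0
    for scores, positives in queries:
        # Rank candidate passages by descending model score.
        ranked = sorted(scores, key=scores.get, reverse=True)
        total += mrr_at_k(ranked, positives, k)
    return total / len(queries)

# Hypothetical example: one query whose single positive ("p1") is ranked second.
queries = [({"p1": 0.8, "p2": 0.9, "p3": 0.1}, {"p1"})]
print(mean_mrr(queries))  # 0.5
```

My uncertainty is not about this metric but about which candidate pool it is averaged over (the 200-negative train_dev sample vs. the official MS Marco dev qrels).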
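For reference, this is my current reading of how the train_dev set gets built from the qidpidtriples file. It is a sketch under my own assumptions about the TSV layout (one tab-separated `qid`, `positive pid`, `negative pid` triple per line) and about the sampling (keep all positives, subsample 200 negatives per query); it is not the actual script:

```python
import random

def build_dev_set(triple_lines, num_queries=200, num_negs=200, seed=42):
    """Group (qid, pos_pid, neg_pid) triples by query, then sample
    num_queries queries, keeping all positives and num_negs negatives each."""
    positives, negatives = {}, {}
    for line in triple_lines:
        qid, pos_pid, neg_pid = line.strip().split("\t")
        positives.setdefault(qid, set()).add(pos_pid)
        negatives.setdefault(qid, set()).add(neg_pid)

    rng = random.Random(seed)
    qids = rng.sample(sorted(positives), min(num_queries, len(positives)))
    dev = {}
    for qid in qids:
        negs = sorted(negatives[qid])
        dev[qid] = {
            "positive": sorted(positives[qid]),
            # Subsample negatives; keep them all if fewer than num_negs exist.
            "negative": rng.sample(negs, min(num_negs, len(negs))),
        }
    return dev

# Tiny synthetic example: two queries, a handful of triples each.
lines = ["q1\tp1\tn1", "q1\tp1\tn2", "q2\tp2\tn3"]
dev = build_dev_set(lines, num_queries=2, num_negs=1)
```

If the published MRR@10 numbers use a different pool than this, that is exactly the discrepancy I would like to pin down.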