
Improve evaluation procedure for extensive results #190

Open · monatis opened this issue Jan 12, 2023 · 1 comment
Labels
enhancement New feature or request

monatis (Contributor) commented Jan 12, 2023

Problem

In the current implementation, we use samplers to calculate evaluation metrics on a small subset of the dataset. This can yield slightly different scores from run to run because of the random state in sampling. It is always possible to seed the RNGs for reproducible results, but a given seed may leave us extremely lucky or extremely unlucky. Comparing different checkpoints with seeded evaluators is still fair, but we cannot tell whether we are overestimating or underestimating the performance of all the checkpoints.

Possible solution

  1. Add an option to enable multiple passes over the data and report the mean and standard deviation across all passes (see the first sketch after this list), or
  2. Accept an optional QdrantClient (falling back to a default client when it is None) and use Qdrant as the backend to store embeddings in and retrieve from (see the second sketch).
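
A minimal sketch of option 1, assuming a hypothetical `evaluate_fn(dataset, seed=...)` callable that runs a single sampled evaluation pass and returns a scalar metric (the real evaluator API may differ):

```python
import statistics

def evaluate_multi_pass(evaluate_fn, dataset, n_passes=5, base_seed=42):
    """Repeat the sampled evaluation with a different seed per pass so that
    one lucky or unlucky subset does not dominate the reported score."""
    scores = [evaluate_fn(dataset, seed=base_seed + i) for i in range(n_passes)]
    return {
        "mean": statistics.mean(scores),
        # stdev is undefined for a single observation
        "std": statistics.stdev(scores) if n_passes > 1 else 0.0,
    }
```

Reporting mean ± std makes the sampling noise explicit, so when two checkpoints are compared we can see how much of the gap could be plain sampler variance.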
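
And a sketch of option 2: with the full set of embeddings stored in Qdrant, the evaluation can be exhaustive rather than sampled. The collection name, the precision@k metric, and the in-memory fallback are illustrative assumptions, not the project's actual API:

```python
from typing import List, Optional

import numpy as np
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

def precision_at_k(embeddings: np.ndarray,
                   labels: List[str],
                   client: Optional[QdrantClient] = None,
                   k: int = 10) -> float:
    """Exhaustive precision@k over *all* embeddings via Qdrant search."""
    if client is None:
        client = QdrantClient(":memory:")  # fall back to a local instance
    client.recreate_collection(
        collection_name="eval",
        vectors_config=VectorParams(size=embeddings.shape[1],
                                    distance=Distance.COSINE),
    )
    client.upsert(
        collection_name="eval",
        points=[PointStruct(id=i, vector=vec.tolist(),
                            payload={"label": labels[i]})
                for i, vec in enumerate(embeddings)],
    )
    hits_total = 0
    for i, vec in enumerate(embeddings):
        # limit=k + 1 because the query vector matches itself
        hits = client.search(collection_name="eval", query_vector=vec.tolist(),
                             limit=k + 1, with_payload=True)
        neighbors = [h for h in hits if h.id != i][:k]
        hits_total += sum(h.payload["label"] == labels[i] for h in neighbors)
    return hits_total / (len(embeddings) * k)
```

Since every embedding is indexed and queried, no sampling (and hence no seed) enters the score; the cost is one k-NN search per embedding, which the Qdrant backend keeps manageable.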
monatis added the enhancement (New feature or request) label Jan 12, 2023
monatis self-assigned this Jan 12, 2023

parthkl021 (Contributor) commented

@generall is this issue solved? If not, can I work on it?
