
Improve evaluation procedure for extensive results #190

Open · monatis opened this issue Jan 12, 2023 · 1 comment
Labels
enhancement New feature or request

monatis (Contributor) commented Jan 12, 2023

Problem

In the current implementation, we use samplers to calculate evaluation metrics on a small subset of the dataset. This can yield slightly different scores from run to run because of the random state in sampling. It is always possible to seed the RNGs for reproducible results, but a given seed may leave us extremely lucky or extremely unlucky. Comparing different checkpoints with seeded evaluators is still fair, but we cannot tell whether we are overestimating or underestimating the performance of all the checkpoints.

Possible solution

  1. Add an option to enable multiple passes over the data and report the mean and standard deviation across all passes (see the first sketch after this list), or
  2. Accept an optional QdrantClient (falling back to a default client when it is None) and use Qdrant as the backend to store embeddings in and retrieve from (see the second sketch).
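
A minimal sketch of option 1, assuming a hypothetical `evaluate_fn(dataset, seed=...)` callable that runs a single sampled evaluation pass and returns a scalar metric (the real evaluator API may differ):

```python
import statistics

def evaluate_multi_pass(evaluate_fn, dataset, n_passes=5, base_seed=42):
    """Repeat the sampled evaluation with a different seed per pass so that
    one lucky or unlucky subset does not dominate the reported score."""
    scores = [evaluate_fn(dataset, seed=base_seed + i) for i in range(n_passes)]
    return {
        "mean": statistics.mean(scores),
        # stdev is undefined for a single observation
        "std": statistics.stdev(scores) if n_passes > 1 else 0.0,
    }
```

Reporting mean ± std makes the sampling noise explicit, so when two checkpoints are compared we can see how much of the gap could be plain sampler variance.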
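
And a sketch of option 2: with the full set of embeddings stored in Qdrant, the evaluation can be exhaustive rather than sampled. The collection name, the precision@k metric, and the in-memory fallback are illustrative assumptions, not the project's actual API:

```python
from typing import List, Optional

import numpy as np
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

def precision_at_k(embeddings: np.ndarray,
                   labels: List[str],
                   client: Optional[QdrantClient] = None,
                   k: int = 10) -> float:
    """Exhaustive precision@k over *all* embeddings via Qdrant search."""
    if client is None:
        client = QdrantClient(":memory:")  # fall back to a local instance
    client.recreate_collection(
        collection_name="eval",
        vectors_config=VectorParams(size=embeddings.shape[1],
                                    distance=Distance.COSINE),
    )
    client.upsert(
        collection_name="eval",
        points=[PointStruct(id=i, vector=vec.tolist(),
                            payload={"label": labels[i]})
                for i, vec in enumerate(embeddings)],
    )
    hits_total = 0
    for i, vec in enumerate(embeddings):
        # limit=k + 1 because the query vector matches itself
        hits = client.search(collection_name="eval", query_vector=vec.tolist(),
                             limit=k + 1, with_payload=True)
        neighbors = [h for h in hits if h.id != i][:k]
        hits_total += sum(h.payload["label"] == labels[i] for h in neighbors)
    return hits_total / (len(embeddings) * k)
```

Since every embedding is indexed and queried, no sampling (and hence no seed) enters the score; the cost is one k-NN search per embedding, which the Qdrant backend keeps manageable.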
monatis added the enhancement (New feature or request) label Jan 12, 2023
monatis self-assigned this Jan 12, 2023

parthkl021 (Contributor) commented

@generall is this issue solved? If not, can I work on it?
