Thanks for the question. As a part of RTEB, we have actually just started using closed datasets to estimate generalisation.

So this is indeed possible, but it requires that:

  1. We can make a small sample public (~5 samples), which can be anonymised
  2. We can publish descriptive statistics of the task
  3. The MTEB maintainers have access to the dataset, to ensure that benchmarks remain up to date and to allow for validation of the data samples. To control who has access, we use the "mteb-private" org on HuggingFace, which reduces the number of people with access compared to the full organization. We will not make any of these datasets public or share them in any way.
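
To make points 1 and 2 more concrete, here is a minimal sketch of what a dataset owner could prepare before sharing a closed dataset. It is not MTEB's own tooling: the function name `describe_and_sample`, its parameters, and the chosen statistics are all hypothetical, and anonymisation of the public sample is left to the owner.

```python
import random
import statistics


def describe_and_sample(texts, n_samples=5, seed=0):
    """Hypothetical helper: summarise a private dataset and pick a small public sample.

    `texts` is a non-empty list of strings. Nothing here is part of the mteb API.
    """
    # Character lengths are used as a simple descriptive statistic.
    lengths = [len(t) for t in texts]
    stats = {
        "num_samples": len(texts),
        "min_length": min(lengths),
        "mean_length": statistics.mean(lengths),
        "max_length": max(lengths),
    }
    # Seeded sampling so the same public sample can be reproduced later.
    rng = random.Random(seed)
    public_sample = rng.sample(texts, k=min(n_samples, len(texts)))
    # The sample should still be reviewed and anonymised by hand before publishing.
    return stats, public_sample
```

Calling `describe_and_sample(texts)` on the private texts returns a dictionary of statistics that can be published and a list of at most five texts to review for release.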

Gating is also an option, but then yo…

@andrejridzik
@KennethEnevoldsen
Answer selected by KennethEnevoldsen