Can a private/gated dataset be added to a leaderboard? #3346

KennethEnevoldsen · 2025-10-13T14:35:37Z

KennethEnevoldsen
Oct 13, 2025
Maintainer

Hi guys,

We are currently putting together a new benchmark, but some of the underlying datasets can't be made fully public. Do all datasets need to be public for the results of the corresponding tasks to appear on the leaderboard? Would it work if gated access is enabled for the dataset?

Thanks a lot for the help

Originally asked here

Answered by KennethEnevoldsen


KennethEnevoldsen · 2025-10-13T14:46:04Z

KennethEnevoldsen
Oct 13, 2025
Maintainer Author

Thanks for the question. As a part of RTEB, we have actually just started using closed datasets to estimate generalisation.

So this is indeed possible, but it requires the following:

- We can make a small sample public (~5 samples), which can be anonymised.
- We can publish descriptive statistics of the task.
- The MTEB maintainers will have access to the dataset to ensure that benchmarks remain up to date and to allow for validation of the data samples. To control who has access, we use the "mteb-private" org on HuggingFace, which reduces the number of people with access compared to the full organization. We will not make any of these datasets public or share them in any way.

Gating is also an option, but then you need to describe the gating policy.
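As a rough illustration of the first two requirements, here is a minimal sketch of how a dataset owner might prepare a small anonymised sample and some descriptive statistics before sharing. Everything in it is an assumption for illustration: the function names (`describe_texts`, `anonymise`, `public_sample`), the choice of statistics, and the e-mail-only anonymisation are hypothetical and are not MTEB's actual tooling or required format.

```python
import random
import re
import statistics

# Hypothetical pattern: only masks e-mail addresses. Real anonymisation
# would need to cover whatever sensitive fields the dataset contains.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def describe_texts(texts):
    """Return simple descriptive statistics over a list of strings."""
    lengths = [len(t) for t in texts]
    return {
        "num_samples": len(texts),
        "min_length": min(lengths),
        "max_length": max(lengths),
        "mean_length": statistics.mean(lengths),
        "unique_texts": len(set(texts)),
    }


def anonymise(text):
    """Replace e-mail addresses with a placeholder token."""
    return EMAIL.sub("<EMAIL>", text)


def public_sample(texts, k=5, seed=0):
    """Pick up to k texts reproducibly and anonymise them."""
    rng = random.Random(seed)
    chosen = rng.sample(texts, min(k, len(texts)))
    return [anonymise(t) for t in chosen]


# Stand-in for the private data; in practice this would be loaded locally.
private_texts = [
    "Contact alice@example.com about the invoice",
    "The contract was renewed in March",
    "Please forward this to bob@example.org today",
    "Quarterly revenue grew by four percent",
    "The meeting is moved to Thursday",
    "Support ticket closed after two days",
    "New pricing applies from next month",
    "The shipment arrived damaged",
]

stats = describe_texts(private_texts)
sample = public_sample(private_texts)

print(stats)
for text in sample:
    print(text)
```

The fixed seed keeps the public sample stable between runs, so the published examples do not change each time the statistics are regenerated.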

2 replies

andrejridzik Oct 13, 2025

Thanks a lot for the answer!

KennethEnevoldsen Oct 13, 2025
Maintainer Author

No problem, hoping to see more private datasets to prevent overfitting
