@q275343119
Collaborator

@q275343119 q275343119 commented Nov 27, 2025

Preview here

@Samoed Samoed added the leaderboard issues related to the leaderboard label Nov 27, 2025
@q275343119
Collaborator Author

@Samoed Could you take a look when you have some free time?

@Samoed
Member

Samoed commented Nov 27, 2025

It looks good. Do you want to add your suggestion here? #3569 (comment)

@Samoed Samoed closed this Nov 27, 2025
@Samoed Samoed reopened this Nov 27, 2025
@q275343119
Collaborator Author

It looks good. Do you want to add your suggestion here? #3569 (comment)

No problem, I'll update the PR later.

@q275343119
Collaborator Author

Hi @Samoed, I have updated the PR. You can take a look when you have some free time.

@Samoed
Member

Samoed commented Nov 27, 2025

Great! But is it possible to not disable selectors fully?
image

@Samoed
Member

Samoed commented Nov 27, 2025

Ah, this is "select all". That's fine then

Contributor

@KennethEnevoldsen KennethEnevoldsen left a comment

Generally, the new changes look really great (thanks!), but switching from e.g. "Scandinavian" to "Multilingual" gives me a few errors:

Screenshot 2025-11-28 at 13 59 03

There are also some oddities: when I select all languages or all domains, a few tasks get unselected. Not sure why this happens. (It might be that some tasks don't have any domains annotated.)

@KennethEnevoldsen KennethEnevoldsen changed the title from "Fix Issue 3569&3616" to "fix: fix display for task information and improve UI for benchmark filtering" Nov 28, 2025
@q275343119
Collaborator Author

Generally, the new changes look really great (thanks!), but switching from e.g. "Scandinavian" to "Multilingual" gives me a few errors:

Screenshot 2025-11-28 at 13 59 03

There are also some oddities: when I select all languages or all domains, a few tasks get unselected. Not sure why this happens. (It might be that some tasks don't have any domains annotated.)

Okay, I'll take a look at this and fix the problem.

@q275343119
Collaborator Author

Fixed selection Error

I've also noticed that if I deselect a certain taskType or other selection first, and then select it again, some tasks will not be selected, such as STSB. This is probably because it doesn't belong to any taskType.

@q275343119
Collaborator Author

I think I know the reason. Take STSB as an example:

class STSB(AbsTaskSTS):
    metadata = TaskMetadata(
        name="STSB",
        dataset={
            "path": "C-MTEB/STSB",
            "revision": "0cde68302b3541bb8b3c340dc0644b0b745b3dc0",
        },
        description="A Chinese dataset for textual relatedness",
        reference="https://aclanthology.org/2021.emnlp-main.357",
        type="STS",
        category="t2t",
        modalities=["text"],
        eval_splits=["validation", "test"],
        eval_langs=["cmn-Hans"],
        main_score="cosine_spearman",
        date=None,
        domains=[],
        task_subtypes=None,
        license=None,
        annotations_creators=None,
        dialect=None,
        sample_creation=None,
        bibtex_citation=r"""

Its domains field is an empty list: domains=[]

Then, in the task filter function:

def update_task_list(
    benchmark_name, type_select, domain_select, lang_select, modality_select
):
    if not len(lang_select):
        return []
    start_time = time.time()
    tasks_to_keep = []
    for task in mteb.get_benchmark(benchmark_name).tasks:
        if task.metadata.type not in type_select:
            continue
        if task.metadata.domains is not None and not (
            set(task.metadata.domains) & set(domain_select)
        ):
            continue
        if task.languages is not None and not (
            set(task.languages) & set(lang_select)
        ):
            continue
        if task.metadata.modalities and not (
            set(task.metadata.modalities) & set(modality_select)
        ):
            continue
        tasks_to_keep.append(task.metadata.name)
    elapsed = time.time() - start_time
    logger.debug(f"update_task_list callback: {elapsed}s")
    return sorted(tasks_to_keep)

        if task.metadata.domains is not None and not (
            set(task.metadata.domains) & set(domain_select)
        ):

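This condition evaluates to True when domains=[]: an empty list is not None, and its intersection with domain_select is always empty. So the corresponding task is filtered out regardless of which domains are selected.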
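To make the behavior concrete, here is a minimal sketch that isolates just the domain check. The helper name `is_filtered_out_by_domain` and the sample domain values are assumptions made for illustration; they are not part of mteb.

```python
def is_filtered_out_by_domain(task_domains, domain_select):
    """Hypothetical helper mirroring the domain condition in update_task_list."""
    return task_domains is not None and not (
        set(task_domains) & set(domain_select)
    )


domain_select = ["News", "Web"]  # assumed example selection

# domains=None short-circuits the check, so the task is kept.
print(is_filtered_out_by_domain(None, domain_select))

# domains=[] is not None and its intersection is empty, so the task is dropped.
print(is_filtered_out_by_domain([], domain_select))

# A task annotated with a selected domain is kept.
print(is_filtered_out_by_domain(["News"], domain_select))
```

The middle case is the STSB situation: no choice of domain_select can bring the task back, because the intersection with an empty set is always empty.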

@KennethEnevoldsen @Samoed Is the above behavior intended, or is it a bug in the filtering logic?

@Samoed
Member

Samoed commented Dec 2, 2025

I think it can be changed to

if task.metadata.domains is not None and len(task.metadata.domains) > 0 and not (
    set(task.metadata.domains) & set(domain_select)
):

@q275343119
Collaborator Author

I think it can be changed to

if task.metadata.domains is not None and len(task.metadata.domains) > 0 and not (
    set(task.metadata.domains) & set(domain_select)
):

Yes, or change it to:

        if task.metadata.domains and not (
            set(task.metadata.domains) & set(domain_select)
        ):
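As a quick sanity check, the two proposed conditions can be compared side by side. This is only a sketch: the function names `drop_explicit` and `drop_truthy` and the sample domain values are assumptions for illustration, and `bool(...)` is added so both helpers return a plain boolean.

```python
def drop_explicit(task_domains, domain_select):
    """Condition with the explicit None and length checks."""
    return (
        task_domains is not None
        and len(task_domains) > 0
        and not (set(task_domains) & set(domain_select))
    )


def drop_truthy(task_domains, domain_select):
    """Shorter condition: None and [] are both falsy, so both skip the check."""
    return bool(task_domains) and not (set(task_domains) & set(domain_select))


domain_select = ["News"]  # assumed example selection
for domains in (None, [], ["News"], ["Legal"]):
    # Both variants should agree for every input.
    print(domains, drop_explicit(domains, domain_select), drop_truthy(domains, domain_select))
```

With either variant, a task with domains=[] is no longer dropped, while a task whose annotated domains do not overlap the selection still is.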

@Samoed
Member

Samoed commented Dec 2, 2025

What was the source of the bug with switching benchmarks #3629 (review)? I don't see many changes in f76e60a except for

benchmark_tasks.sort()
tasks_to_keep.sort()

I'm just curious

@q275343119
Collaborator Author

What was the source of the bug with switching benchmarks #3629 (review)? I don't see many changes in f76e60a except for

benchmark_tasks.sort()
tasks_to_keep.sort()

I'm just curious

First of all, I initially assigned the 'choices' and 'value' of the 'CheckboxGroup' component at initialization, but 'update_task_list' returned only 'value' and did not use 'gr.update()' to update 'choices' and 'value' together. That caused the error, because the new 'value' was not in the old 'choices'. So I modified 'update_task_list' to return 'gr.update()'.

But that alone still failed: 'update_task_list' uses caching, and the cache does not seem to work properly with a 'gr.update()' return value. So I defined a separate cached method that returns a plain list, and then return 'gr.update()' built from that list.

@q275343119
Collaborator Author

Leaderboard Build Tests
Hi @Samoed the test failed. What should I do to make it succeed?

@Samoed
Member

Samoed commented Dec 3, 2025

The leaderboard test is successful, but it fails on a post-hook. I tried to rerun it multiple times, but it still fails on the post-hook for some reason.

Contributor

@KennethEnevoldsen KennethEnevoldsen left a comment

This looks great @q275343119!

I did a few tests and couldn't find any way to break it.

I am not sure why the leaderboard test fails (but it is only in the post-hook, so it is not on the code side). I set it to rerun (it might have been a GitHub issue).

Can I ask you to integrate v6 (I believe you can just merge #3605)?
Just to make sure that the changes are compatible with the latest version. If there are any issues here, feel free to leave it for another PR.

Otherwise I think this is all good to merge

@KennethEnevoldsen
Contributor

The leaderboard issue seems to be a disk space issue:

zstd: error 70 : Write error : cannot write block : No space left on device

@q275343119
Collaborator Author

The leaderboard issue seems to be a disk space issue:

zstd: error 70 : Write error : cannot write block : No space left on device

Yes, I saw that too. Is it because the results are constantly being added?

@KennethEnevoldsen
Contributor

It might be that results is just getting too big.

Currently, we just git clone the entire repo, but it might be better to just make a shallow git clone

git clone {url} --depth 1

@Samoed do you see any issues with this?
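If the clone is driven from Python, the shallow variant could look roughly like this. It is only a sketch: the helper names `shallow_clone_cmd` and `shallow_clone` are assumptions for illustration, and the real workflow may invoke git differently.

```python
import subprocess


def shallow_clone_cmd(url: str, dest: str) -> list:
    """Build a git command that fetches only the latest commit (--depth 1)."""
    return ["git", "clone", "--depth", "1", url, dest]


def shallow_clone(url: str, dest: str) -> None:
    """Run the shallow clone; raises CalledProcessError if git fails."""
    subprocess.run(shallow_clone_cmd(url, dest), check=True)


# Example with a placeholder URL; only the command is printed, nothing is cloned.
print(" ".join(shallow_clone_cmd("https://example.com/results.git", "results")))
```

A depth of 1 skips the commit history, so the checkout still contains every current file but the downloaded repository data is typically much smaller.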

@Samoed
Member

Samoed commented Dec 3, 2025

No, I haven't

@KennethEnevoldsen
Contributor

Anyway, this is probably outside the scope of this PR. I have made an issue on it and will merge this in -- again great work @q275343119

@KennethEnevoldsen KennethEnevoldsen merged commit a882295 into embeddings-benchmark:main Dec 3, 2025
8 of 9 checks passed
Development

Successfully merging this pull request may close these issues.

Task information contains is not displayed correctly
Adding a task not in the leaderboard does not add the task

4 participants