fix: fix display for task information and improve UI for benchmark filtering #3629

q275343119 · 2025-11-27T01:07:12Z

Fixes "Adding a task not in the leaderboard does not add the task" (#3569), which limited the selection range of the task dropdown.
Fixes "Task information contains is not displayed correctly" (#3616), resolving the display issue with task information.

Preview here

fixes embeddings-benchmark#3601

q275343119 · 2025-11-27T09:10:43Z

@Samoed Could you take a look when you have some free time?

Samoed · 2025-11-27T10:06:54Z

It looks good. Do you want to add your suggestion here? #3569 (comment)

q275343119 · 2025-11-27T10:54:40Z

It looks good. Do you want to add your suggestion here? #3569 (comment)

No problem, I'll update the PR later.

q275343119 · 2025-11-27T14:14:50Z

Hi @Samoed, I have updated the PR. You can take a look when you have some free time.

Samoed · 2025-11-27T14:20:54Z

Great! But is it possible to not disable selectors fully?

Samoed · 2025-11-27T14:26:23Z

Ah, this is "select all". That's fine then

KennethEnevoldsen

Generally the new changes look really great (thanks!), but switching from e.g. "Scandinavian" to "Multilingual" gives me a few errors:

There are also some oddities: when I select all languages or select all domains, it unselects a few tasks. Not sure why this happens. (It might be that some tasks don't have any domains annotated.)

q275343119 · 2025-11-29T09:08:40Z

Generally the new changes look really great (thanks!), but switching from e.g. "Scandinavian" to "Multilingual" gives me a few errors:

There are also some oddities: when I select all languages or select all domains, it unselects a few tasks. Not sure why this happens. (It might be that some tasks don't have any domains annotated.)

Okay, I'll take a look at this and fix the problem.

q275343119 · 2025-12-02T01:04:49Z

Fixed the selection error.

I've also noticed that if I first deselect a certain taskType (or another filter) and then select it again, some tasks are not reselected, such as STSB. This is probably because it doesn't belong to any taskType.

q275343119 · 2025-12-02T01:34:24Z

I think I know the reason. Take STSB as an example:

mteb/mteb/tasks/sts/zho/cmtebsts.py

Lines 171 to 193 in 072e6ef

    
    class STSB(AbsTaskSTS):
        metadata = TaskMetadata(
            name="STSB",
            dataset={
                "path": "C-MTEB/STSB",
                "revision": "0cde68302b3541bb8b3c340dc0644b0b745b3dc0",
            },
            description="A Chinese dataset for textual relatedness",
            reference="https://aclanthology.org/2021.emnlp-main.357",
            type="STS",
            category="t2t",
            modalities=["text"],
            eval_splits=["validation", "test"],
            eval_langs=["cmn-Hans"],
            main_score="cosine_spearman",
            date=None,
            domains=[],
            task_subtypes=None,
            license=None,
            annotations_creators=None,
            dialect=None,
            sample_creation=None,
            bibtex_citation=r"""

Its domains field is an empty list: domains=[]

Then there is the task filter function:

mteb/mteb/leaderboard/app.py

Lines 581 to 606 in 072e6ef

    
    def update_task_list(
        benchmark_name, type_select, domain_select, lang_select, modality_select
    ):
        if not len(lang_select):
            return []
        start_time = time.time()
        tasks_to_keep = []
        for task in mteb.get_benchmark(benchmark_name).tasks:
            if task.metadata.type not in type_select:
                continue
            if task.metadata.domains is not None and not (
                set(task.metadata.domains) & set(domain_select)
            ):
                continue
            if task.languages is not None and not (
                set(task.languages) & set(lang_select)
            ):
                continue
            if task.metadata.modalities and not (
                set(task.metadata.modalities) & set(modality_select)
            ):
                continue
            tasks_to_keep.append(task.metadata.name)
        elapsed = time.time() - start_time
        logger.debug(f"update_task_list callback: {elapsed}s")
        return sorted(tasks_to_keep)

    if task.metadata.domains is not None and not (
        set(task.metadata.domains) & set(domain_select)
    ):

This condition evaluates to True when domains=[], because an empty list is not None and its intersection with any selection is empty, so the corresponding task is filtered out.
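
The behaviour described above can be checked in isolation. The following is a minimal sketch, not the app's code: `is_filtered_out` and the sample `domain_select` values are hypothetical names introduced here, and only the boolean expression mirrors the quoted condition.

```python
# Hypothetical selection, standing in for "all domains ticked" in the UI.
domain_select = ["News", "Web", "Written"]


def is_filtered_out(domains, domain_select):
    """Mirror of the domain check quoted from update_task_list."""
    return domains is not None and not (set(domains) & set(domain_select))


print(is_filtered_out([], domain_select))        # True: [] is not None, intersection is empty
print(is_filtered_out(None, domain_select))      # False: None short-circuits the check
print(is_filtered_out(["News"], domain_select))  # False: the intersection is non-empty
```

So a task with no annotated domains is dropped even when every domain is selected, while a task with domains=None is kept.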

q275343119 · 2025-12-02T08:38:02Z

@KennethEnevoldsen @Samoed Is the above logic a normal business operation or a bug in the filtering process?

Samoed · 2025-12-02T08:41:20Z

I think it can be changed to

if task.metadata.domains is not None and len(task.metadata.domains) > 0 and not (
    set(task.metadata.domains) & set(domain_select)
):

q275343119 · 2025-12-02T08:47:19Z

I think it can be changed to

if task.metadata.domains is not None and len(task.metadata.domains) > 0 and not (
    set(task.metadata.domains) & set(domain_select)
):

Yes, or change it to:

        if task.metadata.domains and not (
            set(task.metadata.domains) & set(domain_select)
        ):
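
A small sketch of how the truthiness-based check behaves, assuming a hypothetical `FakeTask` stand-in and `keep_by_domain` helper (neither exists in mteb; they only isolate the domain condition from the rest of `update_task_list`):

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FakeTask:
    # Hypothetical stand-in for an mteb task; only the fields used here.
    name: str
    domains: Optional[List[str]] = None


def keep_by_domain(tasks, domain_select):
    """Apply only the domain part of the filter, using the truthiness check."""
    kept = []
    for task in tasks:
        # `task.domains` is falsy for both None and [], so unannotated
        # tasks skip the intersection test and are kept.
        if task.domains and not (set(task.domains) & set(domain_select)):
            continue
        kept.append(task.name)
    return sorted(kept)


tasks = [
    FakeTask("STSB", []),
    FakeTask("NewsTask", ["News"]),
    FakeTask("LegalTask", ["Legal"]),
    FakeTask("NoDomains", None),
]
print(keep_by_domain(tasks, ["News"]))  # ['NewsTask', 'NoDomains', 'STSB']
```

Both suggested forms treat None and [] the same way; the truthiness version is simply the shorter spelling.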

Samoed · 2025-12-02T09:12:40Z

What was the source of the bug with switching benchmarks #3629 (review)? I don't see many changes in f76e60a except for

benchmark_tasks.sort()
tasks_to_keep.sort()

I'm just curious

q275343119 · 2025-12-02T09:51:59Z

What was the source of the bug with switching benchmarks #3629 (review)? I don't see many changes in f76e60a except for
benchmark_tasks.sort()
tasks_to_keep.sort()
I'm just curious

First, I initially assigned the 'choices' and 'value' of the 'CheckboxGroup' component at initialization, but 'update_task_list' returned only 'value' and did not use 'gr.update()' to update 'choices' and 'value' together. That caused an error, because the new 'value' was not in the old 'choices'. So I modified 'update_task_list' to return 'gr.update()'.

But that still left a bug: 'update_task_list' uses caching, and the cache doesn't seem to work properly with 'gr.update()'. So I defined a separate cached method that returns a list, and then return 'gr.update()'.
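
The pattern described above (cache plain data, build the component update outside the cache) might look roughly like the sketch below. Everything here is an assumption for illustration: `BENCHMARK_TASKS`, `cached_task_names`, and `make_update` are hypothetical, `functools.lru_cache` stands in for whatever caching the leaderboard actually uses, and `make_update` replaces `gr.update(choices=..., value=...)` so the example runs without Gradio installed.

```python
from functools import lru_cache

# Hypothetical benchmark data: task name -> task type.
BENCHMARK_TASKS = {
    "BenchA": {"TaskA1": "STS", "TaskA2": "Retrieval"},
    "BenchB": {"TaskB1": "STS"},
}


def make_update(choices, value):
    # Stand-in for gr.update(choices=..., value=...) in the real app.
    return {"choices": list(choices), "value": list(value)}


@lru_cache(maxsize=32)
def cached_task_names(benchmark_name, type_select):
    # The cached helper returns plain, hashable data (a tuple of names),
    # never a UI update object.
    tasks = BENCHMARK_TASKS[benchmark_name]
    return tuple(sorted(name for name, kind in tasks.items() if kind in type_select))


def update_task_list(benchmark_name, type_select):
    # Build the update outside the cache so choices and value always
    # change together when the benchmark changes.
    choices = sorted(BENCHMARK_TASKS[benchmark_name])
    value = cached_task_names(benchmark_name, tuple(type_select))
    return make_update(choices=choices, value=value)


print(update_task_list("BenchB", ["STS"]))
# {'choices': ['TaskB1'], 'value': ['TaskB1']}
```

Because the selected value is always a subset of the choices returned in the same update, switching benchmarks cannot leave a value that is missing from the old choices.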

q275343119 · 2025-12-03T01:11:38Z

Leaderboard Build Tests
Hi @Samoed the test failed. What should I do to make it succeed?

Samoed · 2025-12-03T06:53:14Z

The leaderboard test is successful, but it fails on a post-hook. I tried to rerun it multiple times, but it still fails on the post-hook for some reason.

KennethEnevoldsen

This looks great @q275343119!

I did a few tests and couldn't find any way to break it.

I am not sure why the leaderboard test fails (but it is only in the post-hook, so it is not on the code side). I set it to rerun (it might have been a GitHub issue).

Can I ask you to integrate v6 (I believe you can just merge #3605)?
Just to make sure that the changes are compatible with the latest version. If there are any issues here, feel free to leave it for another PR.

Otherwise I think this is all good to merge

KennethEnevoldsen · 2025-12-03T16:25:40Z

The leaderboard issue seems to be a memory issue:

zstd: error 70 : Write error : cannot write block : No space left on device

q275343119 · 2025-12-03T16:41:49Z

The leaderboard issue seems to be a memory issue:

zstd: error 70 : Write error : cannot write block : No space left on device

Yes, I saw that too. Is it because the results are constantly being added?

KennethEnevoldsen · 2025-12-03T16:47:57Z

It might be that results is just getting too big.

Currently, we just git clone the entire repo, but it might be better to just make a shallow git clone

git clone {url} --depth 1

@Samoed do you see any issues with this?
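
As a rough illustration of the shallow-clone idea, the sketch below builds and runs the command from Python. The helper names, the example URL, and the destination directory are all hypothetical; how the CI actually clones the results repository is not shown in this thread.

```python
import subprocess


def shallow_clone_command(url, dest, depth=1):
    """Build a git command that fetches only the most recent `depth` commits."""
    return ["git", "clone", "--depth", str(depth), url, dest]


def shallow_clone(url, dest, depth=1):
    # check=True raises CalledProcessError if git exits with a non-zero status.
    subprocess.run(shallow_clone_command(url, dest, depth), check=True)


# Placeholder URL and directory, printed rather than executed.
print(shallow_clone_command("https://example.com/results.git", "results"))
```

A depth-1 clone skips the repository history, which would reduce disk usage if history is what makes the checkout large.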

Samoed · 2025-12-03T17:10:59Z

No, I haven't

KennethEnevoldsen · 2025-12-03T17:16:39Z

Anyway, this is probably outside the scope of this PR. I have made an issue on it and will merge this in -- again great work @q275343119

KennethEnevoldsen and others added 7 commits November 22, 2025 19:20

fix: bump gradio to v6

bfd711a

fixes embeddings-benchmark#3601

more fixes

e4d035e

refactor themes (might need some more refactors)

b4e76cf

feat - issue embeddings-benchmark#3569

5435f9b

feat - issue embeddings-benchmark#3569

7bfe98e

feat - issue embeddings-benchmark#3569

2f33c94

feat - issue embeddings-benchmark#3616

0b7c9c3

Samoed added the leaderboard issues related to the leaderboard label Nov 27, 2025

Samoed closed this Nov 27, 2025

Samoed reopened this Nov 27, 2025

q275343119 added 2 commits November 27, 2025 21:23

feat - CheckboxGroup

498c2ae

feat - ruff check

9c483df

Samoed requested a review from KennethEnevoldsen November 27, 2025 14:25

KennethEnevoldsen reviewed Nov 28, 2025

KennethEnevoldsen changed the title ~~Fix Issue 3569&3616~~ fix: fix display for task information and improve UI for benchmark filtering Nov 28, 2025

bump gradio

43cbdf0

fix - task options bug

f76e60a

fix - fix task filter condition

d9e7658

KennethEnevoldsen approved these changes Dec 3, 2025

Merge remote-tracking branch 'upstream/gradiov6' into feat-issue-3569

d09f35f

KennethEnevoldsen mentioned this pull request Dec 3, 2025

ci: The CI "Leaderboard Build Tests" fails likely due to results size #3650

Closed

KennethEnevoldsen merged commit a882295 into embeddings-benchmark:main Dec 3, 2025
8 of 9 checks passed
