Misclassification of Out-of-Domain Videos in TC-CLIP #7

@ooza

Hi,
Thanks again for this interesting model. I tested the demo notebook on a small custom dataset, using the pretrained checkpoint:
tc_clip_model_path = "pretrained/zero_shot_k400_llm_tc_clip.pth"  # pretrained model path
I'm encountering an issue where TC-CLIP assigns a label to videos that do not belong to any of the defined action classes. For example, I added a neutral video of a puppy (completely unrelated to the action classes) to my dataset, which has the following classes:

  • stealing
  • robbery
  • violence

Even though the video is irrelevant, the model assigns it a label (stealing) because that class has the highest logit:
{'stealing': 24.42, 'robbery': 22.50, 'violence': 23.47}
This is problematic because the model always outputs one of the predefined classes, even when the input fits none of them.
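
To illustrate why a label always comes out: assuming the notebook picks the class by argmax (or softmax) over the per-class logits, the scores are only ever compared with each other, so one class always wins. A minimal sketch using the logits above (the `softmax` helper is my own, not part of TC-CLIP):

```python
import math

def softmax(logits):
    """Convert a dict of class logits into probabilities that sum to 1."""
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}

# Logits reported above for the puppy video
logits = {"stealing": 24.42, "robbery": 22.50, "violence": 23.47}
probs = softmax(logits)
predicted = max(probs, key=probs.get)

print(predicted)  # stealing
print({k: round(p, 3) for k, p in probs.items()})
```

The probabilities are normalized over the three classes only, so "stealing" ends up with roughly 0.65 even though all three raw logits are close together and nothing in the video matches any class.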

What I Tried

  • Rejection threshold: I rejected predictions whose highest logit fell below a fixed value. This did not generalize well, and performance was poor whenever legitimate action videos had logits close to the threshold.

  • Neutral class: I added an "other" class, but this was not effective either:

[Image: Screenshot 2024-12-04 at 12 05 52]
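
A variant of the rejection threshold that I have not fully evaluated is to threshold the gap between the top two logits instead of the raw maximum, and to pick the cutoff from held-out in-domain videos. A rough sketch, where `top2_margin`, `calibrate_threshold`, `predict_or_reject`, and the validation logits are all hypothetical and not part of TC-CLIP:

```python
def top2_margin(logits):
    """Gap between the best and second-best class logits."""
    ordered = sorted(logits.values(), reverse=True)
    return ordered[0] - ordered[1]

def calibrate_threshold(in_domain_logits, keep_fraction=0.9):
    """Pick a margin cutoff that keeps roughly `keep_fraction` of in-domain videos."""
    margins = sorted(top2_margin(l) for l in in_domain_logits)
    # Index of the margin below which about (1 - keep_fraction) of the videos fall
    idx = int((1.0 - keep_fraction) * len(margins))
    return margins[min(idx, len(margins) - 1)]

def predict_or_reject(logits, threshold):
    """Return the top class, or 'unknown' when the margin is too small."""
    if top2_margin(logits) < threshold:
        return "unknown"
    return max(logits, key=logits.get)

# Made-up logits standing in for in-domain validation videos (illustration only)
val_logits = [
    {"stealing": 27.1, "robbery": 23.0, "violence": 22.4},
    {"stealing": 22.9, "robbery": 26.3, "violence": 23.5},
    {"stealing": 21.8, "robbery": 22.7, "violence": 26.9},
    {"stealing": 25.6, "robbery": 23.9, "violence": 22.1},
]
threshold = calibrate_threshold(val_logits, keep_fraction=0.75)

# Logits reported above for the puppy video
puppy = {"stealing": 24.42, "robbery": 22.50, "violence": 23.47}
print(predict_or_reject(puppy, threshold))
```

This still trades rejected out-of-domain clips against rejected real actions, so it may run into the same problem as the fixed threshold when in-domain margins are small.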

Expected Behavior
The model should ideally:

  • Provide an "unknown" or "no action" output for videos that do not belong to any defined class.
  • Avoid forcing a prediction into one of the predefined classes when the input is irrelevant.
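
To make the desired output concrete, here is a sketch of the kind of wrapper I have in mind. It assumes the video can be scored against extra background prompts alongside the action classes, and it returns "unknown" whenever a background prompt scores highest. The function `classify_open_set`, the background prompts, and their logits are hypothetical placeholders, not TC-CLIP outputs:

```python
def classify_open_set(prompt_logits, action_classes):
    """Return the best action class, or 'unknown' if a background prompt wins.

    prompt_logits maps every text prompt (actions and background) to its logit.
    """
    best_prompt = max(prompt_logits, key=prompt_logits.get)
    return best_prompt if best_prompt in action_classes else "unknown"

action_classes = ["stealing", "robbery", "violence"]

# Placeholder logits for illustration only; real values would come from
# scoring the video against all prompts with the model.
prompt_logits = {
    "stealing": 24.42,
    "robbery": 22.50,
    "violence": 23.47,
    "an animal playing": 27.0,        # hypothetical background prompt
    "people walking normally": 21.0,  # hypothetical background prompt
}
print(classify_open_set(prompt_logits, action_classes))  # unknown
```

Whether descriptive background prompts would actually outscore the action classes on irrelevant videos is exactly what I am unsure about, given that a single "other" class did not.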

Could you please provide guidance or suggest strategies to handle out-of-distribution inputs effectively in TC-CLIP?
