Description
Hi
Thanks again for this interesting model. I tested the demo notebook file on a small custom dataset.
(tc_clip_model_path = "pretrained/zero_shot_k400_llm_tc_clip.pth" # pretrained model path)
I'm encountering an issue where TC-CLIP assigns one of the defined action classes to videos that do not belong to any of them. For example, I added a neutral video of a puppy (completely unrelated to the action classes) to my dataset, which consists of the following classes:
- stealing
- robbery
- violence
Despite the video's irrelevance, the model assigns it a label (stealing) based on the highest logit value.
{'stealing': 24.42, 'robbery': 22.50, 'violence': 23.47}
This behavior is problematic because it suggests that the model always outputs one of the predefined classes, even when the input does not fit any of them.
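To illustrate why this happens, here is a minimal sketch using the logits reported above. It assumes the final label is picked by taking the argmax over the class logits (optionally after a softmax), which is the usual closed-set zero-shot setup. The `softmax` helper is hypothetical and not part of the TC-CLIP code.

```python
import math

def softmax(logits):
    """Convert a dict of class logits into probabilities that sum to 1."""
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}

# Logits reported for the puppy video.
logits = {"stealing": 24.42, "robbery": 22.50, "violence": 23.47}

probs = softmax(logits)
predicted = max(probs, key=probs.get)  # argmax always returns some class

print(predicted)
print(probs)
```

Because the probabilities are normalized over the listed classes only, some class always wins, no matter how unrelated the video is.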
What I Tried
- Rejection Threshold: I implemented a threshold to reject predictions where the highest logit is below a certain value. However, this approach did not generalize well and led to poor performance when legitimate action videos had logits close to the threshold.
- Neutral Class: I added an "other" class. Yet, this approach was not efficient:
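For reference, the rejection threshold I tried looks roughly like the sketch below. The function name `classify_with_rejection` and the threshold values are hypothetical, chosen only for illustration; they are not taken from the TC-CLIP code or tuned on any dataset.

```python
def classify_with_rejection(logits, threshold):
    """Return the top class, or "unknown" if its logit is below the threshold."""
    label = max(logits, key=logits.get)
    if logits[label] < threshold:
        return "unknown"
    return label

# Logits reported for the puppy video.
puppy_logits = {"stealing": 24.42, "robbery": 22.50, "violence": 23.47}

# The outcome flips depending on where the (hypothetical) threshold sits.
strict = classify_with_rejection(puppy_logits, threshold=25.0)
loose = classify_with_rejection(puppy_logits, threshold=24.0)

print(strict)  # unknown
print(loose)   # stealing
```

The result depends entirely on a single cutoff, so legitimate action videos with a top logit near that cutoff get rejected as well.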
Expected Behavior
The model should ideally:
- Provide an "unknown" or "no action" output for videos that do not belong to any defined class.
- Avoid forcing a prediction into one of the predefined classes when the input is irrelevant.
Could you please provide guidance or suggest strategies to handle out-of-distribution inputs effectively in TC-CLIP?
