Thanks for sharing the nice work, but I didn't fully understand the labels usage: `logits` is the cosine similarity matrix for `n` image-text pairs, and the labels passed to the cross-entropy loss are `[0, 1, 2, ..., n-1]` (`np.arange(n)`). My reading is that this pushes the rows/columns toward those values, e.g. `logits[0][0] = 0`, meaning the similarity of `image[0]` and `text[0]` is driven toward 0. But our goal is to enlarge the cosine similarity of `image[0]` and `text[0]`.
So, could someone help me understand why `np.arange(n)` is used rather than one-hot vectors as labels?
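For context, here is a minimal NumPy sketch of the labels usage being asked about (function and variable names are illustrative, not the repo's actual code). The key point is that cross-entropy treats each label as a class *index*: the label `i` for row `i` says "column `i` is the correct class", so minimizing the loss maximizes the diagonal entry `logits[i][i]` rather than driving it toward the numeric value `i`.

```python
import numpy as np

def clip_style_loss(image_emb, text_emb, temperature=0.07):
    """Sketch of a CLIP-style contrastive loss over n image-text pairs.

    Assumes image_emb and text_emb are (n, d) arrays of L2-normalized
    embeddings, so the matrix product below gives cosine similarities.
    """
    logits = image_emb @ text_emb.T / temperature  # (n, n) similarity matrix
    n = logits.shape[0]
    labels = np.arange(n)  # label i = class INDEX i, i.e. "column i matches row i"

    # Row-wise cross entropy: -log softmax(logits)[i, labels[i]].
    # Since labels[i] == i, this picks out the diagonal, so minimizing the
    # loss pushes logits[i][i] to be the largest entry in its row.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[labels, labels].mean()
```

As a sanity check, perfectly matched pairs (identical embedding matrices, so the diagonal dominates) yield a lower loss than mismatched pairs where the large similarities sit off the diagonal.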