We need to extract the features (of shape transformer.width) at the EOT token, which sits at a different position in each sequence of the batch, so the result has shape [batch_size, transformer.width].
If we did x[:, text.argmax(dim=-1)], it would select the features at every one of the argmax indices for every batch element, so the shape would become [batch_size, batch_size, transformer.width].
It might be simpler to use index_select instead and still achieve the same effect, but I found myself making fewer mistakes by explicitly indexing with torch.arange().
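For concreteness, here is a minimal sketch of the shape difference (the dimensions below are placeholders, not CLIP's actual configuration):

```python
import torch

batch_size, seq_len, width = 4, 77, 512
x = torch.randn(batch_size, seq_len, width)          # transformer output
eot_idx = torch.randint(1, seq_len, (batch_size,))   # stand-in for text.argmax(dim=-1)

# Advanced indexing pairs the i-th batch row with the i-th EOT position.
eot_features = x[torch.arange(batch_size), eot_idx]
print(eot_features.shape)  # torch.Size([4, 512]) -> [batch_size, width]

# Slicing the batch dimension instead applies every EOT index to every row.
all_pairs = x[:, eot_idx]
print(all_pairs.shape)     # torch.Size([4, 4, 512]) -> [batch_size, batch_size, width]

# The two agree only along the diagonal of the broadcast result.
diag = torch.arange(batch_size)
assert torch.equal(eot_features, all_pairs[diag, diag])
```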
Hello,
This is a minor question about the code; I want to be sure I don't let any subtleties slip by.
In L354 of model.py, there is the final step to extract the text features: self.text_projection is the last projection to obtain the text features, and text.argmax(dim=-1) picks the features of the EOT token. Why is there a torch.arange(x.shape[0])? It could be x[:, text.argmax(dim=-1)], right?
Thanks for the work, code and model.