I am trying to explain the behavior of a BERT model trained for sequence learning. The model is loaded with the HuggingFace transformers library. The input is a list of tokens whose length varies from example to example, and the target is one label per input token.
I would like to use Integrated Gradients to explain the model's prediction for each token and how the context influences that prediction.
The shape of inputs_embeds is torch.Size([50, 136, 768]), where 50 is the number of integration steps, 136 the number of tokens, and 768 the hidden size. model(inputs_embeds=inputs_embeds).logits returns torch.Size([50, 136, 19]), whereas I expected torch.Size([50, 19]). Why is that happening? Also, why is this function needed in my code?
Hi, this is because the function expects a simpler classifier that outputs a tensor of shape [num_samples, num_classes]. Your model outputs a tensor of shape [num_samples, num_tokens, num_classes]. I think you might be able to fix it by setting the target to (0, 0), i.e. token 0 and class 0. However, you would need to run this 19 times, once for each class you are interested in.
The packages I am using are the following:
Python 3.10
PyTorch 2.6.0
captum 0.7.0
My model looks as follows:
To run the IG, I am using the following code:
`
...
`
Here target = 0 because that is the label of the first token. This code raises the following error:
AssertionError: Target not provided when necessary, cannot take gradient with respect to multiple outputs.
The model outputs a 19-dimensional vector per token. Why am I receiving this error?