Description
The packages I am using are the following:

- Python 3.10
- PyTorch 2.6.0
- captum 0.7.0
I am trying to explain the behavior of a BERT model trained for sequence labeling. The model is loaded with HuggingFace's transformers library. The input is a list of tokens whose length varies with the input, and the target is one label per input token.
I would like to use Integrated Gradients (IG) to explain the model's prediction for each token and how the context influences the model's decision.
The full model definition is omitted here; essentially, it is a BERT encoder with a token-classification head that outputs a 19-dimensional score vector per token.
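A minimal sketch of an equivalent setup, assuming a BertForTokenClassification model (the checkpoint name below is a placeholder for the fine-tuned weights):

```python
from transformers import BertForTokenClassification, BertTokenizerFast

# Placeholder checkpoint; in practice the fine-tuned weights are loaded here.
model = BertForTokenClassification.from_pretrained("bert-base-uncased", num_labels=19)
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model.eval()
```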
To run IG, I am using the following code:

```python
from captum.attr import IntegratedGradients

...
baseline_ids = baseline_ids.unsqueeze(0)  # torch.Size([1, 136])
input_ids = input_ids.unsqueeze(0)        # torch.Size([1, 1, 136])

# Look up the static word embeddings for the input and the baseline.
input_embeds = self.model.bert.embeddings.word_embeddings(input_ids)        # torch.Size([1, 1, 136, 768])
baseline_embeds = self.model.bert.embeddings.word_embeddings(baseline_ids)  # torch.Size([1, 136, 768])

ig = IntegratedGradients(model_forward)

# Drop the extra dimension so both tensors are [1, 136, 768].
if len(input_embeds.shape) == 4:
    input_embeds = input_embeds.squeeze(1)
    baseline_embeds = baseline_embeds.squeeze(1)

attributions = ig.attribute(
    inputs=input_embeds, baselines=baseline_embeds, target=0, n_steps=n_steps
)
```
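For context, model_forward (elided above) is the wrapper that lets Captum call the model on precomputed embeddings. A minimal sketch of such a wrapper, assuming the standard transformers inputs_embeds keyword (the exact implementation may differ):

```python
def model_forward(inputs_embeds):
    # Run BERT on precomputed embeddings and return the raw per-token
    # logits, shape [batch, seq_len, num_labels] = [1, 136, 19] here.
    return model(inputs_embeds=inputs_embeds).logits
```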
Here target=0 is meant to select the label of the first token. This code throws the following error:
```
AssertionError: Target not provided when necessary, cannot take gradient with respect to multiple outputs.
```
The model outputs a 19-dimensional vector per token (see the shape sketch below). Why am I receiving this error?
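For completeness, this is the output shape that I believe triggers the assertion, sketched under the assumption of a 136-token input:

```python
logits = model_forward(input_embeds)
print(logits.shape)
# torch.Size([1, 136, 19]): one 19-way score vector per token, so even after
# selecting target=0 the output still contains more than one scalar per example.
```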