
Commit cd0719d

Correct KV comment seqlen -> seqlen + cache_len
Update and add comments about the shape of the key and value matrices in the attention component. E.g., the second dimension has length seqlen + cache_len, not seqlen as previously stated.
1 parent 6b3154b

1 file changed (+4, −4)


llama/model.py

Lines changed: 4 additions & 4 deletions
@@ -289,12 +289,12 @@ def forward(
         values = self.cache_v[:bsz, : start_pos + seqlen]

         # repeat k/v heads if n_kv_heads < n_heads
-        keys = repeat_kv(keys, self.n_rep)  # (bs, seqlen, n_local_heads, head_dim)
-        values = repeat_kv(values, self.n_rep)  # (bs, seqlen, n_local_heads, head_dim)
+        keys = repeat_kv(keys, self.n_rep)  # (bs, cache_len + seqlen, n_local_heads, head_dim)
+        values = repeat_kv(values, self.n_rep)  # (bs, cache_len + seqlen, n_local_heads, head_dim)

         xq = xq.transpose(1, 2)  # (bs, n_local_heads, seqlen, head_dim)
-        keys = keys.transpose(1, 2)
-        values = values.transpose(1, 2)
+        keys = keys.transpose(1, 2)  # (bs, n_local_heads, cache_len + seqlen, head_dim)
+        values = values.transpose(1, 2)  # (bs, n_local_heads, cache_len + seqlen, head_dim)
         scores = torch.matmul(xq, keys.transpose(2, 3)) / math.sqrt(self.head_dim)
         if mask is not None:
             scores = scores + mask  # (bs, n_local_heads, seqlen, cache_len + seqlen)
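The corrected comments follow from the cache slice a few lines above the hunk: `keys = self.cache_k[:bsz, : start_pos + seqlen]` takes all previously cached positions (`start_pos`, i.e. cache_len) plus the current tokens, so dimension 1 has length cache_len + seqlen. A minimal sketch of that shape arithmetic, using hypothetical sizes rather than anything from the commit:

import torch

# Hypothetical cache dimensions, standing in for the model's KV cache.
bsz, max_seq_len, n_kv_heads, head_dim = 2, 32, 4, 8
cache_k = torch.zeros(bsz, max_seq_len, n_kv_heads, head_dim)

cache_len = 5  # tokens already in the cache (start_pos in forward())
seqlen = 3     # tokens in the current forward pass

# The slice used in forward() covers cached tokens plus the new ones,
# so dim 1 is cache_len + seqlen, not seqlen.
keys = cache_k[:bsz, : cache_len + seqlen]
print(keys.shape)  # torch.Size([2, 8, 4, 8]) -> 8 == cache_len + seqlen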
