Commit 95cea8e

Update layers.py to ensure grad_output is contiguous (NVIDIA#1601)
Depending on how the forward pass is implemented, the grad_output passed to this function may not be contiguous, and the subsequent .view() call crashes on a non-contiguous tensor. Adding this one line ensures it is contiguous. Since .contiguous() is a no-op when the tensor is already contiguous, performance shouldn't suffer.
1 parent eec7250 commit 95cea8e

File tree

1 file changed: +1 −0 lines


apex/transformer/tensor_parallel/layers.py

Lines changed: 1 addition & 0 deletions
@@ -396,6 +396,7 @@ def backward(ctx, grad_output):
         return grad_input, None, None, None, None, None, None

         # Convert the tensor shapes to 2D for execution compatibility
+        grad_output = grad_output.contiguous()
         grad_output = grad_output.view(
             grad_output.shape[0] * grad_output.shape[1], grad_output.shape[2]
         )
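
To illustrate the failure mode this one-line fix addresses, here is a minimal sketch (with made-up tensor shapes, not the actual shapes used in apex): torch.Tensor.view() requires a contiguous tensor, so a grad_output produced by, e.g., a transpose makes the original code raise a RuntimeError; calling .contiguous() first copies the data only when necessary and returns the same tensor unchanged when it is already contiguous.

```python
import torch

# A transpose produces a non-contiguous tensor -- one possible way a
# forward pass can hand backward() a non-contiguous grad_output.
grad_output = torch.randn(4, 3, 8).transpose(0, 1)  # shape (3, 4, 8)
assert not grad_output.is_contiguous()

# Without the fix: .view() cannot reshape a non-contiguous tensor.
try:
    grad_output.view(grad_output.shape[0] * grad_output.shape[1],
                     grad_output.shape[2])
except RuntimeError:
    pass  # raises "view size is not compatible with input tensor's size and stride"

# With the fix: make it contiguous first, then the .view() succeeds.
grad_output = grad_output.contiguous()
flat = grad_output.view(grad_output.shape[0] * grad_output.shape[1],
                        grad_output.shape[2])
assert flat.shape == (12, 8)

# .contiguous() is a no-op on an already-contiguous tensor: it returns
# the same tensor object, so no copy is made.
already = torch.randn(2, 2)
assert already.contiguous() is already
```

The alternative would be .reshape(), which handles non-contiguous inputs by copying internally; the commit instead keeps the existing .view() call and makes the copy explicit and conditional via .contiguous().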
