Commit 95cea8e

Update layers.py to ensure grad_output is contiguous (NVIDIA#1601)
Depending on how the forward pass is implemented, the grad_output passed to this function may not be contiguous, and the subsequent .view() call crashes on a non-contiguous tensor. Adding this one line ensures it is contiguous. Since .contiguous() is a no-op when the tensor is already contiguous, performance shouldn't suffer.
1 parent eec7250 commit 95cea8e

File tree

1 file changed: +1 −0 lines


apex/transformer/tensor_parallel/layers.py

Lines changed: 1 addition & 0 deletions
@@ -396,6 +396,7 @@ def backward(ctx, grad_output):
         return grad_input, None, None, None, None, None, None

         # Convert the tensor shapes to 2D for execution compatibility
+        grad_output = grad_output.contiguous()
         grad_output = grad_output.view(
             grad_output.shape[0] * grad_output.shape[1], grad_output.shape[2]
         )
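
To illustrate the failure mode this one-line fix addresses, here is a minimal sketch (with made-up tensor shapes, not the actual shapes used in apex): torch.Tensor.view() requires a contiguous tensor, so a grad_output produced by, e.g., a transpose makes the original code raise a RuntimeError; calling .contiguous() first copies the data only when necessary and returns the same tensor unchanged when it is already contiguous.

```python
import torch

# A transpose produces a non-contiguous tensor -- one possible way a
# forward pass can hand backward() a non-contiguous grad_output.
grad_output = torch.randn(4, 3, 8).transpose(0, 1)  # shape (3, 4, 8)
assert not grad_output.is_contiguous()

# Without the fix: .view() cannot reshape a non-contiguous tensor.
try:
    grad_output.view(grad_output.shape[0] * grad_output.shape[1],
                     grad_output.shape[2])
except RuntimeError:
    pass  # raises "view size is not compatible with input tensor's size and stride"

# With the fix: make it contiguous first, then the .view() succeeds.
grad_output = grad_output.contiguous()
flat = grad_output.view(grad_output.shape[0] * grad_output.shape[1],
                        grad_output.shape[2])
assert flat.shape == (12, 8)

# .contiguous() is a no-op on an already-contiguous tensor: it returns
# the same tensor object, so no copy is made.
already = torch.randn(2, 2)
assert already.contiguous() is already
```

The alternative would be .reshape(), which handles non-contiguous inputs by copying internally; the commit instead keeps the existing .view() call and makes the copy explicit and conditional via .contiguous().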
