
Error in torch_compile.ipynb #1988

Open · KumoLiu opened this issue May 7, 2025 · 0 comments
Labels
bug Something isn't working

Comments

@KumoLiu
Contributor

KumoLiu commented May 7, 2025
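Summary: the torch_compile.ipynb CI run fails while executing the notebook on PyTorch 2.7.0+cu126. Compiling the model triggers an InductorError (an AssertionError raised from Inductor's make_fallback for aten.upsample_trilinear3d.default). Full CI log below: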

[2025-05-06T16:47:27.159Z] Running ./modules/torch_compile.ipynb
[2025-05-06T16:47:27.159Z] Checking PEP8 compliance...
[2025-05-06T16:47:27.719Z] Running notebook...
[2025-05-06T16:47:34.247Z] Unable to import quantization op. Please install modelopt library (https://github.com/NVIDIA/TensorRT-Model-Optimizer?tab=readme-ov-file#installation) to add support for compiling quantized models
[2025-05-06T16:47:37.507Z] MONAI version: 1.4.1rc1+46.gb58e883c
[2025-05-06T16:47:37.507Z] Numpy version: 1.26.4
[2025-05-06T16:47:37.507Z] Pytorch version: 2.7.0+cu126
[2025-05-06T16:47:37.507Z] MONAI flags: HAS_EXT = False, USE_COMPILED = False, USE_META_DICT = False
[2025-05-06T16:47:37.507Z] MONAI rev id: b58e883c887e0f99d382807550654c44d94f47bd
[2025-05-06T16:47:37.507Z] MONAI __file__: /home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/__init__.py
[2025-05-06T16:47:37.507Z] 
[2025-05-06T16:47:37.507Z] Optional dependencies:
[2025-05-06T16:47:38.067Z] Pytorch Ignite version: 0.4.11
[2025-05-06T16:47:38.067Z] ITK version: 5.4.3
[2025-05-06T16:47:38.067Z] Nibabel version: 5.3.2
[2025-05-06T16:47:38.067Z] scikit-image version: 0.19.3
[2025-05-06T16:47:38.067Z] scipy version: 1.14.0
[2025-05-06T16:47:38.067Z] Pillow version: 7.0.0
[2025-05-06T16:47:38.067Z] Tensorboard version: 2.16.2
[2025-05-06T16:47:38.067Z] gdown version: 5.2.0
[2025-05-06T16:47:38.067Z] TorchVision version: 0.22.0+cu126
[2025-05-06T16:47:38.067Z] tqdm version: 4.66.5
[2025-05-06T16:47:38.067Z] lmdb version: 1.6.2
[2025-05-06T16:47:38.067Z] psutil version: 6.0.0
[2025-05-06T16:47:38.067Z] pandas version: 2.2.2
[2025-05-06T16:47:38.067Z] einops version: 0.8.0
[2025-05-06T16:47:38.067Z] transformers version: 4.40.2
[2025-05-06T16:47:38.067Z] mlflow version: 2.22.0
[2025-05-06T16:47:38.067Z] pynrrd version: 1.1.3
[2025-05-06T16:47:38.067Z] clearml version: 2.0.0rc0
[2025-05-06T16:47:38.067Z] 
[2025-05-06T16:47:38.067Z] For details about installing the optional dependencies, please visit:
[2025-05-06T16:47:38.067Z]     https://docs.monai.io/en/latest/installation.html#installing-the-recommended-dependencies
[2025-05-06T16:47:38.067Z] 
[2025-05-06T16:47:41.332Z] papermill  --progress-bar --log-output -k python3
[2025-05-06T16:47:41.332Z] /usr/local/lib/python3.10/dist-packages/papermill/iorw.py:149: UserWarning: the file is not specified with any extension : -
[2025-05-06T16:47:41.332Z]   warnings.warn(f"the file is not specified with any extension : {os.path.basename(path)}")
[2025-05-06T16:55:02.641Z] 
Executing:   0%|          | 0/32 [00:00<?, ?cell/s]
Executing:   3%|▎         | 1/32 [00:00<00:29,  1.04cell/s]
Executing:  12%|█▎        | 4/32 [00:12<01:33,  3.33s/cell]
Executing:  19%|█▉        | 6/32 [00:22<01:44,  4.03s/cell]
Executing:  31%|███▏      | 10/32 [00:26<00:52,  2.38s/cell]
Executing:  50%|█████     | 16/32 [00:28<00:19,  1.23s/cell]
Executing:  56%|█████▋    | 18/32 [00:35<00:23,  1.71s/cell]
Executing:  62%|██████▎   | 20/32 [07:07<09:10, 45.85s/cell]
Executing:  69%|██████▉   | 22/32 [07:08<05:46, 34.69s/cell]
Executing:  72%|███████▏  | 23/32 [07:18<04:38, 30.98s/cell]
Executing:  72%|███████▏  | 23/32 [07:20<02:52, 19.17s/cell]
[2025-05-06T16:55:02.641Z] /usr/local/lib/python3.10/dist-packages/papermill/iorw.py:149: UserWarning: the file is not specified with any extension : -
[2025-05-06T16:55:02.641Z]   warnings.warn(f"the file is not specified with any extension : {os.path.basename(path)}")
[2025-05-06T16:55:02.641Z] Traceback (most recent call last):
[2025-05-06T16:55:02.641Z]   File "/usr/local/bin/papermill", line 8, in <module>
[2025-05-06T16:55:02.641Z]     sys.exit(papermill())
[2025-05-06T16:55:02.641Z]   File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1157, in __call__
[2025-05-06T16:55:02.641Z]     return self.main(*args, **kwargs)
[2025-05-06T16:55:02.641Z]   File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1078, in main
[2025-05-06T16:55:02.641Z]     rv = self.invoke(ctx)
[2025-05-06T16:55:02.641Z]   File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1434, in invoke
[2025-05-06T16:55:02.641Z]     return ctx.invoke(self.callback, **ctx.params)
[2025-05-06T16:55:02.641Z]   File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 783, in invoke
[2025-05-06T16:55:02.641Z]     return __callback(*args, **kwargs)
[2025-05-06T16:55:02.641Z]   File "/usr/local/lib/python3.10/dist-packages/click/decorators.py", line 33, in new_func
[2025-05-06T16:55:02.641Z]     return f(get_current_context(), *args, **kwargs)
[2025-05-06T16:55:02.641Z]   File "/usr/local/lib/python3.10/dist-packages/papermill/cli.py", line 235, in papermill
[2025-05-06T16:55:02.641Z]     execute_notebook(
[2025-05-06T16:55:02.641Z]   File "/usr/local/lib/python3.10/dist-packages/papermill/execute.py", line 131, in execute_notebook
[2025-05-06T16:55:02.641Z]     raise_for_execution_errors(nb, output_path)
[2025-05-06T16:55:02.641Z]   File "/usr/local/lib/python3.10/dist-packages/papermill/execute.py", line 251, in raise_for_execution_errors
[2025-05-06T16:55:02.641Z]     raise error
[2025-05-06T16:55:02.641Z] papermill.exceptions.PapermillExecutionError: 
[2025-05-06T16:55:02.641Z] ---------------------------------------------------------------------------
[2025-05-06T16:55:02.641Z] Exception encountered at "In [11]":
[2025-05-06T16:55:02.641Z] ---------------------------------------------------------------------------
[2025-05-06T16:55:02.641Z] InductorError                             Traceback (most recent call last)
[2025-05-06T16:55:02.641Z] Cell In[11], line 14
[2025-05-06T16:55:02.641Z]      12 inputs, labels = batch_data["image"].to(device), batch_data["label"].to(device)
[2025-05-06T16:55:02.641Z]      13 optimizer.zero_grad()
[2025-05-06T16:55:02.641Z] ---> 14 loss, train_time = timed(lambda: train(model_opt, inputs, labels))  # noqa: B023
[2025-05-06T16:55:02.641Z]      15 optimizer.step()
[2025-05-06T16:55:02.641Z]      16 epoch_loss += loss.item()
[2025-05-06T16:55:02.641Z] 
[2025-05-06T16:55:02.641Z] Cell In[6], line 5, in timed(fn)
[2025-05-06T16:55:02.641Z]       3 end = torch.cuda.Event(enable_timing=True)
[2025-05-06T16:55:02.641Z]       4 start.record()
[2025-05-06T16:55:02.641Z] ----> 5 result = fn()
[2025-05-06T16:55:02.641Z]       6 end.record()
[2025-05-06T16:55:02.641Z]       7 torch.cuda.synchronize()
[2025-05-06T16:55:02.641Z] 
[2025-05-06T16:55:02.641Z] Cell In[11], line 14, in <lambda>()
[2025-05-06T16:55:02.641Z]      12 inputs, labels = batch_data["image"].to(device), batch_data["label"].to(device)
[2025-05-06T16:55:02.641Z]      13 optimizer.zero_grad()
[2025-05-06T16:55:02.641Z] ---> 14 loss, train_time = timed(lambda: train(model_opt, inputs, labels))  # noqa: B023
[2025-05-06T16:55:02.641Z]      15 optimizer.step()
[2025-05-06T16:55:02.641Z]      16 epoch_loss += loss.item()
[2025-05-06T16:55:02.641Z] 
[2025-05-06T16:55:02.641Z] Cell In[6], line 12, in train(model, inputs, labels)
[2025-05-06T16:55:02.641Z]      11 def train(model, inputs, labels):
[2025-05-06T16:55:02.641Z] ---> 12     outputs = model(inputs)
[2025-05-06T16:55:02.642Z]      13     loss_function = monai.losses.DiceCELoss(to_onehot_y=True, softmax=True)
[2025-05-06T16:55:02.642Z]      14     loss = loss_function(outputs, labels)
[2025-05-06T16:55:02.642Z] 
[2025-05-06T16:55:02.642Z] File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1751, in Module._wrapped_call_impl(self, *args, **kwargs)
[2025-05-06T16:55:02.642Z]    1749     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
[2025-05-06T16:55:02.642Z]    1750 else:
[2025-05-06T16:55:02.642Z] -> 1751     return self._call_impl(*args, **kwargs)
[2025-05-06T16:55:02.642Z] 
[2025-05-06T16:55:02.642Z] File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1762, in Module._call_impl(self, *args, **kwargs)
[2025-05-06T16:55:02.642Z]    1757 # If we don't have any hooks, we want to skip the rest of the logic in
[2025-05-06T16:55:02.642Z]    1758 # this function, and just call forward.
[2025-05-06T16:55:02.642Z]    1759 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
[2025-05-06T16:55:02.642Z]    1760         or _global_backward_pre_hooks or _global_backward_hooks
[2025-05-06T16:55:02.642Z]    1761         or _global_forward_hooks or _global_forward_pre_hooks):
[2025-05-06T16:55:02.642Z] -> 1762     return forward_call(*args, **kwargs)
[2025-05-06T16:55:02.642Z]    1764 result = None
[2025-05-06T16:55:02.642Z]    1765 called_always_called_hooks = set()
[2025-05-06T16:55:02.642Z] 
[2025-05-06T16:55:02.642Z] File /usr/local/lib/python3.10/dist-packages/torch/_dynamo/eval_frame.py:663, in _TorchDynamoContext.__call__.<locals>._fn(*args, **kwargs)
[2025-05-06T16:55:02.642Z]     659     raise e.with_traceback(None) from None
[2025-05-06T16:55:02.642Z]     660 except ShortenTraceback as e:
[2025-05-06T16:55:02.642Z]     661     # Failures in the backend likely don't have useful
[2025-05-06T16:55:02.642Z]     662     # data in the TorchDynamo frames, so we strip them out.
[2025-05-06T16:55:02.642Z] --> 663     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
[2025-05-06T16:55:02.642Z]     664 finally:
[2025-05-06T16:55:02.642Z]     665     # Restore the dynamic layer stack depth if necessary.
[2025-05-06T16:55:02.642Z]     666     set_eval_frame(None)
[2025-05-06T16:55:02.642Z] 
[2025-05-06T16:55:02.642Z] File /usr/local/lib/python3.10/dist-packages/torch/_inductor/compile_fx.py:760, in _compile_fx_inner(gm, example_inputs, **graph_kwargs)
[2025-05-06T16:55:02.642Z]     758     raise
[2025-05-06T16:55:02.642Z]     759 except Exception as e:
[2025-05-06T16:55:02.642Z] --> 760     raise InductorError(e, currentframe()).with_traceback(
[2025-05-06T16:55:02.642Z]     761         e.__traceback__
[2025-05-06T16:55:02.642Z]     762     ) from None
[2025-05-06T16:55:02.642Z]     763 finally:
[2025-05-06T16:55:02.642Z]     764     TritonBundler.end_compile()
[2025-05-06T16:55:02.642Z] 
[2025-05-06T16:55:02.642Z] File /usr/local/lib/python3.10/dist-packages/torch/_inductor/compile_fx.py:745, in _compile_fx_inner(gm, example_inputs, **graph_kwargs)
[2025-05-06T16:55:02.642Z]     743 TritonBundler.begin_compile()
[2025-05-06T16:55:02.642Z]     744 try:
[2025-05-06T16:55:02.642Z] --> 745     mb_compiled_graph = fx_codegen_and_compile(
[2025-05-06T16:55:02.642Z]     746         gm, example_inputs, inputs_to_check, **graph_kwargs
[2025-05-06T16:55:02.642Z]     747     )
[2025-05-06T16:55:02.642Z]     748     assert mb_compiled_graph is not None
[2025-05-06T16:55:02.642Z]     749     mb_compiled_graph._time_taken_ns = time.time_ns() - start_time
[2025-05-06T16:55:02.642Z] 
[2025-05-06T16:55:02.642Z] File /usr/local/lib/python3.10/dist-packages/torch/_inductor/compile_fx.py:1295, in fx_codegen_and_compile(gm, example_inputs, inputs_to_check, **graph_kwargs)
[2025-05-06T16:55:02.642Z]    1291     from .compile_fx_subproc import _SubprocessFxCompile
[2025-05-06T16:55:02.642Z]    1293     scheme = _SubprocessFxCompile()
[2025-05-06T16:55:02.642Z] -> 1295 return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
[2025-05-06T16:55:02.642Z] 
[2025-05-06T16:55:02.642Z] File /usr/local/lib/python3.10/dist-packages/torch/_inductor/compile_fx.py:1119, in _InProcessFxCompile.codegen_and_compile(self, gm, example_inputs, inputs_to_check, graph_kwargs)
[2025-05-06T16:55:02.642Z]    1117 metrics_helper = metrics.CachedMetricsHelper()
[2025-05-06T16:55:02.642Z]    1118 with V.set_graph_handler(graph):
[2025-05-06T16:55:02.642Z] -> 1119     graph.run(*example_inputs)
[2025-05-06T16:55:02.642Z]    1120     output_strides: list[Optional[tuple[_StrideExprStr, ...]]] = []
[2025-05-06T16:55:02.642Z]    1121     if graph.graph_outputs is not None:
[2025-05-06T16:55:02.642Z]    1122         # We'll put the output strides in the compiled graph so we
[2025-05-06T16:55:02.642Z]    1123         # can later return them to the caller via TracingContext
[2025-05-06T16:55:02.642Z] 
[2025-05-06T16:55:02.642Z] File /usr/local/lib/python3.10/dist-packages/torch/_inductor/graph.py:877, in GraphLowering.run(self, *args)
[2025-05-06T16:55:02.642Z]     875 def run(self, *args: Any) -> Any:  # type: ignore[override]
[2025-05-06T16:55:02.642Z]     876     with dynamo_timed("GraphLowering.run"):
[2025-05-06T16:55:02.642Z] --> 877         return super().run(*args)
[2025-05-06T16:55:02.642Z] 
[2025-05-06T16:55:02.642Z] File /usr/local/lib/python3.10/dist-packages/torch/fx/interpreter.py:171, in Interpreter.run(self, initial_env, enable_io_processing, *args)
[2025-05-06T16:55:02.642Z]     168     continue
[2025-05-06T16:55:02.642Z]     170 try:
[2025-05-06T16:55:02.642Z] --> 171     self.env[node] = self.run_node(node)
[2025-05-06T16:55:02.642Z]     172 except Exception as e:
[2025-05-06T16:55:02.642Z]     173     if self.extra_traceback:
[2025-05-06T16:55:02.642Z] 
[2025-05-06T16:55:02.642Z] File /usr/local/lib/python3.10/dist-packages/torch/_inductor/graph.py:1527, in GraphLowering.run_node(self, n)
[2025-05-06T16:55:02.642Z]    1525 else:
[2025-05-06T16:55:02.642Z]    1526     debug("")
[2025-05-06T16:55:02.642Z] -> 1527     result = super().run_node(n)
[2025-05-06T16:55:02.642Z]    1529 # require the same stride order for dense outputs,
[2025-05-06T16:55:02.642Z]    1530 # 1. user-land view() will not throw because inductor
[2025-05-06T16:55:02.642Z]    1531 # output different strides than eager
[2025-05-06T16:55:02.642Z]    (...)
[2025-05-06T16:55:02.642Z]    1534 # 2: as_strided ops, we need make sure its input has same size/stride with
[2025-05-06T16:55:02.642Z]    1535 # eager model to align with eager behavior.
[2025-05-06T16:55:02.642Z]    1536 as_strided_ops = [
[2025-05-06T16:55:02.642Z]    1537     torch.ops.aten.as_strided.default,
[2025-05-06T16:55:02.642Z]    1538     torch.ops.aten.as_strided_.default,
[2025-05-06T16:55:02.642Z]    (...)
[2025-05-06T16:55:02.642Z]    1541     torch.ops.aten.resize_as.default,
[2025-05-06T16:55:02.642Z]    1542 ]
[2025-05-06T16:55:02.642Z] 
[2025-05-06T16:55:02.642Z] File /usr/local/lib/python3.10/dist-packages/torch/fx/interpreter.py:240, in Interpreter.run_node(self, n)
[2025-05-06T16:55:02.642Z]     238 assert isinstance(args, tuple)
[2025-05-06T16:55:02.642Z]     239 assert isinstance(kwargs, dict)
[2025-05-06T16:55:02.642Z] --> 240 return getattr(self, n.op)(n.target, args, kwargs)
[2025-05-06T16:55:02.642Z] 
[2025-05-06T16:55:02.642Z] File /usr/local/lib/python3.10/dist-packages/torch/_inductor/graph.py:1169, in GraphLowering.call_function(self, target, args, kwargs)
[2025-05-06T16:55:02.642Z]    1163             decided_constraint = None  # type: ignore[assignment]
[2025-05-06T16:55:02.642Z]    1165     # for implicitly fallback ops, we conservatively requires
[2025-05-06T16:55:02.642Z]    1166     # contiguous input since some eager kernels does not
[2025-05-06T16:55:02.642Z]    1167     # support non-contiguous inputs. They may silently cause
[2025-05-06T16:55:02.642Z]    1168     # accuracy problems. Check https://github.com/pytorch/pytorch/issues/140452
[2025-05-06T16:55:02.642Z] -> 1169     make_fallback(target, layout_constraint=decided_constraint)
[2025-05-06T16:55:02.642Z]    1171 elif get_decompositions([target]):
[2025-05-06T16:55:02.642Z]    1172     # There isn't a good way to dynamically patch this in
[2025-05-06T16:55:02.642Z]    1173     # since AOT Autograd already ran.  The error message tells
[2025-05-06T16:55:02.642Z]    1174     # the user how to fix it.
[2025-05-06T16:55:02.642Z]    1175     raise MissingOperatorWithDecomp(target, args, kwargs)
[2025-05-06T16:55:02.642Z] 
[2025-05-06T16:55:02.642Z] File /usr/local/lib/python3.10/dist-packages/torch/_inductor/lowering.py:2023, in make_fallback(op, layout_constraint, warn, override_decomp)
[2025-05-06T16:55:02.642Z]    2018         torch._dynamo.config.suppress_errors = False
[2025-05-06T16:55:02.642Z]    2019         log.warning(
[2025-05-06T16:55:02.642Z]    2020             "A make_fallback error occurred in suppress_errors config,"
[2025-05-06T16:55:02.642Z]    2021             " and suppress_errors is being disabled to surface it."
[2025-05-06T16:55:02.642Z]    2022         )
[2025-05-06T16:55:02.642Z] -> 2023     raise AssertionError(
[2025-05-06T16:55:02.642Z]    2024         f"make_fallback({op}): a decomposition exists, we should switch to it."
[2025-05-06T16:55:02.642Z]    2025         " To fix this error, either add a decomposition to core_aten_decompositions (preferred)"
[2025-05-06T16:55:02.642Z]    2026         " or inductor_decompositions, and delete the corresponding `make_fallback` line."
[2025-05-06T16:55:02.642Z]    2027         " Get help from the inductor team if unsure, don't pick arbitrarily to unblock yourself.",
[2025-05-06T16:55:02.642Z]    2028     )
[2025-05-06T16:55:02.642Z]    2030 def register_fallback(op_overload):
[2025-05-06T16:55:02.642Z]    2031     add_needs_realized_inputs(op_overload)
[2025-05-06T16:55:02.642Z] 
[2025-05-06T16:55:02.642Z] InductorError: AssertionError: make_fallback(aten.upsample_trilinear3d.default): a decomposition exists, we should switch to it. To fix this error, either add a decomposition to core_aten_decompositions (preferred) or inductor_decompositions, and delete the corresponding `make_fallback` line. Get help from the inductor team if unsure, don't pick arbitrarily to unblock yourself.
[2025-05-06T16:55:02.642Z] 
[2025-05-06T16:55:02.642Z] Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
[2025-05-06T16:55:02.642Z] 
[2025-05-06T16:55:02.642Z] 
[2025-05-06T16:55:02.642Z] 
[2025-05-06T16:55:02.642Z] real	7m21.829s
[2025-05-06T16:55:02.642Z] user	8m18.975s
[2025-05-06T16:55:02.642Z] sys	5m22.797s
[2025-05-06T16:55:02.642Z] Check failed!
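The assertion comes from Inductor's lowering of aten.upsample_trilinear3d.default, so the failure should reproduce without MONAI. A minimal sketch to confirm this, under my assumption that any compiled graph containing a trilinear 3D upsample takes the same path (the function name and shapes below are illustrative, not taken from the notebook):

```python
# Hypothetical standalone repro: on torch 2.7.0, compiling any graph that
# lowers aten.upsample_trilinear3d.default should hit the same assertion.
import torch
import torch.nn.functional as F

def upsample3d(x):
    # mode="trilinear" on a 5D (N, C, D, H, W) tensor dispatches to
    # aten.upsample_trilinear3d
    return F.interpolate(x, scale_factor=2, mode="trilinear", align_corners=False)

compiled = torch.compile(upsample3d)
x = torch.randn(1, 1, 8, 8, 8, device="cuda")
out = compiled(x)  # expected: InductorError: AssertionError: make_fallback(...)
```

Until there is an upstream fix, one possible stopgap for CI (an assumption, not a verified fix) is to compile the notebook's model with a backend that skips Inductor lowering entirely:

```python
# Workaround sketch: "aot_eager" runs Dynamo + AOTAutograd but not Inductor,
# so the make_fallback assertion is never reached. This forfeits Inductor's
# codegen speedups, so it keeps the notebook green rather than fast.
model_opt = torch.compile(model, backend="aot_eager")
```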
KumoLiu added the bug (Something isn't working) label on May 7, 2025