
Commit c988b78

Yi Wang authored and facebook-github-bot committed

Add a description of GradBucket Python class (pytorch#53596)

Summary: Pull Request resolved: pytorch#53596. This description will be used in ddp_comm_hook docstrings.

ghstack-source-id: 123590360
Test Plan: waitforbuildbot
Reviewed By: rohan-varma
Differential Revision: D26908160
fbshipit-source-id: 824dea9203ca583676bddf0161c9edca52c9d20e

1 parent 741d0f4 commit c988b78

File tree

1 file changed: +15 -1 lines changed


torch/csrc/distributed/c10d/init.cpp

Lines changed: 15 additions & 1 deletion
@@ -184,7 +184,21 @@ PyObject* c10d_init(PyObject* _unused, PyObject* noargs) {
       py::arg("reducer"),
       py::arg("comm_hook_type"));
 
-  shared_ptr_class_<::c10d::GradBucket>(module, "GradBucket")
+  shared_ptr_class_<::c10d::GradBucket>(
+      module,
+      "GradBucket",
+      R"(
+This class mainly passes a list of gradient tensors
+(returned by :meth:`~torch.distributed.GradBucket.get_tensors`)
+to DDP communication hook,
+where each tensor in the list refers to the replica on each device.
+Since DDP communication hook only supports single process single device mode at this time,
+only exactly one tensor is stored in this bucket.
+This tensor is actually a flattened 1D tensor,
+which can be further decomposed into a list of per-parameter tensors within this bucket
+(returned by :meth:`~torch.distributed.GradBucket.get_per_parameter_tensors`)
+to apply layer-wise operations.
+)")
       .def(
           py::init<
               size_t,
