
Commit c988b78

Yi Wang authored and facebook-github-bot committed

Add a description of GradBucket Python class (pytorch#53596)

Summary: Pull Request resolved: pytorch#53596. This description will be used in ddp_comm_hook docstrings.

ghstack-source-id: 123590360
Test Plan: waitforbuildbot
Reviewed By: rohan-varma
Differential Revision: D26908160
fbshipit-source-id: 824dea9203ca583676bddf0161c9edca52c9d20e

1 parent 741d0f4 commit c988b78

File tree

1 file changed: +15 -1 lines changed


torch/csrc/distributed/c10d/init.cpp

Lines changed: 15 additions & 1 deletion
@@ -184,7 +184,21 @@ PyObject* c10d_init(PyObject* _unused, PyObject* noargs) {
       py::arg("reducer"),
       py::arg("comm_hook_type"));
 
-  shared_ptr_class_<::c10d::GradBucket>(module, "GradBucket")
+  shared_ptr_class_<::c10d::GradBucket>(
+      module,
+      "GradBucket",
+      R"(
+This class mainly passes a list of gradient tensors
+(returned by :meth:`~torch.distributed.GradBucket.get_tensors`)
+to DDP communication hook,
+where each tensor in the list refers to the replica on each device.
+Since DDP communication hook only supports single process single device mode at this time,
+only exactly one tensor is stored in this bucket.
+This tensor is actually a flattened 1D tensor,
+which can be further decomposed into a list of per-parameter tensors within this bucket
+(returned by :meth:`~torch.distributed.GradBucket.get_per_parameter_tensors`)
+to apply layer-wise operations.
+)")
       .def(
           py::init<
               size_t,
