huggingface · sayakpaul · May 21, 2024 · May 17, 2024 · May 17, 2024 · May 20, 2024
diff --git a/docs/source/en/api/video_processor.md b/docs/source/en/api/video_processor.md
@@ -12,4 +12,10 @@ specific language governing permissions and limitations under the License.
 
 # Video Processor
 
-The `VideoProcessor` provides a unified API for video pipelines to prepare inputs for VAE encoding and post-processing outputs once they're decoded. The class inherits [`VaeImageProcessor`] so it includes transformations such as resizing, normalization, and conversion between PIL Image, PyTorch, and NumPy arrays.
+The [`VideoProcessor`] provides a unified API for video pipelines to prepare inputs for VAE encoding and post-process outputs once they're decoded. The class inherits from [`VaeImageProcessor`], so it includes transformations such as resizing, normalization, and conversion between PIL images, PyTorch tensors, and NumPy arrays.
+
+## VideoProcessor
+
+[[autodoc]] video_processor.VideoProcessor.preprocess_video
+
+[[autodoc]] video_processor.VideoProcessor.postprocess_video
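Reviewer note: the `preprocess_video` docstring touched in this PR documents channels-last layouts for NumPy inputs and channels-first layouts for Torch inputs. The sketch below only illustrates how those documented shapes relate to one another, using NumPy alone. The helper `to_batched_channels_first` is hypothetical and is not part of `diffusers`; it makes no claim about how `VideoProcessor` converts inputs internally.

```python
import numpy as np


# Hypothetical helper for illustration only; not a diffusers API.
def to_batched_channels_first(video: np.ndarray) -> np.ndarray:
    """Map a channels-last NumPy video to a batched channels-first layout.

    Accepts (num_frames, height, width, num_channels) or
    (batch_size, num_frames, height, width, num_channels) and returns
    (batch_size, num_frames, num_channels, height, width).
    """
    if video.ndim == 4:
        # A single video: add a leading batch dimension of size 1.
        video = video[None, ...]
    if video.ndim != 5:
        raise ValueError(f"Expected a 4D or 5D array, got {video.ndim}D.")
    # Move the channel axis (last) in front of height and width.
    return np.transpose(video, (0, 1, 4, 2, 3))


# One video with 8 frames of 64x96 RGB, and a batch of two such videos.
single = np.zeros((8, 64, 96, 3), dtype=np.float32)
batch = np.zeros((2, 8, 64, 96, 3), dtype=np.float32)

print(to_batched_channels_first(single).shape)  # (1, 8, 3, 64, 96)
print(to_batched_channels_first(batch).shape)   # (2, 8, 3, 64, 96)
```

The 4D case gains a batch dimension of 1, so both inputs end up in the same 5D channels-first layout that the docstring lists for 5D Torch tensors.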
diff --git a/src/diffusers/video_processor.py b/src/diffusers/video_processor.py
@@ -30,17 +30,19 @@ def preprocess_video(self, video, height: Optional[int] = None, width: Optional[
         Preprocesses input video(s).
 
         Args:
-            video: The input video. It can be one of the following:
+            video (`List[PIL.Image]`, `List[List[PIL.Image]]`, `torch.Tensor`, `np.array`, `List[torch.Tensor]`, `List[np.array]`):
+                The input video. It can be one of the following:
                 * List of the PIL images.
                 * List of list of PIL images.
-                * 4D Torch tensors (expected shape for each tensor: (num_frames, num_channels, height, width)).
-                * 4D NumPy arrays (expected shape for each array: (num_frames, height, width, num_channels)).
-                * List of 4D Torch tensors (expected shape for each tensor: (num_frames, num_channels, height, width)).
-                * List of 4D NumPy arrays (expected shape for each array: (num_frames, height, width, num_channels)).
-                * 5D NumPy arrays: expected shape for each array: (batch_size, num_frames, height, width,
-                  num_channels).
-                * 5D Torch tensors: expected shape for each array: (batch_size, num_frames, num_channels, height,
-                  width).
+                * 4D Torch tensors (expected shape for each tensor `(num_frames, num_channels, height, width)`).
+                * 4D NumPy arrays (expected shape for each array `(num_frames, height, width, num_channels)`).
+                * List of 4D Torch tensors (expected shape for each tensor `(num_frames, num_channels, height,
+                  width)`).
+                * List of 4D NumPy arrays (expected shape for each array `(num_frames, height, width, num_channels)`).
+                * 5D NumPy arrays: expected shape for each array `(batch_size, num_frames, height, width,
+                  num_channels)`.
+                * 5D Torch tensors: expected shape for each tensor `(batch_size, num_frames, num_channels, height,
+                  width)`.
             height (`int`, *optional*, defaults to `None`):
                 The height in preprocessed frames of the video. If `None`, will use the `get_default_height_width()` to
                 get default height.