Computer vision with PyTorch
PyTorch provides several convenient building blocks for computer vision, including convolutional layers and pooling layers. The convolutional layers Conv1d, Conv2d, and Conv3d live in the torch.nn package. As the names suggest, Conv1d performs one-dimensional convolution, Conv2d performs two-dimensional convolution on inputs such as images, and Conv3d performs three-dimensional convolution on inputs such as videos. The naming can be confusing, because the number in the layer name counts only the dimensions the kernel slides over and ignores both the batch dimension and the depth (channels) of the input. For instance, a batched input to Conv2d is four-dimensional: the first dimension is the batch size, the second is the depth of the image (its channels, three for RGB), and the last two are the height and width of the image.
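The shape convention described above can be checked with a short sketch. The batch size, channel counts, kernel size, and image sizes here are arbitrary values chosen for illustration, not values from the text, and the inputs are random tensors rather than real data.

```python
import torch
import torch.nn as nn

# Illustrative sizes only: 4 samples, 3 input channels (e.g. RGB), 8 output channels.
batch, in_ch, out_ch = 4, 3, 8

# Conv1d: input is (batch, channels, length)
conv1d = nn.Conv1d(in_channels=in_ch, out_channels=out_ch, kernel_size=3)
out1d = conv1d(torch.randn(batch, in_ch, 32))

# Conv2d: input is (batch, channels, height, width)
conv2d = nn.Conv2d(in_channels=in_ch, out_channels=out_ch, kernel_size=3)
out2d = conv2d(torch.randn(batch, in_ch, 32, 32))

# Conv3d: input is (batch, channels, depth/frames, height, width)
conv3d = nn.Conv3d(in_channels=in_ch, out_channels=out_ch, kernel_size=3)
out3d = conv3d(torch.randn(batch, in_ch, 16, 32, 32))

# A pooling layer halves each spatial dimension of the Conv2d output.
pool = nn.MaxPool2d(kernel_size=2)
pooled = pool(out2d)

print(out1d.shape)   # torch.Size([4, 8, 30])
print(out2d.shape)   # torch.Size([4, 8, 30, 30])
print(out3d.shape)   # torch.Size([4, 8, 14, 30, 30])
print(pooled.shape)  # torch.Size([4, 8, 15, 15])
```

With no padding and a stride of 1, a kernel of size 3 shrinks each convolved dimension by 2, which is why 32 becomes 30 and 16 becomes 14. The batch dimension is unchanged, and the channel dimension becomes the number of output channels.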
Apart from these higher-level layers for computer vision, the torchvision package has some handy utility functions for setting up the network. We'll explore some of those in this chapter.
This chapter explains PyTorch...