Introducing the primitive PTransform object – Combine
So far, we have seen three grouping (stateful) transformations: Count, Top, and Max. None of these is a primitive transformation. A primitive transformation is one that needs direct support from a runner and cannot be expressed in terms of other transformations. Combine is the first primitive PTransform object that we are going to introduce. Beam has only five primitive PTransform objects, and we will walk through all of them in this chapter. Non-primitive PTransform objects are called composite transformations.
The Combine PTransform object generally performs a reduction operation on a PCollection object. As the name suggests, the transform combines multiple input elements into a single output value per window (Combine.globally) or per key and window (Combine.perKey). This reduction is illustrated by the following figure:
Figure 2.6 –...
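To make the reduction more concrete, the following is a minimal sketch in plain Java that models the idea without depending on Beam. The SimpleCombineFn interface is a hypothetical stand-in whose method names mirror those of Beam's Combine.CombineFn (createAccumulator, addInput, mergeAccumulators, and extractOutput); the combineGlobally and combinePerKey helpers, the SumFn class, and the split of the input into two bundles are assumptions made purely for illustration. Windowing is ignored, so all elements are treated as belonging to a single window.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CombineSketch {

  // Hypothetical stand-in for a combine function; this is not Beam's own class.
  public interface SimpleCombineFn<InputT, AccumT, OutputT> {
    AccumT createAccumulator();
    AccumT addInput(AccumT accumulator, InputT input);
    AccumT mergeAccumulators(List<AccumT> accumulators);
    OutputT extractOutput(AccumT accumulator);
  }

  // Example combine function: sums Long values.
  public static class SumFn implements SimpleCombineFn<Long, Long, Long> {
    public Long createAccumulator() { return 0L; }
    public Long addInput(Long accumulator, Long input) { return accumulator + input; }
    public Long mergeAccumulators(List<Long> accumulators) {
      long total = 0L;
      for (Long partial : accumulators) {
        total += partial;
      }
      return total;
    }
    public Long extractOutput(Long accumulator) { return accumulator; }
  }

  // Models a global combine: one output value for the whole (single-window) input.
  // The input is split into two "bundles" (an assumption) to show why partial
  // accumulators must be mergeable.
  public static <I, A, O> O combineGlobally(List<I> inputs, SimpleCombineFn<I, A, O> fn) {
    int mid = inputs.size() / 2;
    List<A> partials = new ArrayList<>();
    for (List<I> bundle : List.of(inputs.subList(0, mid), inputs.subList(mid, inputs.size()))) {
      A accumulator = fn.createAccumulator();
      for (I input : bundle) {
        accumulator = fn.addInput(accumulator, input);
      }
      partials.add(accumulator);
    }
    return fn.extractOutput(fn.mergeAccumulators(partials));
  }

  // Models a per-key combine: one output value per distinct key.
  public static <K, I, A, O> Map<K, O> combinePerKey(
      List<Map.Entry<K, I>> inputs, SimpleCombineFn<I, A, O> fn) {
    Map<K, A> accumulators = new LinkedHashMap<>();
    for (Map.Entry<K, I> kv : inputs) {
      A accumulator = accumulators.containsKey(kv.getKey())
          ? accumulators.get(kv.getKey())
          : fn.createAccumulator();
      accumulators.put(kv.getKey(), fn.addInput(accumulator, kv.getValue()));
    }
    Map<K, O> result = new LinkedHashMap<>();
    for (Map.Entry<K, A> entry : accumulators.entrySet()) {
      result.put(entry.getKey(), fn.extractOutput(entry.getValue()));
    }
    return result;
  }

  public static void main(String[] args) {
    List<Long> values = List.of(1L, 2L, 3L, 4L, 5L);
    System.out.println(combineGlobally(values, new SumFn())); // prints 15

    List<Map.Entry<String, Long>> keyed =
        List.of(Map.entry("a", 1L), Map.entry("b", 2L), Map.entry("a", 3L));
    System.out.println(combinePerKey(keyed, new SumFn())); // prints {a=4, b=2}
  }
}
```

The sketch keeps accumulation and merging separate because a runner is free to reduce parts of the input independently and merge the partial results afterward. This only gives the same answer regardless of how the input is split if the combining operation is associative and commutative, as summation is here.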