Add detailed explanation, examples, and caveats for the convert_image_dtype function.

hyeygit · tensorflower-gardener · commit f577e5a12832 · 2020-11-14T11:09:40.000-08:00
`convert_image_dtype` is one of the most frequently used functions in the tf.image API, but its API doc lacks explanation and examples, which regularly confuses users. This CL addresses that gap.

PiperOrigin-RevId: 342436750
Change-Id: Ia42d69a280e6bf8194f66cbc0a8e5976bac93187
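
The docstring examples added in this change can be checked by hand with the arithmetic sketched here. This is a plain-Python illustration that reproduces the numbers shown in the examples, not the TensorFlow implementation: the helper names (`int_to_float`, `float_to_int`, `int_to_int`) and the `INT_MAX` table are hypothetical, the `MAX + 0.5` scale and truncation in `float_to_int` are assumptions chosen to match the documented outputs, and saturation is not modeled.

```python
# Hypothetical helpers; integer ranges are written out by hand rather than
# queried from TensorFlow, so the sketch runs without it installed.
INT_MAX = {"int8": 127, "uint8": 255, "int16": 32767, "int32": 2**31 - 1}


def int_to_float(value, src):
    """Normalize an integer pixel by the MAX of its source dtype."""
    return value / INT_MAX[src]


def float_to_int(value, dst):
    """Scale a float by MAX + 0.5 (assumed), then truncate toward zero."""
    return int(value * (INT_MAX[dst] + 0.5))


def int_to_int(value, src, dst):
    """Rescale by the ratio of the (MAX + 1) values of the two dtypes."""
    src_max, dst_max = INT_MAX[src], INT_MAX[dst]
    if src_max > dst_max:
        # Scaling down: floor division discards the low-order bits.
        return value // ((src_max + 1) // (dst_max + 1))
    # Scaling up: multiplication, so MAX lands below the new dtype's MAX.
    return value * ((dst_max + 1) // (src_max + 1))


print(int_to_int(127, "int8", "int16"))     # 32512, not 32767
print(int_to_int(32512, "int16", "int8"))   # 127, the round trip is exact
down = int_to_int(1000, "int16", "uint8")   # 7
print(int_to_int(down, "uint8", "int16"))   # 896, precision was lost
print(float_to_int(0.12, "int8"))           # 15
```

Under these assumptions, the same input value normalizes differently for `int8` and `int32` simply because `INT_MAX[src]` differs, and the `float32` to `int8` round trip returns `15 / 127` rather than the original `0.12`.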
diff --git a/tensorflow/python/ops/image_ops_impl.py b/tensorflow/python/ops/image_ops_impl.py
@@ -2200,6 +2200,10 @@ def adjust_gamma(image, gamma=1, gain=1):
 def convert_image_dtype(image, dtype, saturate=False, name=None):
   """Convert `image` to `dtype`, scaling its values if needed.
 
+  The operation supports data types (for `image` and `dtype`) of
+  `uint8`, `uint16`, `uint32`, `uint64`, `int8`, `int16`, `int32`, `int64`,
+  `float16`, `float32`, `float64`, `bfloat16`.
+
   Images that are represented using floating point values are expected to have
   values in the range [0,1). Image data stored in integer data types are
   expected to have values in the range `[0,MAX]`, where `MAX` is the largest
@@ -2208,6 +2212,97 @@ def convert_image_dtype(image, dtype, saturate=False, name=None):
   This op converts between data types, scaling the values appropriately before
   casting.
 
+  Usage Example:
+
+  >>> x = [[[1, 2, 3], [4, 5, 6]],
+  ...      [[7, 8, 9], [10, 11, 12]]]
+  >>> x_int8 = tf.convert_to_tensor(x, dtype=tf.int8)
+  >>> tf.image.convert_image_dtype(x_int8, dtype=tf.float16, saturate=False)
+  <tf.Tensor: shape=(2, 2, 3), dtype=float16, numpy=
+  array([[[0.00787, 0.01575, 0.02362],
+          [0.0315 , 0.03937, 0.04724]],
+         [[0.0551 , 0.063  , 0.07086],
+          [0.07874, 0.0866 , 0.0945 ]]], dtype=float16)>
+
+  Converting integer types to floating point types returns normalized floating
+  point values in the range [0, 1); the values are normalized by the `MAX` value
+  of the input dtype. Consider the following two examples:
+
+  >>> a = [[[1], [2]], [[3], [4]]]
+  >>> a_int8 = tf.convert_to_tensor(a, dtype=tf.int8)
+  >>> tf.image.convert_image_dtype(a_int8, dtype=tf.float32)
+  <tf.Tensor: shape=(2, 2, 1), dtype=float32, numpy=
+  array([[[0.00787402],
+          [0.01574803]],
+         [[0.02362205],
+          [0.03149606]]], dtype=float32)>
+
+  >>> a_int32 = tf.convert_to_tensor(a, dtype=tf.int32)
+  >>> tf.image.convert_image_dtype(a_int32, dtype=tf.float32)
+  <tf.Tensor: shape=(2, 2, 1), dtype=float32, numpy=
+  array([[[4.6566129e-10],
+          [9.3132257e-10]],
+         [[1.3969839e-09],
+          [1.8626451e-09]]], dtype=float32)>
+
+  Despite having identical values of `a` and output dtype of `float32`, the
+  outputs differ due to the different input dtypes (`int8` vs. `int32`). This
+  is, again, because the values are normalized by the `MAX` value of the input
+  dtype.
+
+  Note that converting floating point values to integer type may lose precision.
+  In the example below, an image tensor `b` of dtype `float32` is converted to
+  `int8` and back to `float32`. The final output, however, is different from
+  the original input `b` due to precision loss.
+
+  >>> b = [[[0.12], [0.34]], [[0.56], [0.78]]]
+  >>> b_float32 = tf.convert_to_tensor(b, dtype=tf.float32)
+  >>> b_int8 = tf.image.convert_image_dtype(b_float32, dtype=tf.int8)
+  >>> tf.image.convert_image_dtype(b_int8, dtype=tf.float32)
+  <tf.Tensor: shape=(2, 2, 1), dtype=float32, numpy=
+  array([[[0.11811024],
+          [0.33858266]],
+         [[0.5590551 ],
+          [0.77952754]]], dtype=float32)>
+
+  Scaling up from an integer type (input dtype) to another integer type (output
+  dtype) will not map input dtype's `MAX` to output dtype's `MAX` but converting
+  back and forth should result in no change. For example, as shown below, the
+  `MAX` value of int8 (=127) is not mapped to the `MAX` value of int16 (=32,767)
+  but, when scaled back, we get the same, original values of `c`.
+
+  >>> c = [[[1], [2]], [[127], [127]]]
+  >>> c_int8 = tf.convert_to_tensor(c, dtype=tf.int8)
+  >>> c_int16 = tf.image.convert_image_dtype(c_int8, dtype=tf.int16)
+  >>> print(c_int16)
+  tf.Tensor(
+  [[[  256]
+    [  512]]
+   [[32512]
+    [32512]]], shape=(2, 2, 1), dtype=int16)
+  >>> c_int8_back = tf.image.convert_image_dtype(c_int16, dtype=tf.int8)
+  >>> print(c_int8_back)
+  tf.Tensor(
+  [[[  1]
+    [  2]]
+   [[127]
+    [127]]], shape=(2, 2, 1), dtype=int8)
+
+  Scaling down from an integer type to another integer type can be a lossy
+  conversion. Notice in the example below that converting `int16` to `uint8` and
+  back to `int16` has lost precision.
+
+  >>> d = [[[1000], [2000]], [[3000], [4000]]]
+  >>> d_int16 = tf.convert_to_tensor(d, dtype=tf.int16)
+  >>> d_uint8 = tf.image.convert_image_dtype(d_int16, dtype=tf.uint8)
+  >>> d_int16_back = tf.image.convert_image_dtype(d_uint8, dtype=tf.int16)
+  >>> print(d_int16_back)
+  tf.Tensor(
+  [[[ 896]
+    [1920]]
+   [[2944]
+    [3968]]], shape=(2, 2, 1), dtype=int16)
+
   Note that converting from floating point inputs to integer types may lead to
   over/underflow problems. Set saturate to `True` to avoid such problem in
   problematic conversions. If enabled, saturation will clip the output into the
@@ -2216,19 +2311,6 @@ def convert_image_dtype(image, dtype, saturate=False, name=None):
   type, and when casting from a signed to an unsigned type; `saturate` has no
   effect on casts between floats, or on casts that increase the type's range).
 
-  Usage Example:
-
-  >>> x = [[[1.0, 2.0, 3.0],
-  ...       [4.0, 5.0, 6.0]],
-  ...     [[7.0, 8.0, 9.0],
-  ...       [10.0, 11.0, 12.0]]]
-  >>> tf.image.convert_image_dtype(x, dtype=tf.float16, saturate=False)
-  <tf.Tensor: shape=(2, 2, 3), dtype=float16, numpy=
-  array([[[ 1.,  2.,  3.],
-          [ 4.,  5.,  6.]],
-         [[ 7.,  8.,  9.],
-          [10., 11., 12.]]], dtype=float16)>
-
   Args:
     image: An image.
     dtype: A `DType` to convert `image` to.