Closed
Description
Hello
For quite a while I have been trying to train my model with Keras 3 and the TF backend (TF2.18) using distributed training with MirroredStrategy. Since I could not get training with Model.fit to run successfully, I tried the same setup with one of the examples from the Keras documentation. The example trains fine under Keras 2/TF2.18 with MirroredStrategy on 1 and 2 devices, and it also runs fine under Keras 3 with one device. With 2 devices under Keras 3, the fit function fails with this error under TF2.17:
    132     to_union_indices = tf.gather(indices_indices, union_indices)
    133     values_with_leading_zeros = tf.concat(
--> 134         [tf.zeros((1,) + values.shape[1:], values.dtype), values], axis=0
    135     )
    136     return tf.gather(values_with_leading_zeros, to_union_indices)
ValueError: Cannot convert a partially known TensorShape (1, None) to a Tensor.
and with this error under TF2.18:
Epoch 1/2
Traceback (most recent call last):
  File "/Users/roebel/pysrc/FSQCodec/./scripts/test_mirrorstrategy_func.py", line 90, in <module>
    model.fit(
  File "/u/formes/share/packages/manaconda3/envs/tf2.18/lib/python3.10/site-packages/keras/src/utils/traceback_utils.py", line 122, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/u/formes/share/packages/manaconda3/envs/tf2.18/lib/python3.10/site-packages/keras/src/backend/tensorflow/core.py", line 141, in convert_to_tensor
    return tf.convert_to_tensor(x, dtype=dtype)
ValueError: None values not supported.
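For reference, the TF2.17 message is the one TensorFlow raises when a static shape that still contains an unknown dimension is passed where a concrete shape is required. The following is only a minimal sketch, independent of Keras and of MirroredStrategy, that produces the same message from the expression on line 134 of the first traceback when the trailing dimension is unknown at trace time. The names pad_static and pad_dynamic are hypothetical, and whether this is exactly what happens inside fit with 2 devices is an assumption. The dynamic variant shows the usual way to avoid the message in isolation; it is not a claim about how Keras should be changed.

```python
import tensorflow as tf


def pad_static(values):
    # Same expression as line 134 of the traceback: relies on the static shape.
    return tf.concat(
        [tf.zeros((1,) + values.shape[1:], values.dtype), values], axis=0
    )


def pad_dynamic(values):
    # Hypothetical variant: build the leading-row shape from the runtime shape.
    lead_shape = tf.concat([[1], tf.shape(values)[1:]], axis=0)
    return tf.concat([tf.zeros(lead_shape, values.dtype), values], axis=0)


# Trace both functions with every dimension unknown, so values.shape[1:] is (None,).
spec = [tf.TensorSpec(shape=[None, None], dtype=tf.float32)]

try:
    tf.function(pad_static, input_signature=spec).get_concrete_function()
    static_error = None
except ValueError as err:
    static_error = str(err)

dynamic_fn = tf.function(pad_dynamic, input_signature=spec)
result = dynamic_fn(tf.ones((2, 3)))

print(static_error)
print(result.shape)
```

If the static variant fails in the same way on your installation, that would suggest the tensor reaching this code path has an unknown trailing dimension in the 2-device case.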
I've put the code, together with its results, for running under Keras 2 here and the version configured for running with Keras 3 here.
Many thanks for your help.