This repository was archived by the owner on Jul 26, 2019. It is now read-only.

Error in sparse_categorical_crossentropy when using the Theano backend #7

@HighCWu

There is no problem at all with the TensorFlow backend; now I am testing Theano. When running train_model from tutorial.ipynb, T.nnet.softmax() inside K.sparse_categorical_crossentropy rejects the model's 3D logits, since Theano's softmax only accepts 1-d or 2-d tensors:

<ipython-input-22-27837df85ad1> in classification_loss(y_true, y_pred)
      2 import keras.backend as K
      3 def classification_loss(y_true, y_pred):
----> 4     return K.sparse_categorical_crossentropy(y_true, y_pred, from_logits=True)
      5 train.classification_loss = classification_loss

/usr/local/lib/python3.6/dist-packages/keras/backend/theano_backend.py in sparse_categorical_crossentropy(target, output, from_logits, axis)
   1788     target = T.extra_ops.to_one_hot(target, nb_class=output.shape[-1])
   1789     target = reshape(target, shape(output))
-> 1790     return categorical_crossentropy(target, output, from_logits, axis=-1)
   1791 
   1792 

/usr/local/lib/python3.6/dist-packages/keras/backend/theano_backend.py in categorical_crossentropy(target, output, from_logits, axis)
   1762         target = permute_dimensions(target, permutation)
   1763     if from_logits:
-> 1764         output = T.nnet.softmax(output)
   1765     else:
   1766         # scale preds so that the class probas of each sample sum to 1

/usr/local/lib/python3.6/dist-packages/theano/tensor/nnet/nnet.py in softmax(c)
    813     if c.broadcastable[-1]:
    814         warnings.warn("The softmax is applied on a dimension of shape 1, which does not have a semantic meaning.")
--> 815     return softmax_op(c)
    816 
    817 

/usr/local/lib/python3.6/dist-packages/theano/gof/op.py in __call__(self, *inputs, **kwargs)
    613         """
    614         return_list = kwargs.pop('return_list', False)
--> 615         node = self.make_node(*inputs, **kwargs)
    616 
    617         if config.compute_test_value != 'off':

/usr/local/lib/python3.6/dist-packages/theano/tensor/nnet/nnet.py in make_node(self, x)
    428                 or x.type.dtype not in tensor.float_dtypes:
    429             raise ValueError('x must be 1-d or 2-d tensor of floats. Got %s' %
--> 430                              x.type)
    431         if x.ndim == 1:
    432             warnings.warn("DEPRECATION: If x is a vector, Softmax will not automatically pad x "

ValueError: x must be 1-d or 2-d tensor of floats. Got TensorType(float32, 3D)
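
For reference, I believe the failure reproduces with just the loss wiring (a sketch, with shapes assumed from the tutorial):

    import keras.backend as K  # with the Theano backend active

    # 3D logits trip Theano's softmax, which only accepts 1-d/2-d tensors.
    y_true = K.placeholder(ndim=2, dtype='int32')  # (batch, seq_len)
    y_pred = K.placeholder(ndim=3)                 # (batch, seq_len, n_classes)
    loss = K.sparse_categorical_crossentropy(y_true, y_pred, from_logits=True)
    # -> ValueError: x must be 1-d or 2-d tensor of floats. Got TensorType(float32, 3D)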

Then I used this monkey-patch to work around it:

    import keras.backend as K

    # Monkey-patch Theano's softmax: flatten 3D inputs to 2D, apply the
    # original op, then restore the original shape.
    _softmax = K.T.nnet.softmax
    def softmax(x):
        if x.ndim == 3:
            d1, d2, d3 = x.shape
            return _softmax(x.reshape((d1 * d2, d3))).reshape((d1, d2, d3))
        return _softmax(x)
    K.T.nnet.softmax = softmax
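
An alternative I considered is patching at the loss level instead of inside Theano, flattening the logits before calling the backend op (a sketch, untested):

    import keras.backend as K

    def classification_loss(y_true, y_pred):
        # Flatten the 3D logits to 2D so Theano's softmax accepts them,
        # then reshape the per-timestep losses back to (batch, seq_len).
        if K.ndim(y_pred) == 3:
            shp = K.shape(y_pred)
            flat_loss = K.sparse_categorical_crossentropy(
                K.flatten(y_true),
                K.reshape(y_pred, (-1, shp[-1])),
                from_logits=True)
            return K.reshape(flat_loss, (shp[0], shp[1]))
        return K.sparse_categorical_crossentropy(y_true, y_pred, from_logits=True)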

But when I run

    m = train_model(base_model=sequence_encoder, is_causal=False, tasks_meta_data=tasks, pretrain_generator=generator,
                    finetune_generator=generator, pretrain_epochs=100, pretrain_steps=number_of_pretrain_steps // 100,
                    finetune_epochs=100, finetune_steps=number_of_finetune_steps // 100, verbose=2, TPUStrategy=strategy)

again, I get the following error:

/usr/local/lib/python3.6/dist-packages/keras/layers/core.py:665: UserWarning: `output_shape` argument not specified for layer lm_logits and cannot be automatically inferred with the Theano backend. Defaulting to output shape `(None, 6)` (same as input shape). If the expected output shape is different, specify it via the `output_shape` argument.
  .format(self.name, input_shape))
/usr/local/lib/python3.6/dist-packages/keras/layers/core.py:665: UserWarning: `output_shape` argument not specified for layer lm_loss and cannot be automatically inferred with the Theano backend. Defaulting to output shape `[(None, 1), (None, 8), (None, 8, 6), (None, 8)]` (same as input shape). If the expected output shape is different, specify it via the `output_shape` argument.
  .format(self.name, input_shape))
/usr/local/lib/python3.6/dist-packages/keras/layers/core.py:665: UserWarning: `output_shape` argument not specified for layer odd_flatten and cannot be automatically inferred with the Theano backend. Defaulting to output shape `(None, 8, 6)` (same as input shape). If the expected output shape is different, specify it via the `output_shape` argument.
  .format(self.name, input_shape))
/usr/local/lib/python3.6/dist-packages/keras/layers/core.py:665: UserWarning: `output_shape` argument not specified for layer odd_gather and cannot be automatically inferred with the Theano backend. Defaulting to output shape `[(None, 8, 6), (None, 1)]` (same as input shape). If the expected output shape is different, specify it via the `output_shape` argument.
  .format(self.name, input_shape))
/usr/local/lib/python3.6/dist-packages/keras/layers/core.py:665: UserWarning: `output_shape` argument not specified for layer odd_loss and cannot be automatically inferred with the Theano backend. Defaulting to output shape `[(None, 1), (None, 1), (None, 8, 2)]` (same as input shape). If the expected output shape is different, specify it via the `output_shape` argument.
  .format(self.name, input_shape))
/usr/local/lib/python3.6/dist-packages/keras/layers/core.py:665: UserWarning: `output_shape` argument not specified for layer lm_random_loss and cannot be automatically inferred with the Theano backend. Defaulting to output shape `[(None, 1), (None, 8), (None, 8, 25), (None, 8)]` (same as input shape). If the expected output shape is different, specify it via the `output_shape` argument.
  .format(self.name, input_shape))
Epoch 1/100
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/theano/compile/function_module.py in __call__(self, *args, **kwargs)
    902             outputs =\
--> 903                 self.fn() if output_subset is None else\
    904                 self.fn(output_subset=output_subset)

/usr/local/lib/python3.6/dist-packages/theano/gof/op.py in rval(p, i, o, n)
    891             def rval(p=p, i=node_input_storage, o=node_output_storage, n=node):
--> 892                 r = p(n, [x[0] for x in i], o)
    893                 for o in node.outputs:

/usr/local/lib/python3.6/dist-packages/theano/tensor/subtensor.py in perform(self, node, inputs, out_)
   2338         if self.set_instead_of_inc:
-> 2339             out[0][inputs[2:]] = inputs[1]
   2340         else:

IndexError: index 8 is out of bounds for axis 1 with size 6

During handling of the above exception, another exception occurred:

IndexError                                Traceback (most recent call last)
<ipython-input-39-7b7276d2ce06> in <module>()
      1 m = train_model(base_model=sequence_encoder, is_causal=False, tasks_meta_data=tasks, pretrain_generator=generator,
      2                 finetune_generator=generator, pretrain_epochs=100, pretrain_steps=number_of_pretrain_steps // 100,
----> 3                 finetune_epochs=100, finetune_steps=number_of_finetune_steps // 100, verbose=2, TPUStrategy=strategy)
      4 # now m is ready to be used!
      5 print(m.inputs)

/content/bert_keras_repo/transformer/train.py in train_model(base_model, is_causal, tasks_meta_data, pretrain_generator, finetune_generator, pretrain_epochs, pretrain_optimizer, pretrain_steps, pretrain_callbacks, finetune_epochs, finetune_optimizer, finetune_steps, finetune_callbacks, verbose, TPUStrategy)
    145 
    146     if pretrain_generator is not None:
--> 147         train_step(True)
    148     if finetune_generator is not None:
    149         train_step(False)

/content/bert_keras_repo/transformer/train.py in train_step(is_pretrain)
    142         _model.fit_generator(_generator, steps_per_epoch=pretrain_steps if is_pretrain else finetune_steps,
    143                              verbose=verbose, callbacks=pretrain_callbacks if is_pretrain else finetune_callbacks,
--> 144                              shuffle=False, epochs=pretrain_epochs if is_pretrain else finetune_epochs)
    145 
    146     if pretrain_generator is not None:

/usr/local/lib/python3.6/dist-packages/keras/legacy/interfaces.py in wrapper(*args, **kwargs)
     89                 warnings.warn('Update your `' + object_name + '` call to the ' +
     90                               'Keras 2 API: ' + signature, stacklevel=2)
---> 91             return func(*args, **kwargs)
     92         wrapper._original_function = func
     93         return wrapper

/usr/local/lib/python3.6/dist-packages/keras/engine/training.py in fit_generator(self, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
   1416             use_multiprocessing=use_multiprocessing,
   1417             shuffle=shuffle,
-> 1418             initial_epoch=initial_epoch)
   1419 
   1420     @interfaces.legacy_generator_methods_support

/usr/local/lib/python3.6/dist-packages/keras/engine/training_generator.py in fit_generator(model, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
    215                 outs = model.train_on_batch(x, y,
    216                                             sample_weight=sample_weight,
--> 217                                             class_weight=class_weight)
    218 
    219                 outs = to_list(outs)

/usr/local/lib/python3.6/dist-packages/keras/engine/training.py in train_on_batch(self, x, y, sample_weight, class_weight)
   1215             ins = x + y + sample_weights
   1216         self._make_train_function()
-> 1217         outputs = self.train_function(ins)
   1218         return unpack_singleton(outputs)
   1219 

/usr/local/lib/python3.6/dist-packages/keras/backend/theano_backend.py in __call__(self, inputs)
   1386     def __call__(self, inputs):
   1387         assert isinstance(inputs, (list, tuple))
-> 1388         return self.function(*inputs)
   1389 
   1390 

/usr/local/lib/python3.6/dist-packages/theano/compile/function_module.py in __call__(self, *args, **kwargs)
    915                     node=self.fn.nodes[self.fn.position_of_error],
    916                     thunk=thunk,
--> 917                     storage_map=getattr(self.fn, 'storage_map', None))
    918             else:
    919                 # old-style linkers raise their own exceptions

/usr/local/lib/python3.6/dist-packages/theano/gof/link.py in raise_with_op(node, thunk, exc_info, storage_map)
    323         # extra long error message in that case.
    324         pass
--> 325     reraise(exc_type, exc_value, exc_trace)
    326 
    327 

/usr/local/lib/python3.6/dist-packages/six.py in reraise(tp, value, tb)
    690                 value = tp()
    691             if value.__traceback__ is not tb:
--> 692                 raise value.with_traceback(tb)
    693             raise value
    694         finally:

/usr/local/lib/python3.6/dist-packages/theano/compile/function_module.py in __call__(self, *args, **kwargs)
    901         try:
    902             outputs =\
--> 903                 self.fn() if output_subset is None else\
    904                 self.fn(output_subset=output_subset)
    905         except Exception:

/usr/local/lib/python3.6/dist-packages/theano/gof/op.py in rval(p, i, o, n)
    890             # default arguments are stored in the closure of `rval`
    891             def rval(p=p, i=node_input_storage, o=node_output_storage, n=node):
--> 892                 r = p(n, [x[0] for x in i], o)
    893                 for o in node.outputs:
    894                     compute_map[o][0] = True

/usr/local/lib/python3.6/dist-packages/theano/tensor/subtensor.py in perform(self, node, inputs, out_)
   2337 
   2338         if self.set_instead_of_inc:
-> 2339             out[0][inputs[2:]] = inputs[1]
   2340         else:
   2341             np.add.at(out[0], tuple(inputs[2:]), inputs[1])

IndexError: index 8 is out of bounds for axis 1 with size 6
Apply node that caused the error: AdvancedIncSubtensor{inplace=False,  set_instead_of_inc=True}(Alloc.0, TensorConstant{1}, ARange{dtype='int64'}.0, Reshape{1}.0)
Toposort index: 315
Inputs types: [TensorType(float32, matrix), TensorType(int8, scalar), TensorType(int64, vector), TensorType(int32, vector)]
Inputs shapes: [(64, 6), (), (64,), (64,)]
Inputs strides: [(24, 4), (), (8,), (4,)]
Inputs values: ['not shown', array(1, dtype=int8), 'not shown', 'not shown']
Outputs clients: [[Reshape{3}(AdvancedIncSubtensor{inplace=False,  set_instead_of_inc=True}.0, MakeVector{dtype='int64'}.0)]]

Backtrace when the node is created(use Theano flag traceback.limit=N to make it longer):
  File "bert_keras_repo/transformer/train.py", line 68, in train_model
    [task_loss_weight, task_target, logits, task_mask])
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/base_layer.py", line 457, in __call__
    output = self.call(inputs, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/keras/layers/core.py", line 687, in call
    return self.function(inputs, **arguments)
  File "bert_keras_repo/transformer/train.py", line 67, in <lambda>
    task_loss = Lambda(lambda x: x[0] * masked_classification_loss(x[1], x[2], x[3]), name=task.name + '_loss')(
  File "bert_keras_repo/transformer/train.py", line 20, in masked_classification_loss
    return _mask_loss(y_true, y_pred, y_mask, classification_loss)
  File "bert_keras_repo/transformer/train.py", line 11, in _mask_loss
    l = K.switch(y_mask, element_wise_loss(y_true, y_pred), K.zeros_like(y_mask, dtype=K.floatx()))
  File "<ipython-input-22-27837df85ad1>", line 4, in classification_loss
    return K.sparse_categorical_crossentropy(y_true, y_pred, from_logits=True)
  File "/usr/local/lib/python3.6/dist-packages/keras/backend/theano_backend.py", line 1788, in sparse_categorical_crossentropy
    target = T.extra_ops.to_one_hot(target, nb_class=output.shape[-1])

HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.
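
If I read the failing Apply node right, the one-hot scatter inside to_one_hot is being given a target index (8) that is larger than the logits' class dimension (6). In NumPy terms it is roughly:

    import numpy as np

    # to_one_hot allocates a (batch, n_classes) zero matrix and scatters 1s
    # at the target indices; a target of 8 with only 6 columns must fail.
    targets = np.array([0, 3, 8], dtype='int32')
    one_hot = np.zeros((len(targets), 6), dtype='float32')
    one_hot[np.arange(len(targets)), targets] = 1
    # -> IndexError: index 8 is out of bounds for axis 1 with size 6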

It doesn't seem to be a bug in my own code, because I checked out the branch from before TPU support and got the same error.
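
Possibly related: the `output_shape` warnings above say the Lambda layers in train.py cannot infer their output shapes under Theano. If so, giving them explicit shapes might help, e.g. (hypothetical, untested; the `(1,)` is my assumption about the per-sample loss shape):

    from keras.layers import Lambda

    # Same Lambda as train.py line 67, but with an explicit output_shape
    # so the Theano backend does not fall back to a wrong default.
    task_loss = Lambda(lambda x: x[0] * masked_classification_loss(x[1], x[2], x[3]),
                       output_shape=(1,),
                       name=task.name + '_loss')(
        [task_loss_weight, task_target, logits, task_mask])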
