Fix mixed precision for BART #1121

Closed
abheesht17 wants to merge 4 commits

Conversation

abheesht17 (Collaborator) commented on Jul 9, 2023:

With the following policy set globally,

policy = keras.mixed_precision.Policy("mixed_float16")
keras.mixed_precision.set_global_policy(policy)

this error shows up:

TypeError: Exception encountered when calling layer "tf.linalg.matmul" (type TFOpLambda).

Input 'y' of 'BatchMatMulV2' Op has type float32 that does not match type float16 of argument 'x'.

Call arguments received by layer "tf.linalg.matmul" (type TFOpLambda):
  • a=tf.Tensor(shape=(None, None, 768), dtype=float16)
  • b=<tf.Variable 'token_embedding/embeddings:0' shape=(50265, 768) dtype=float32>
  • transpose_a=False
  • transpose_b=True
  • adjoint_a=False
  • adjoint_b=False
  • a_is_sparse=False
  • b_is_sparse=False
  • output_type=None
  • name=None
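
A minimal standalone sketch (not the actual BART code) that reproduces the same dtype mismatch, with shapes roughly matching the trace above (batch and sequence sizes are arbitrary):

import tensorflow as tf

embeddings = tf.Variable(tf.random.normal([50265, 768]))  # float32 variable, like token_embedding/embeddings
x = tf.random.normal([1, 16, 768], dtype=tf.float16)      # float16 hidden states under mixed_float16

# Raises: Input 'y' of 'BatchMatMulV2' Op has type float32 that does not
# match type float16 of argument 'x'.
outputs = tf.matmul(x, embeddings, transpose_b=True)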

abheesht17 requested a review from mattdangerw on Jul 9, 2023, 04:39.
@@ -193,7 +193,7 @@ def __init__(
# Use token embedding weights to project from the token representation
# to vocabulary logits.
outputs = tf.matmul(
x,
mattdangerw (Member) commented on the change above:

I think we actually want this the other way: we should cast the variable to the compute dtype (the lower-precision type when using mixed precision) before multiplying with x, i.e. tf.cast(backbone.token_embedding.embeddings, x.dtype).

Can you check if that fixes things as well?
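
A self-contained sketch of that direction (shapes are illustrative, not taken from the BART config):

import tensorflow as tf

embeddings = tf.Variable(tf.random.normal([50265, 768]))  # float32 embedding variable
x = tf.random.normal([1, 16, 768], dtype=tf.float16)      # float16 activations under mixed precision

# Cast the variable down to x's compute dtype before the tied projection.
logits = tf.matmul(x, tf.cast(embeddings, x.dtype), transpose_b=True)
assert logits.dtype == tf.float16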

abheesht17 (Collaborator, author) replied:

Hmmm, why does GPT-2 output tf.float32 instead of tf.float16, then?

mattdangerw (Member) replied:

In what context does it? There might be some casting of the outputs going on.

My understanding is that we should be casting our variables to the compute_dtype. Take a look at
https://github.com/keras-team/keras/blob/v2.12.0/keras/engine/base_layer.py#L2216-L2235
and https://github.com/keras-team/keras/blob/v2.12.0/keras/mixed_precision/autocast_variable.py

But maybe I am missing something!
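
A quick way to see that behavior under mixed precision, using a generic Keras layer (nothing BART-specific):

import tensorflow as tf
from tensorflow import keras

keras.mixed_precision.set_global_policy("mixed_float16")

dense = keras.layers.Dense(8)
y = dense(tf.random.normal([2, 8]))

print(dense.kernel.dtype)   # float32: the variable (storage) dtype
print(dense.compute_dtype)  # float16: what the variable is cast to inside call()
print(y.dtype)              # float16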

abheesht17 (Collaborator, author) replied:

The GPT-2 backbone outputs tf.float32, while the BART backbone outputs tf.float16. Not sure why that's happening.

https://colab.research.google.com/drive/18AbKIwbUAtJySgAYWXa0ggvRbp5TriEa?usp=sharing
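
A hedged sketch of that kind of check (the preset names below are assumptions for illustration, not taken from the linked notebook):

import tensorflow as tf
from tensorflow import keras
import keras_nlp

keras.mixed_precision.set_global_policy("mixed_float16")

gpt2 = keras_nlp.models.GPT2Backbone.from_preset("gpt2_base_en")
bart = keras_nlp.models.BartBackbone.from_preset("bart_base_en")

# Inspect the symbolic output dtypes of each functional backbone.
print(tf.nest.map_structure(lambda t: t.dtype, gpt2.output))
print(tf.nest.map_structure(lambda t: t.dtype, bart.output))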

mattdangerw (Member) replied:

This actually may no longer be necessary with the port. We had to wrap things in a Layer to avoid some errors with keras-core, which might mean that variables are now autocast automatically (I think they should be).

Can you try again on the latest from master?
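
A hypothetical stand-in (not the actual keras-nlp layer) showing why wrapping the tied projection in a Layer avoids the original error: the float32 embedding variable is read as the compute dtype inside call(), so the matmul dtypes match.

import tensorflow as tf
from tensorflow import keras

keras.mixed_precision.set_global_policy("mixed_float16")

class ReverseEmbeddingSketch(keras.layers.Layer):
    def __init__(self, vocabulary_size, hidden_dim, **kwargs):
        super().__init__(**kwargs)
        self.embeddings = self.add_weight(
            name="embeddings",
            shape=(vocabulary_size, hidden_dim),
            initializer="random_normal",
        )

    def call(self, x):
        # self.embeddings autocasts to float16 here, matching x.
        return tf.matmul(x, self.embeddings, transpose_b=True)

layer = ReverseEmbeddingSketch(vocabulary_size=50265, hidden_dim=768)
logits = layer(tf.random.normal([1, 16, 768]))
print(layer.embeddings.dtype, logits.dtype)  # float32 variable, float16 logits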

mattdangerw (Member) commented:

We think this is no longer relevant with the ReversibleEmbedding layer. Closing; we can reopen if the issue resurfaces.
