
Move generate compilation to the task model #804


Merged

Conversation

mattdangerw (Member) commented Mar 4, 2023

This demos a fix for #779 by moving the compilation up to the causal_lm model, where we can most easily control the conditions for recompilation.

This has a few advantages:

  • We only need to tokenize once.
  • All forward passes on the model, including cache seeding, can live in the compiled function.
  • We expose compilation in a similar way to the keras.Model train step. There is an overridable make_generate_function (similar to make_train_function) and an accessible model.generate_function property on the model (sketched below).

Demo -> https://colab.research.google.com/gist/mattdangerw/ea205181ef56d1d95860e8b3f4a9db4d/generate-compile-demo.ipynb
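As a rough sketch of the compile-once pattern described above (illustrative only; the names follow the description, but the details are assumptions rather than the actual keras_nlp implementation):

import tensorflow as tf

class CausalLMSketch:
    """Illustrative sketch of caching a compiled generate function on the task model."""

    generate_function = None

    def make_generate_function(self):
        # Overridable, analogous to `keras.Model.make_train_function`.
        if self.generate_function is not None:
            return self.generate_function

        def generate_step(inputs):
            # Cache seeding and every forward pass live inside this one
            # compiled function, so generate() only retraces when needed.
            ...

        # Optionally pass jit_compile=True for XLA.
        self.generate_function = tf.function(generate_step)
        return self.generate_function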

mattdangerw (Member Author) commented Mar 7, 2023

One major alternative we could consider is moving the sampler to the compile() function. Then things become super parallel to how fit() and predict() work.

gpt_lm = keras_nlp.models.GPT2CausalLM.from_preset(...)
# First call compiles generate with default sampler.
gpt_lm.generate(prompt, length)
# Recompile by passing a new sampler to `compile()`.
gpt_lm.compile(sampler=keras_nlp.samplers.BeamSampler(num_beams=10))
# Next call will remake the generate function saved on the model.
gpt_lm.generate(prompt, length)

Edit: we are going with this approach.
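A rough sketch of how that could hang together on the task model (illustrative only; the default sampler name and the details here are assumptions, not the exact keras_nlp code):

from tensorflow import keras

class CausalLMCompileSketch(keras.Model):
    def compile(self, *args, sampler="top_k", **kwargs):
        super().compile(*args, **kwargs)
        self.sampler = sampler
        # Drop any previously compiled generate function so the next
        # generate() call recompiles with the new sampler.
        self.generate_function = None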

@mattdangerw mattdangerw changed the title Move compilation to the task model Move generate compilation to the task model Mar 8, 2023
@mattdangerw mattdangerw force-pushed the move-compilation-for-generate branch from a2266bf to ba982e2 Compare March 10, 2023 00:07
@mattdangerw mattdangerw marked this pull request as ready for review March 10, 2023 00:47
@mattdangerw mattdangerw force-pushed the move-compilation-for-generate branch from 334b655 to 0f116cd Compare March 10, 2023 01:06
chenmoneygithub (Contributor) commented:

@mattdangerw Thanks Matt! I took a quick pass, and at a high level this looks okay to me. I will need to dig into the details more to think about different use cases; I'll do that next week after I get back. Thanks!


Review thread on the `get_next_token` signature:

def get_next_token(self, next_token_probs):
def get_next_token(self, probs):

Collaborator:

Generic feedback -- use fully spelled out argument names for anything that isn't a super well established convention. Probabilities / logits here (which one is it?)

mattdangerw (Member Author):

This is a good comment. I wonder if the thing to do is to remove the from_logits=True argument in our samplers. Then things get really obvious: you are either working with logits or probabilities, never both. Sampling is so complex that I think anything we can do to remove cognitive load is worth it.

This would also have the advantage of making it really easy to add a temperature argument to all our samplers that scales the logits pre-softmax (tightening or loosening the distribution).
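For illustration, a minimal sketch of temperature scaling (the `temperature` argument and `scale_logits` helper are assumptions for this example, not something the PR adds):

import tensorflow as tf

def scale_logits(logits, temperature=1.0):
    # Divide logits by temperature before softmax: temperature < 1.0
    # tightens the distribution, temperature > 1.0 loosens it.
    return logits / temperature

logits = tf.constant([[2.0, 1.0, 0.1]])
sharper = tf.nn.softmax(scale_logits(logits, temperature=0.5))
flatter = tf.nn.softmax(scale_logits(logits, temperature=2.0))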

mattdangerw (Member Author):

Added a commit trying this out.

@mattdangerw mattdangerw force-pushed the move-compilation-for-generate branch from 3dbe15e to 4e60829 Compare March 10, 2023 23:31
chenmoneygithub (Contributor) left a comment:

Thanks! Finally took a full pass and dropped some comments.

Overall this looks good, and beam search is impressively clean! Two main comments:

  • The state variable has lots of freedom, but it seems its main usage is still the cache.
  • We are forcing users to clear self.generate_function via compile(), which is a strong contract. As a comparison, I can adjust model.optimizer by directly setting the field, or I can write a custom training loop as an alternative, but generate() is tightly coupled with compile().

Review thread on the sampler docstring:

prompt: A 2D integer tensor with shape `(batch_size, max_length)`. This
tensor will be iteratively updated column by column with new sampled
values.
state: Optional. A tensor or nested structure of tensors that will be
chenmoneygithub (Contributor):

Okay... I finally understand how this state works in general. Correct me if I am wrong; sharing my understanding: this state is a free variable, and how to use it is entirely decided by users of samplers. Most often it holds the cache, but it is in fact a backdoor people can use in the next function or when they want to override the __call__ method.

My thought is that "state" is too broad for users to learn. I am also not clear on how to let it hold two or more things; e.g., in contrastive search we need both the cache and the previous logits. Since we cannot assume the existence of a cache, how do we retrieve the previous logits robustly?

mattdangerw (Member Author):

Contrastive search is not covered here. IIUC contrastive search does not need a cache per se, or previous logits. What contrastive search needs is the hidden representation of every token to compute a cosine similarity metric. So to make contrastive search work, we will need to update the signature of next to something like this:

def next(prompt, state, index):
    # `dense` would carry the per-token hidden representations.
    return logits, dense, state

I can take a closer look at existing contrastive search implementations to understand what is out there. I was thinking of covering it as a follow-up, but it's definitely worth some thought.
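For reference, a rough sketch of the degeneration penalty contrastive search computes from those hidden representations (illustrative only; not code from this PR):

import tensorflow as tf

def degeneration_penalty(candidate_hidden, previous_hidden):
    # candidate_hidden: (batch_size, hidden_dim) for the candidate token.
    # previous_hidden: (batch_size, seq_len, hidden_dim) for prior tokens.
    candidate = tf.math.l2_normalize(candidate_hidden, axis=-1)
    previous = tf.math.l2_normalize(previous_hidden, axis=-1)
    # Cosine similarity of the candidate against every previous token.
    similarity = tf.einsum("bd,bsd->bs", candidate, previous)
    return tf.reduce_max(similarity, axis=-1)

Contrastive search would then score each candidate as roughly (1 - alpha) * model probability - alpha * penalty, which is why an extra `dense` output would be needed.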

mattdangerw (Member Author) commented Mar 14, 2023:

I also think we might want to pull apart the user journeys here.

Someone writing a sampler is writing a drop-in replacement for sampler="beam" or sampler="greedy". state is not an important backdoor in this case; it's just more tensor variables that should be treated as loop variables. But from the perspective of the sampler writer, the model and its forward pass are a black box.

In contrast, from the perspective of a model writer, the sampler should be a black box: the user complies with the next contract and doesn't worry about how the sampling actually happens. Here, state can be used to introduce arbitrary extra updatable state needed solely to compute the probability distribution of the next token. This could be the cache for a transformer decoder, the hidden state of a recurrent network, etc. (a sketch follows below).

Overall, I am most interested in keeping the sampler simple and useful for our own purposes, but worth chatting through all this!
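As a rough illustration of that split, a sketch of the model-writer side of the contract (`forward_with_cache` and the exact return values are assumptions for this example, not the actual keras_nlp signatures):

import tensorflow as tf

def make_next_fn(forward_with_cache):
    def next(prompt, state, index):
        # From the sampler's point of view, `state` is just more loop
        # variables; from the model's point of view, it is the decoder
        # cache (or, say, the hidden state of a recurrent network).
        cache = state
        # Forward pass for the single column at `index`, reusing the cache.
        logits, cache = forward_with_cache(prompt[:, index : index + 1], cache, index)
        probs = tf.nn.softmax(logits[:, -1, :])
        return probs, cache
    return next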

chenmoneygithub (Contributor) left a comment:

Thanks! Took another pass; the functionality looks good to me, and I dropped some comments on style. One thing we may want to do is compare performance before and after this PR, since at head we don't have recompilation either.

Review thread on the prompt padding code:

# Pad ragged to dense tensors.
padded_shape = (None, max_length)
min_length = tf.cast(tf.reduce_min(prompt.row_lengths()), "int32")
chenmoneygithub (Contributor):

Why do we need an explicit dtype int32 here?

mattdangerw (Member Author) commented Mar 20, 2023:

I moved this down into the sampler for now, but essentially tf.shape/tf.range default to int32, while tf.ragged row lengths default to int64.

This is slightly awkward, and we need to cast to make sure our index comparisons are all using the same type.
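For context, a small snippet showing the mismatch being described (the shapes here are made up for illustration):

import tensorflow as tf

prompt = tf.ragged.constant([[1, 2, 3], [4, 5]])

row_lengths = prompt.row_lengths()                  # int64 by default
index = tf.range(tf.shape(prompt.to_tensor())[1])   # int32 by default

# Comparing int64 with int32 tensors raises an error, so cast first.
min_length = tf.cast(tf.reduce_min(row_lengths), "int32")
mask = index < min_length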

mattdangerw (Member Author) commented:

> One thing we may want to do is compare performance before and after this PR, since at head we don't have recompilation either.

Yeah, I have been doing some rough benchmarking while developing this branch. Here's a rough breakdown.

Tested with batch_size=2, max_length=256, num_trials=25, so ~12,800 tokens generated, on a 3090 GPU.

  • On master branch: 22s
  • This branch: 17s
  • Huggingface with XLA: 14s

I have some ideas for cutting our performance gap down further, but this is a definite speedup over master (moving the cache seeding into the compiled function is important).

mattdangerw (Member Author) commented:

Comments addressed.

chenmoneygithub (Contributor) left a comment:

Thanks, approved!

@mattdangerw mattdangerw merged commit c74f9da into keras-team:master Mar 21, 2023
kanpuriyanawab pushed a commit to kanpuriyanawab/keras-nlp that referenced this pull request Mar 26, 2023
* Move compilation to the task model

This demos a fix for keras-team#779
by moving the compilation up to the causal_lm model, where we can most
easily control the conditions for recompilation.

This has a few advantages:

 - We only need to tokenize once.
 - All forward passes on the model, including cache seeding, can live
   in the compiled function.
 - We expose compilation in a similar way to the `keras.Model` train step.
   There is an overridable make_generate_function (similar to
   make_train_function) and an accessible `model.generate_function`
   property on the model.

* Fix beam search for new interface

* Fix tests and docstrings

* Minor fixups

* Remove from_logits; clarify logits vs probabilities

* Readability fixes

* Minor fixes

* Fix test failures

* Address comments

* Address comments