fix left-padding #2278
Conversation
This does not look right to me; I think we might have introduced some bugs.
keras_hub/src/utils/tensor_utils.py
Outdated
```python
padding_shape = [tf.shape(outputs)[0]] + [1] * (len(outputs.shape) - 1)
padding_shape[axis] = shape[axis] - tf.shape(outputs)[axis]
padding_shape = tf.cast(padding_shape, "int64")
print(padding_shape, pad_value)
```
no prints please!
```diff
@@ -141,7 +141,7 @@ def check_special_value_type(value, value_name):
         self.start_value = start_value
         self.end_value = end_value

-        self.pad_value = pad_value
+        self.pad_value = pad_value or 0
```
why do we have this line for start end packer but not multi-segment packer? what's the difference?
If you delete this line, the test reports an error.
Thanks! But that's not the question. We want the implementations of StartEndPacker and MultiSegmentPacker to look similar where we can. Having a `None` pad value behave differently between these layers could lead to subtle bugs for end users. Is there a technical reason why we need this line for StartEndPacker and not MultiSegmentPacker? If this is just for tests, let's rework the tests. Let's try to keep the layers working roughly the same.
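For context, a minimal sketch (not from the PR) of why `pad_value or 0` can be a footgun: `or` coerces every falsy value to 0, while an explicit `None` check replaces only a missing value.

```python
pad_value = None
implicit = pad_value or 0                         # None -> 0, but 0.0 or "" would also become 0
explicit = 0 if pad_value is None else pad_value  # only None -> 0; the intent is explicit
```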
```diff
@@ -17,7 +17,7 @@ def test_dense_input(self):
             sequence_length=5, padding_side="left"
         )
         output = start_end_packer(input_data)
-        expected_output = [0, 0, 5, 6, 7]
+        expected_output = [5, 6, 7, 0, 0]
```
This doesn't make sense. Why is left padding the same as right padding in this case? The test case before looks correct; this looks wrong.
No, it was obviously wrong before.
Left padding is for CausalLM: if expected_output = [0, 0, 5, 6, 7], how would we generate output?
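For reference, a minimal `tf.pad` sketch (not from the PR) of what each padding side literally produces for this test input:

```python
import tensorflow as tf

tokens = tf.constant([5, 6, 7])
pad = 5 - int(tf.shape(tokens)[0])         # pad positions needed to reach length 5
left_padded = tf.pad(tokens, [[pad, 0]])   # -> [0, 0, 5, 6, 7]
right_padded = tf.pad(tokens, [[0, pad]])  # -> [5, 6, 7, 0, 0]
```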
```diff
@@ -40,7 +40,7 @@ def test_dense_2D_input(self):
             sequence_length=5, padding_side="left"
         )
         output = start_end_packer(input_data)
-        expected_output = [[0, 0, 5, 6, 7]]
+        expected_output = [[5, 6, 7, 0, 0]]
```
Ditto, we are showing a lot of right-hand-side padding for the left padding option. I think we have introduced a bug.
```diff
-def pad(x, shape, padding_side, pad_value):
-    if padding_side == "left":
+def pad(x, shape, padding_side, pad_value, axis=-1):
+    if padding_side == "left" and pad_value is not None:
```
I don't fully understand what we are trying to do here, but I think it is buggy. We should go back to the reverse and `to_tensor` approach and avoid this manual padding.
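For illustration, a minimal sketch of the reverse-and-`to_tensor` approach the comment refers to, assuming a ragged input, pad value 0, and target length 5 (not the actual layer code):

```python
import tensorflow as tf

outputs = tf.ragged.constant([[1, 2, 3], [1, 2, 3, 4]])
# Right padding applied to reversed rows becomes left padding on the originals.
reversed_rows = tf.reverse(outputs, axis=[-1])
dense = reversed_rows.to_tensor(default_value=0, shape=[None, 5])
left_padded = tf.reverse(dense, axis=[-1])
# -> [[0, 0, 1, 2, 3], [0, 1, 2, 3, 4]]
```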
@mattdangerw Another strategy is to keep the previous implementation and do the left padding by modifying the CausalLMPreprocessor instead. If you prefer that approach, please close this PR.
@pass-lin got it, yes, if you are hoping to make left padding work for generative inference, that is a larger undertaking and needs more consideration. It might be better to start with an end-to-end prototype of how the whole thing would work for generation. How does an end user specify left padding in a high-level flow? How are we going to make sure position embeddings are correctly computed for input sequences? We don't pass a full input of position ints right now, which would basically break an attempt to correctly left pad during generation. We also have to think of API design in a layered way. In short, there is a lot to figure out here for generation. We'd probably want to start with the ability to pass position tensor inputs to our backbones, and maybe further refactor our CausalLM implementation so that we wouldn't have to duplicate logic across a number of models. Not sure we are ready for this.
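As a hypothetical sketch of the position-embedding issue raised above (not existing keras-hub API): with left padding, position ids can no longer be a plain range and would need to be derived from the padding mask so that the first real token gets position 0.

```python
import tensorflow as tf

padding_mask = tf.constant([[0, 0, 1, 1, 1], [0, 1, 1, 1, 1]])
# Cumulative sum over the mask counts real tokens; clamp so pad slots stay at 0.
position_ids = tf.maximum(tf.cumsum(padding_mask, axis=-1) - 1, 0)
# -> [[0, 0, 0, 1, 2], [0, 0, 1, 2, 3]]
```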
We need to confirm one thing: for CausalLM.generate, we should use left padding. Should we use our current modification, or modify the CausalLMPreprocessor? And is batch generation for CausalLM in the keras-team's plans? If the keras-team plans to implement it, I will close this PR and follow your plan.
It would be great to expose an option for left padding, but only if we could do it correctly, which would require many other changes (notably the need to pass position tensor input). Probably best to start with a prototype. Keep in mind generation might look fine without position inputs, but would not be correct (and we could not merge).
I don't know what you mean here. Batch generation is already supported. The only difference between right and left padding would be when doing early stopping. In the case where you have multiple sequences, left padding would sometimes lead to an earlier stopping condition. What is it you are trying to do end to end?
```python
if strip_prompt:
    outputs = [strip_prompt_function(generate(x), x) for x in inputs]
else:
    outputs = [generate(x) for x in inputs]
if self.preprocessor is not None:
    outputs = [postprocess(x) for x in outputs]
```

This is the current implementation of the `generate` method.
There was a problem with the previous left-padding implementation. For example, if the input is [[1, 2, 3], [1, 2, 3, 4]] and sequence_length = 5, the result should be [[0, 1, 2, 3, 0], [1, 2, 3, 4, 0]].
But the output of the previous implementation was [[0, 0, 1, 2, 3], [0, 1, 2, 3, 4, 0]] (note that the second row has 6 elements, not 5).
That implementation is wrong, so we fixed it in this PR.
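A plain-Python sketch of the behavior described above (`pad_for_generation` is a hypothetical helper, not part of the PR): left-pad each row to the longest prompt in the batch, then right-pad up to `sequence_length` to leave room for generated tokens.

```python
def pad_for_generation(batch, sequence_length, pad_value=0):
    longest = max(len(row) for row in batch)
    padded = []
    for row in batch:
        row = [pad_value] * (longest - len(row)) + row           # left-pad to batch max
        row = row + [pad_value] * (sequence_length - len(row))   # right-pad to sequence_length
        padded.append(row)
    return padded

print(pad_for_generation([[1, 2, 3], [1, 2, 3, 4]], 5))
# -> [[0, 1, 2, 3, 0], [1, 2, 3, 4, 0]]
```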