Added an Example for BLEU. #799


Closed · wants to merge 11 commits

Conversation

Neeshamraghav012 (Contributor)

I would like to inform you that I have made some updates to the content. First, I have added an example for BLEU, which was not present before. This new example illustrates the concept and should be helpful for those who are new to the topic.

Furthermore, I have also fixed the URLs by wrapping them in markdown. This should make it easier for readers to access the resources mentioned in the content.

I have made some changes to the docstring example of the TokenAndPositionEmbedding layer, as suggested in issue #658.
I have changed the epsilon value from 1e-5 to 1e-12.
@abheesht17 (Collaborator) left a comment

Left a few comments!

Comment on lines 97 to 107
Examples:
```python
bleu = keras_nlp.metrics.Bleu(max_order=4)
# reference sentence
ref_sentence = "the quick brown fox jumps over the lazy dog"
# predicted sentence
pred_sentence = "the quick brown fox jumps over the box"
# compute BLEU score
score = bleu([ref_sentence], [pred_sentence])
print("BLEU score:", score)
```
Collaborator

Oh, shoot! I must have forgotten to add examples. Instead of fenced doc-strings, can we have examples this way: https://github.com/keras-team/keras-nlp/blob/master/keras_nlp/metrics/rouge_l.py#L46-L113?
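For instance, a doctest-style version of the snippet above might look roughly like this (a sketch only; the exact output value is left as a placeholder rather than guessed):

```python
>>> import keras_nlp
>>> bleu = keras_nlp.metrics.Bleu(max_order=4)
>>> ref_sentence = "the quick brown fox jumps over the lazy dog"
>>> pred_sentence = "the quick brown fox jumps over the box"
>>> bleu([ref_sentence], [pred_sentence])
<tf.Tensor: shape=(), dtype=float32, numpy=...>
```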

Contributor Author

yeah sure!

Comment on lines 66 to 67
penalise short predictions. For more details, see the following article:
https://cloud.google.com/translate/automl/docs/evaluate#bleu.
[Link](https://cloud.google.com/translate/automl/docs/evaluate#bleu).
Collaborator

We can probably change this to

For more details, see [this article](https://cloud.google.com/translate/automl/docs/evaluate#bleu).

Comment on lines 83 to 84
`"tokenizer_13a"` tokenizer
(https://github.com/mjpost/sacrebleu/blob/v2.1.0/sacrebleu/tokenizers/tokenizer_13a.py).
[Link](https://github.com/mjpost/sacrebleu/blob/v2.1.0/sacrebleu/tokenizers/tokenizer_13a.py).
Collaborator

Here, as well, we can probably change this to

...
[SacreBLEU's `"tokenizer_13a"` tokenizer](https://github.com/mjpost/sacrebleu/blob/v2.1.0/sacrebleu/tokenizers/tokenizer_13a.py).

@@ -140,7 +140,7 @@ def __init__(
x = keras.layers.LayerNormalization(
name="encoder_embeddings_layer_norm",
axis=-1,
epsilon=1e-5,
epsilon=1e-12,

Contributor Author

I had read an issue about changing the epsilon value (#642); that's why I changed it. I will update it as you said, from 1e-12 back to 1e-5.
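For reference, a minimal sketch of the reverted call (assuming the surrounding encoder code stays as shown in the diff above):

```python
import keras

# Sketch only: layer norm with epsilon reverted from 1e-12 back to the
# original 1e-5, as requested in the review.
layer_norm = keras.layers.LayerNormalization(
    name="encoder_embeddings_layer_norm",
    axis=-1,
    epsilon=1e-5,
)
```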

Changed the epsilon value from 1e-12 to 1e-5.
Made the changes as suggested by the reviewer.
@abheesht17 (Collaborator) left a comment

Thanks for the update, @Neeshamraghav012! A few minor comments, and one major comment about adding more examples.

Oh, and please run the code formatters (./shell/format.sh).

Comment on lines 96 to 103
Examples:

1. Calculate BLEU score by calling Bleu directly.
>>> bleu = keras_nlp.metrics.Bleu(max_order=4)
>>> ref_sentence = "the quick brown fox jumps over the lazy dog"
>>> pred_sentence = "the quick brown fox jumps over the box"
>>> score = bleu([ref_sentence], [pred_sentence])
<tf.Tensor(0.7420885, shape=(), dtype=float32)>
Collaborator

No need for the extra indentation; it can be in line with "Examples:"!

**kwargs: Other keyword arguments.

**kwargs: Other keyword arguments.
Examples:
Collaborator

We need to add more examples, which cover the various input types and input shapes. Check this for reference: https://github.com/keras-team/keras-nlp/blob/master/keras_nlp/metrics/rouge_l.py#L46.

We also need to demonstrate that we can pass multiple references for the same sample, i.e., (references, translation) pair. We need a separate example for this.
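For instance, the multi-reference case might look something like this (a sketch only, assuming `y_true` can be passed as one list of reference strings per translation; the printed score is not a verified value):

```python
import keras_nlp

bleu = keras_nlp.metrics.Bleu()

# Two translations, each paired with two references,
# i.e. one (references, translation) pair per sample.
references = [
    [
        "the quick brown fox jumps over the lazy dog",
        "a quick brown fox jumps over a lazy dog",
    ],
    [
        "he eats a sweet apple",
        "he is eating a sweet apple",
    ],
]
translations = [
    "the quick brown fox jumps over the box",
    "he eats a sweet apple",
]

score = bleu(references, translations)
print(score)  # scalar tf.Tensor holding the corpus-level BLEU score
```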

@@ -92,8 +92,15 @@ class Bleu(keras.metrics.Metric):
dtype: string or tf.dtypes.Dtype. Precision of metric computation. If
not specified, it defaults to tf.float32.
name: string. Name of the metric instance.
**kwargs: Other keyword arguments.

**kwargs: Other keyword arguments.
Collaborator

Let's remove `**kwargs`... we don't actually document it anywhere in the library. And please add a newline after the last arg!
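Concretely, the tail of the docstring might end up looking like this (a layout sketch only, with the argument list abridged; no behaviour changes):

```python
import keras


class Bleu(keras.metrics.Metric):
    """BLEU metric implementation.

    Args:
        dtype: string or tf.dtypes.Dtype. Precision of metric computation.
            If not specified, it defaults to tf.float32.
        name: string. Name of the metric instance.

    Examples:
        ...
    """
```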

Comment on lines 83 to 84
`"tokenizer_13a"` tokenizer, see
[tokenizer details](https://github.com/mjpost/sacrebleu/blob/v2.1.0/sacrebleu/tokenizers/tokenizer_13a.py).
@abheesht17 (Collaborator) · Mar 5, 2023

This didn't work?

[SacreBLEU's `"tokenizer_13a"` tokenizer](https://github.com/mjpost/sacrebleu/blob/v2.1.0/sacrebleu/tokenizers/tokenizer_13a.py).

I think this looks cleaner, what do you think?

Neeshamraghav012 and others added 3 commits March 6, 2023 20:33
I have added more examples as suggested by the reviewer, and made some minor changes as @abheesht17 said.