Add Bloom Model #1382
Thank you!! This is awesome.
Left some initial comments. Adding a test file for the backbone should catch some things.
Just a heads up: most everyone will be out for New Year's, so the next review will probably be next week!
Make sure to run |
This reverts commit 889f204.
The checkpoint conversion script worked fine, and the model produced output that is close to the Hugging Face output.
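One way to make "close" concrete is to compare the two models' outputs element-wise against a tolerance. This is a minimal sketch in plain Python; the helper names, the sample values, and the `atol=1e-4` threshold are illustrative assumptions, not values taken from this PR.

```python
def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two flat lists."""
    if len(a) != len(b):
        raise ValueError("outputs must have the same length")
    return max(abs(x - y) for x, y in zip(a, b))


def outputs_close(a, b, atol=1e-4):
    """True if every element differs by at most atol (assumed tolerance)."""
    return max_abs_diff(a, b) <= atol


# Hypothetical flattened outputs from the converted and the reference model.
converted = [0.1000, 0.2500, -0.3300]
reference = [0.10001, 0.2500, -0.33002]
print(outputs_close(converted, reference))
```

A tolerance that is too tight can fail on harmless floating-point differences between frameworks, so the threshold is usually chosen per dtype.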
Left some more comments on the code.
But maybe more importantly, I am looking into the license here. We just integrated with Kaggle https://github.com/keras-team/keras-nlp/releases/tag/v0.7.0, which I believe gives us a way to support the open-RAIL license that the Bloom weights are released under. But we need to double check this. Hope to have an answer next week!
@mattdangerw about the license: if you follow this link https://huggingface.co/bigscience/bloom#uses, you will find a hyperlink (BLOOM license) which points to this license: https://huggingface.co/spaces/bigscience/license Also, I have found these two licenses:
Check this gist to see the model output compared to Hugging Face after applying the requested changes: https://colab.research.google.com/gist/abuelnasr0/22877985ce1a1c9125e8ed46cfc87da2/bloom.ipynb
Thanks! This looks good, and I think we are good to land the architecture here.
Can you do two things?
- Rebase or merge to the latest changes to see if tests are passing again? (We had a Keras 3 breakage yesterday.)
- Send me your Kaggle username if you have one?
My Kaggle username: mohamedabuelnasr
@abuelnasr0 thanks! Sorry for the delay here, but I think you have been added to a list that will allow you to upload models. I will pull this PR in, then you can proceed roughly as follows...
One other note: the largest models, 7b and 176b, will require a lot of RAM to load, even on a CPU. Feel free to just test the conversion with the smaller models, and we can do the conversion for the larger models on our own compute resources. This is our first time going through this new Kaggle upload flow, so please let us know any feedback!
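For a rough sense of scale, the weights alone take about (parameter count × bytes per parameter). The sketch below assumes the nominal parameter counts implied by the names "7b" and "176b" and standard dtype sizes; it ignores any extra memory used while loading or converting, so real usage will be higher.

```python
def weight_memory_gb(num_params, bytes_per_param):
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1_000_000_000


# Nominal parameter counts taken from the model names (an assumption).
sizes = {"7b": 7_000_000_000, "176b": 176_000_000_000}

for name, params in sizes.items():
    fp32 = weight_memory_gb(params, 4)  # float32: 4 bytes per parameter
    half = weight_memory_gb(params, 2)  # float16 / bfloat16: 2 bytes
    print(f"{name}: ~{fp32:.0f} GB in float32, ~{half:.0f} GB in half precision")
```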
@mattdangerw Thanks for the merge and the instructions. I will open the PR and add the models as soon as possible.
Hi @abuelnasr0! Thanks for contributing this model. I'm working on the Falcon model, which, similar to the Bloom model, uses ALiBi. I was wondering if you are interested in separating your ALiBi implementation and making it reusable.
@SamanehSaadat Sure. I will open a PR for it.
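For context, ALiBi adds a per-head linear penalty to the attention scores based on the distance between query and key positions, instead of using position embeddings. The sketch below is a plain-Python illustration that follows the slope scheme from the ALiBi reference implementation; it is not the code from this PR, and the function names are hypothetical.

```python
import math


def alibi_slopes(num_heads):
    """Per-head slopes: a geometric sequence starting at 2**(-8 / n)."""

    def power_of_two_slopes(n):
        start = 2.0 ** (-8.0 / n)
        return [start ** (i + 1) for i in range(n)]

    # Exact power of two: use the geometric sequence directly.
    if num_heads & (num_heads - 1) == 0:
        return power_of_two_slopes(num_heads)
    # Otherwise take the closest lower power of two, then fill the rest
    # with every other slope from the next power of two.
    closest = 2 ** math.floor(math.log2(num_heads))
    extra = power_of_two_slopes(2 * closest)[0::2][: num_heads - closest]
    return power_of_two_slopes(closest) + extra


def alibi_bias(num_heads, seq_len):
    """bias[h][i][j] = slope_h * (j - i), added to the attention scores.

    Entries with j > i are positive here; in a causal model they are
    expected to be removed by the causal mask, which is applied separately.
    """
    return [
        [[slope * (j - i) for j in range(seq_len)] for i in range(seq_len)]
        for slope in alibi_slopes(num_heads)
    ]


bias = alibi_bias(num_heads=8, seq_len=4)
print(bias[0][3])  # penalties for the last query position in head 0
```

Keeping the slope computation separate from the bias construction is one way to make the piece reusable across models that share the scheme.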
The architecture is done, and the model generates output successfully.
Remaining two tasks:
Once I finish, I will mention it here.