Feature: Add llava support #577
Conversation
IntptrMax commented on Mar 5, 2024:
- This is a simple demo for llava; it works, but it will still take some time to work well.
- It needs llava_shared.dll; you can replace it with the version built for your own PC environment.
- It works with llava 1.5 and llava 1.6, but ContextSize should be set to at least 3392 (see the sketch after this list).
- It would be better to free the resources after the work is done, but this demo does not handle that well yet.
- Thanks to Rinne, zsogitbe and SignalRT; the demo couldn't work without their help.
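As a rough illustration of the ContextSize requirement above, here is a minimal sketch (not the demo's exact code) of loading a model with a large enough context through LLamaSharp's ModelParams; the model path is a placeholder.

```csharp
using LLama;
using LLama.Common;

// Minimal sketch: the path below is a placeholder for your own llava GGUF model.
var parameters = new ModelParams("path/to/llava-v1.6.gguf")
{
    // llava 1.5/1.6 image embeddings consume many tokens, hence at least 3392.
    ContextSize = 4096
};

using var weights = LLamaWeights.LoadFromFile(parameters);
using var context = weights.CreateContext(parameters);
```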
Thank you for the contribution! At the same time there is also similar work on LLaVA support in #563. Maybe @SignalRT and @martindevans would like to look into this PR. I hope we can gather all the efforts to get llava well integrated into LLamaSharp.
For me the overall shape of this PR looks good, but there are many details that need to be resolved. If a long time is required to finish this PR, you may consider converting it to a draft PR.
/// <param name="n_past"></param>
/// <returns></returns>
[DllImport(llavaLibName, CallingConvention = CallingConvention.Cdecl)]
public extern unsafe static bool llava_eval_image_embed(SafeLLamaContextHandle ctx_llama, LLavaImageEmbed image_embed, int n_batch, ref int n_past);
Please use a separate file such as NativeApi.LLava.cs to add code related to llava only.
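For illustration, a sketch of what a separate NativeApi.LLava.cs could look like, assuming NativeApi is a partial class and reusing the names from the snippet above; the library name constant is an assumption, not the final binding.

```csharp
// NativeApi.LLava.cs - sketch only: all llava-specific P/Invoke declarations kept in one file.
using System.Runtime.InteropServices;

namespace LLama.Native
{
    public static partial class NativeApi
    {
        // Assumed constant; in practice this should point at the llava_shared native library.
        internal const string llavaLibName = "llava_shared";

        [DllImport(llavaLibName, CallingConvention = CallingConvention.Cdecl)]
        public static extern bool llava_eval_image_embed(SafeLLamaContextHandle ctx_llama, LLavaImageEmbed image_embed, int n_batch, ref int n_past);
    }
}
```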
//int maxTgtLen = 256; /*params->n_predict < 0 ? 256 : params->n_predict;*/
bool addBos = LLamaShouldAddBosToken();

string QuesstionAnsweringPrompt = "A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, brief, and polite answers to the human's questions.\nUSER:";
This seems to be a test case rather than part of the implementation of the eval API.
{
int n_tokens = text.Length + (add_bos ? 1 : 0);
LLamaToken[] result = new LLamaToken[n_tokens];
byte[] bytes = Encoding.UTF8.GetBytes(text);
Is the text always UTF-8 encoded here? I think we should add the encoding as a parameter.
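A sketch of how the encoding could become a parameter instead of being fixed to UTF-8; the method name and surrounding types mirror the snippet above and are not a final API.

```csharp
using System.Text;

// Sketch: the caller decides the byte encoding; the native tokenization call that follows
// (omitted here) would operate on `bytes` exactly as in the original method.
private LLamaToken[] Tokenize(string text, bool add_bos, Encoding encoding)
{
    int n_tokens = text.Length + (add_bos ? 1 : 0);
    LLamaToken[] result = new LLamaToken[n_tokens];
    byte[] bytes = encoding.GetBytes(text);
    // ... pass `bytes` to the native tokenizer and fill `result`, as in the PR's code ...
    return result;
}
```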
if (tmp.Contains("<|im_end|>")) break; // Yi-34B llava-1.6 - for some reason those decode not as the correct token (tokenizer works)
if (tmp.Contains("<|im_start|>")) break; // Yi-34B llava-1.6
if (tmp.Contains("USER:")) break; // mistral llava-1.6
Console.Write(tmp);
Please remove this output, or emit it through a logger instead.
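As a sketch of the logger suggestion: the class and method names below are hypothetical, and it assumes Microsoft.Extensions.Logging, which LLamaSharp already uses elsewhere.

```csharp
using Microsoft.Extensions.Logging;

// Hypothetical wrapper: intermediate pieces go to an optional ILogger instead of Console.Write,
// so library consumers control whether and where the text appears.
internal sealed class LLavaDemoOutput
{
    private readonly ILogger? _logger;

    public LLavaDemoOutput(ILogger? logger = null) => _logger = logger;

    public void EmitPiece(string piece)
    {
        _logger?.LogDebug("Generated piece: {Piece}", piece);
    }
}
```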
{
string tmp = Sample(samplingContext, ref n_past);
if (tmp == "</s>") break;
if (tmp.Contains("###")) break; // Yi-VL behavior
I fully understand that the purpose here is to make generation stop for these models, but we should use inferenceParams.AntiPrompts to decide whether to stop generation here.
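A small sketch of that antiprompt check; `ShouldStop` is a hypothetical helper, and the antiprompt list would come from inferenceParams.AntiPrompts rather than hard-coded strings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical helper: returns true when the accumulated output ends with any configured
// antiprompt, so the sampling loop can break without model-specific hard-coded strings.
static bool ShouldStop(StringBuilder output, IEnumerable<string> antiPrompts)
{
    var text = output.ToString();
    return antiPrompts.Any(ap => text.EndsWith(ap, StringComparison.Ordinal));
}
```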
string ret = string.Empty;
if (id == NativeApi.llama_token_eos(this.handle.model))
{
ret = "</s>";
Is this used for all models? Please avoid hard-coding it if it only applies to a particular group of models.
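A sketch of one alternative: decide end-of-sequence from the token id itself instead of translating it into a hard-coded "</s>" string. It reuses `id` and `this.handle.model` from the snippet above and assumes NativeApi.llama_token_eos returns the model's EOS token as in that snippet.

```csharp
// Sketch: stop on the EOS token id rather than on a decoded "</s>" string, so models
// whose tokenizer renders EOS differently still stop correctly.
bool isEndOfSequence = id == NativeApi.llama_token_eos(this.handle.model);
if (isEndOfSequence)
{
    // stop sampling here instead of returning a hard-coded marker string
}
```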
// penaltyTokensPtr + penaltyTokens.Length - penalty_tokens_used_size,
// (ulong)penalty_tokens_used_size, penalty_repeat, penalty_freq, penalty_present);
// }
//}
Please remove these comments.
// Console.WriteLine(result.ToArray());
// Console.WriteLine(n_tokens);

// Console.WriteLine();
Please remove the comments here.
namespace LLama
{
internal class LLavaContext
It seems it's duplicated with LLava/LLavaContext.
Thank you for your effort, IntptrMax! What martindevans means is that SignalRT is working on a llava implementation which fits well into the current library. Your solution works OK, but it follows a different strategy from the one used in LLamaSharp. Your code was nevertheless useful for learning how llava works!
OK, I will close this PR.