
[BUG]: Offset and length were out of bounds #766

Open
m0nsky opened this issue May 29, 2024 · 2 comments
Assignees: SignalRT
Labels: stale (Stale issue will be autoclosed soon)

Comments

m0nsky (Contributor) commented May 29, 2024

Description

I'm building a llava application. When the number of tokens in my initial prompt is larger than the batch size, the InteractiveExecutor throws:

System.ArgumentException: Offset and length were out of bounds for the array or count is greater than the number of elements from index to the end of the source collection.
   at System.Collections.Generic.List`1.GetRange(Int32 index, Int32 count)
   at LLama.InteractiveExecutor.InferInternal(IInferenceParams inferenceParams, InferStateArgs args) in C:\RiderProjects\llava_defender\LLama\LLamaInteractExecutor.cs:line 257
   at LLama.StatefulExecutorBase.InferAsync(String text, IInferenceParams inferenceParams, CancellationToken cancellationToken)+MoveNext() in C:\RiderProjects\llava_defender\LLama\LLamaExecutorBase.cs:line 325

When adding a breakpoint at LLamaInteractExecutor.cs line 257, we can observe the following:

[Screenshot: relevant breakpoint]

My initial prompt is 1067 tokens (I have tokenized it and counted), and the image embed is at position 1055 (near the end of the prompt), but _embeds only contains 512 entries (the batch size).
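
To see why that combination throws, here is a minimal, self-contained snippet (plain .NET, not LLamaSharp code) reproducing the same List<T>.GetRange failure from the stack trace: a 512-element list asked for a range starting at index 1055.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for _embeds: only 512 entries, the default batch size.
var embeds = Enumerable.Range(0, 512).ToList();

try
{
    // The executor effectively requests a range starting at the image position
    // (1055 here), which lies past the end of the 512-element list.
    var range = embeds.GetRange(1055, 12);
}
catch (ArgumentException ex)
{
    // "Offset and length were out of bounds for the array or count is greater
    //  than the number of elements from index to the end of the source collection."
    Console.WriteLine(ex.Message);
}
```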

Reproduction Steps

  • Use the default batch size (512)
  • Use an initial prompt of 513 tokens or more (a rough repro sketch follows below)
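
For reference, a rough sketch of the failing setup, assuming the usual LLamaSharp llava flow; the model paths, image, and prompt below are placeholders rather than the exact ones from this report:

```csharp
using System;
using System.IO;
using System.Linq;
using LLama;
using LLama.Common;

var parameters = new ModelParams("llava-v1.6-mistral-7b.gguf")
{
    ContextSize = 4096,
    BatchSize = 512, // default batch size
};

using var model = LLamaWeights.LoadFromFile(parameters);
using var clip = LLavaWeights.LoadFromFile("mmproj-model-f16.gguf");
using var context = model.CreateContext(parameters);

var executor = new InteractiveExecutor(context, clip);
executor.Images.Add(File.ReadAllBytes("image.jpg"));

// Placeholder long prompt; anything that tokenizes to more than 512 tokens
// (the prompt in this report was 1067 tokens) triggers the exception above.
var filler = string.Join(" ", Enumerable.Range(0, 600).Select(i => $"word{i}"));
var prompt = $"USER: <image>\n{filler}\nASSISTANT:";

await foreach (var text in executor.InferAsync(prompt, new InferenceParams { MaxTokens = 64 }))
    Console.Write(text);
```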

Environment & Configuration

  • Operating system: Windows 10
  • .NET runtime version: 8.0
  • LLamaSharp version: current master
  • CUDA version (if you are using cuda backend): 12
  • CPU & GPU device: 7700K + RTX 3080

Known Workarounds

  • Increase the batch size to (length of initial prompt + 1); a sketch follows below
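
A sketch of that workaround (values are placeholders; any BatchSize of at least the initial prompt length + 1 avoids the crash):

```csharp
using LLama.Common;

// The initial prompt in this report is 1067 tokens, so a batch of 1068+ covers it.
var parameters = new ModelParams("llava-v1.6-mistral-7b.gguf")
{
    BatchSize = 2048, // >= (length of initial prompt + 1)
};
```
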
SignalRT self-assigned this Jun 1, 2024

martindevans (Member) commented:

Since #761 the BatchedExecutor automatically splits work up into multiple batches (so a prompt of any size can be handled; you just need to call Infer() enough times to process the entire queue of work), and since #770 the BatchedExecutor has had LLava support.
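
A rough sketch of that loop, under the assumption that the current BatchedExecutor examples apply; member names such as BatchedTokenCount are taken from those examples and may differ between LLamaSharp versions:

```csharp
using LLama;
using LLama.Batched;
using LLama.Common;

var parameters = new ModelParams("model.gguf") { BatchSize = 512 };
using var model = LLamaWeights.LoadFromFile(parameters);

using var executor = new BatchedExecutor(model, parameters);
using var conversation = executor.Create();

// Queue a prompt that is much larger than the batch size (placeholder text).
var longPrompt = new string('x', 8000);
conversation.Prompt(executor.Context.Tokenize(longPrompt));

// Each Infer() call processes at most one batch; keep calling it until the queued
// work has been fully consumed (assumed here to be reported by BatchedTokenCount).
while (executor.BatchedTokenCount > 0)
    await executor.Infer();
```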


github-actions bot commented May 1, 2025

This issue has been automatically marked as stale due to inactivity. If no further activity occurs, it will be closed in 7 days.

github-actions bot added the stale label May 1, 2025