Failed to eval - potential bug but hard to diagnose #422


Open · Kaotic3 opened this issue Jan 8, 2024 · 7 comments
Labels: bug, stale

@Kaotic3 (Contributor) commented Jan 8, 2024

I used the Grammar Example and created a little program for myself where I iterate over text in a file and ask a question of that text. Please note I have anonymised some of it, so file paths might not make sense below.

This all works pretty well and I am very impressed. Just occasionally, though, I get a "Failed to eval." and it doesn't really provide enough information to know why. At one point I thought I had narrowed it down to breaching the token limit, and I do think you get this message if you breach the token limit, but that isn't the only cause.

var gbnf = (await File.ReadAllTextAsync(@"json.gbnf")).Trim();
var grammar = Grammar.Parse(gbnf, "root");
var modelPath = @"mistral-7b-instruct-v0.1.Q5_K_M.gguf";

var parameters = new ModelParams(modelPath)
{
    ContextSize = 4096,
    Seed = 1337,
    GpuLayerCount = 33,
    UseMemoryLock = true,
    UseMemorymap = true,
};

Toggling UseMemoryLock / UseMemorymap makes no difference; the bug appears either way.

using var model = LLamaWeights.LoadFromFile(parameters);
var ex = new StatelessExecutor(model, parameters);

foreach (var document in cleanedTokenized)
{
    using var grammarInstance = grammar.CreateInstance();
    var inferenceParams = new InferenceParams()
    {
        Temperature = 0.2f,
        AntiPrompts = new List<string> { "Question:", "#", "Question: ", ".\n" },
        MaxTokens = 500,
        Grammar = grammarInstance,
    };

    var prompt = instruction + " " + document.ExtractedText + " " + question + " Answer: "; // ExtractedText is just plain clean text.
    StringBuilder jsonCollection = new StringBuilder();
    try
    {
        await foreach (var text in ex.InferAsync(prompt, inferenceParams))
        {
            if (text == "")
            {
                Console.Write("\n");
                break;
            }
            jsonCollection.Append(text);
        }
    }
    catch (Exception except)
    {
        await Console.Out.WriteLineAsync(except.Message + "\n\n");
    }
}

Occasionally I will get a "Failed to eval." and this is the output of that error:

LLama.Exceptions.RuntimeError
  HResult=0x80131500
  Message=Failed to eval.
  Source=LLamaSharp
  StackTrace:
   at LLama.LLamaContext.Eval(ReadOnlySpan`1 tokens, Int32 pastTokensCount)
   at LLama.StatelessExecutor.<>c__DisplayClass8_0.<InferAsync>b__0()
   at System.Threading.Tasks.Task.InnerInvoke()
   at System.Threading.Tasks.Task.<>c.<.cctor>b__272_0(Object obj)
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location ---
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)
--- End of stack trace from previous location ---
   at LLama.StatelessExecutor.<InferAsync>d__8.MoveNext()
   at LLama.StatelessExecutor.<InferAsync>d__8.System.Threading.Tasks.Sources.IValueTaskSource<System.Boolean>.GetResult(Int16 token)
   at System.Runtime.CompilerServices.ValueTaskAwaiter`1.GetResult()
   at LlamaSharpVersion.Program.<Main>d__0.MoveNext() in \LlamaSharpVersion\Program.cs:line 62

Here is an example where I do not think the text has breached the limit, but I have still received the message. This is 2,809 tokens against a 4,096-token context window:

[screenshot: token count of the failing text]

@martindevans (Member)

To help debug this, I'd suggest adding something in here which logs out the value of ret if it is non-zero.

Unfortunately llama.cpp doesn't return very detailed error codes, but that might help us work out the root issue.

e.g.

var ret = NativeApi.llama_eval(this, pinned, tokens.Length, n_past);
if (ret != 0)
    Console.WriteLine($"[DEBUGGING] ret == {ret}");
return ret == 0;
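
(For reference, that llama_eval call is the one inside LLamaSharp's LLamaContext.Eval, the frame at the top of the stack trace above.)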

@Kaotic3 (Contributor, Author) commented Jan 8, 2024

That returned ret == 1.

Not sure if that is of any use to you.

I also added some more information, to see if it would reveal anything:

Console.WriteLine($"[DEBUGGING]: test ret == {ret} {tokens.Length} and {n_past}");

This is the output when it succeeds:

[DEBUGGING]: test ret == 0 512 and 1
[DEBUGGING]: test ret == 0 512 and 513
[DEBUGGING]: test ret == 0 321 and 1025
[DEBUGGING]: test ret == 0 1 and 1346
[DEBUGGING]: test ret == 0 1 and 1347
[DEBUGGING]: test ret == 0 1 and 1348
[DEBUGGING]: test ret == 0 1 and 1349
[DEBUGGING]: test ret == 0 1 and 1350
[DEBUGGING]: test ret == 0 1 and 1351
[DEBUGGING]: test ret == 0 1 and 1352
[DEBUGGING]: test ret == 0 1 and 1353
[DEBUGGING]: test ret == 0 1 and 1354
[DEBUGGING]: test ret == 0 1 and 1355
[DEBUGGING]: test ret == 0 1 and 1356
[DEBUGGING]: test ret == 0 1 and 1357
[DEBUGGING]: test ret == 0 1 and 1358
[DEBUGGING]: test ret == 0 1 and 1359
[DEBUGGING]: test ret == 0 1 and 1360
[DEBUGGING]: test ret == 0 1 and 1361
[DEBUGGING]: test ret == 0 1 and 1362
[DEBUGGING]: test ret == 0 1 and 1363
[DEBUGGING]: test ret == 0 1 and 1364
[DEBUGGING]: test ret == 0 1 and 1365
[DEBUGGING]: test ret == 0 1 and 1366
[DEBUGGING]: test ret == 0 1 and 1367
[DEBUGGING]: test ret == 0 1 and 1368
[DEBUGGING]: test ret == 0 1 and 1369
[DEBUGGING]: test ret == 0 1 and 1370
[DEBUGGING]: test ret == 0 1 and 1371
[DEBUGGING]: test ret == 0 1 and 1372
[DEBUGGING]: test ret == 0 1 and 1373
[DEBUGGING]: test ret == 0 1 and 1374
[DEBUGGING]: test ret == 0 1 and 1375
[DEBUGGING]: test ret == 0 1 and 1376
[DEBUGGING]: test ret == 0 1 and 1377
[DEBUGGING]: test ret == 0 1 and 1378
[DEBUGGING]: test ret == 0 1 and 1379
[DEBUGGING]: test ret == 0 1 and 1380
[DEBUGGING]: test ret == 0 1 and 1381

I do think it sometimes fails because of the token limit, but then I also get this failure:

[DEBUGGING]: test ret == 0 512 and 1
[DEBUGGING]: test ret == 0 512 and 513
[DEBUGGING]: test ret == 0 512 and 1025
[DEBUGGING]: test ret == 0 512 and 1537
[DEBUGGING]: test ret == 0 512 and 2049
[DEBUGGING]: test ret == 0 512 and 2561
[DEBUGGING]: test ret == 0 512 and 3073
[DEBUGGING]: test ret == 0 512 and 3585
[DEBUGGING]: test ret == 1 231 and 4097

On the 2,809-token text.

Looking at n_past:

n_past is the number of tokens to use from previous eval calls

I know that my prompt is 230 tokens, my question is about 90, and the response has a limit of 250. That is about 600 at most, which only gets us to around 3,400.

I know that my grammar is another 160 tokens, which still leaves us more than 400 tokens short of the limit.

Not sure why it is doing what it is doing, but hopefully the above helps.
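
For reference, here is that budget arithmetic written out (a minimal sketch; the numbers are the approximate figures quoted in this comment, not measured values):

const int contextSize    = 4096;
const int promptTokens   = 230;   // the fixed instruction
const int questionTokens = 90;    // the question appended to each document
const int responseLimit  = 250;   // the response cap quoted here
const int grammarTokens  = 160;   // grammar overhead
const int documentTokens = 2809;  // the document that still failed

const int total = promptTokens + questionTokens + responseLimit
                + grammarTokens + documentTokens;   // 3,539 of 4,096, so 557 spare

Note, though, that in the failing run above eight full 512-token batches had already been evaluated before the call that failed at n_past = 4097, which suggests the tokenized prompt was actually over 4,000 tokens, well above these estimates.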

@martindevans (Member)

The method to reference for these error codes is this one.

Interestingly the docs on that method say:

// return 0 on success
// return positive int on warning
// return negative int on error

So your return value of 1 is presumably a warning of some kind, not actually an error!

The code returning 1 is this:

if (!llama_kv_cache_find_slot(kv_self, batch)) {
    return 1;
}

So it looks like, for some reason, your cache is filling up and so eval fails. I'm not really sure how to debug that further at the moment :/
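
One possible mitigation while this is being investigated would be a pre-flight length check, so that a document whose prompt cannot fit in the context window is skipped instead of failing mid-eval. A minimal sketch, assuming a Tokenize method is available on the context (the exact CreateContext/Tokenize signatures may differ between LLamaSharp versions, so treat this as an assumption):

// Sketch only: skip documents whose prompt plus response budget
// exceeds the context window.
using var context = model.CreateContext(parameters);

foreach (var document in cleanedTokenized)
{
    var prompt = instruction + " " + document.ExtractedText + " " + question + " Answer: ";
    var promptTokens = context.Tokenize(prompt);

    var budget = 4096 - 500; // ContextSize minus MaxTokens from the params above
    if (promptTokens.Length > budget)
    {
        Console.WriteLine($"Skipping: {promptTokens.Length} prompt tokens > {budget} budget.");
        continue;
    }

    // ...run the grammar-constrained inference exactly as in the loop above...
}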

martindevans added the bug label on Jan 8, 2024
@Kaotic3 (Contributor, Author) commented Jan 9, 2024

I had a look at the code and implemented the NativeApi log that you provided in another thread, and I don't see anything being returned on a failure, under any level of logging. All you get back is the "Failed to eval" message. I did note that one of the code paths logs an error and the other does not, and these are the only two ways to get a false result from static bool llama_kv_cache_find_slot.

I am not sure whether my logging implementation would have caught the LLAMA_LOG_ERROR that is listed there; I would have hoped so, but nothing shows up. I should make clear that I do see Info messages when I don't filter them, so logging does appear to be working.

Which leads me to think the problem is perhaps with this branch, where the error message is commented out, and which therefore reports nothing even under the NativeApi logging:

if (n_tested >= n_ctx) {
    //LLAMA_LOG_ERROR("%s: failed to find a slot for %d tokens\n", __func__, n_tokens);
    return false;
}

I did a bit more work on reporting, totalling all the tokens, including the 250 for the response, and noting where we were at the point of failure:

[screenshot: token totals at the point of failure]

Which I think demonstrates that we are sometimes far short of the context length, so it probably isn't the first branch, the n_tokens > n_ctx check.

That is as far as I have got in my digging.

@Kaotic3 (Contributor, Author) commented Jan 9, 2024

Just to check:

I created the logging like this:

public static void LogMessage(LLamaLogLevel level, string message)
{
    Console.WriteLine($"[{level}] {message}");
}

private LLamaLogCallback? _logCallback;

public void InitializeLlamaLog()
{
    _logCallback = new LLamaLogCallback(LogMessage);
    NativeApi.llama_log_set(_logCallback);
}

And initialise it at the start with:

Program pc = new Program();
pc.InitializeLlamaLog();
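
(One note on this setup: holding the delegate in the _logCallback field, rather than passing a lambda straight to llama_log_set, is the safe choice, since the native side keeps a function pointer to it and a garbage-collected delegate would leave that pointer dangling.)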

This only returns INFO logging. I deliberately put in text that exceeded the context and still received no error logging. I assume I have implemented the logging incorrectly, but it isn't obvious to me what I have missed.

[screenshot: INFO-level log output]

@martindevans (Member)

if (n_tested >= n_ctx) {

Yep, I'm thinking it's this too; this is basically as far as I got in my own digging.

logging...

As far as I can see your logging setup looks fine, and you should be receiving error-level events. There's no filtering going on on the C# side of things there.


github-actions bot commented May 9, 2025

This issue has been automatically marked as stale due to inactivity. If no further activity occurs, it will be closed in 7 days.

github-actions bot added the stale label on May 9, 2025