Failed to eval - potential bug but hard to diagnose #422


Open · Kaotic3 opened this issue Jan 8, 2024 · 7 comments
Labels: bug, stale

@Kaotic3 (Contributor) commented Jan 8, 2024

I used the Grammar Example and created a little program for myself where I iterate over text in a file and ask a question of that text. Please note I have anonymised some of it, so file paths might not make sense below.

This all works pretty well and I am very impressed. Just occasionally, though, I get a "Failed to eval." and it doesn't really provide enough information to know why. At one point I thought I had narrowed it down to breaching the token limit, and I do think you get this message if you breach the token limit, but that isn't the only cause.

var gbnf = (await File.ReadAllTextAsync(@"json.gbnf")).Trim();
var grammar = Grammar.Parse(gbnf, "root");
var modelPath = @"mistral-7b-instruct-v0.1.Q5_K_M.gguf";

var parameters = new ModelParams(modelPath)
{
    ContextSize = 4096,
    Seed = 1337,
    GpuLayerCount = 33,
    UseMemoryLock = true,
    UseMemorymap = true,
};

Toggling UseMemoryLock / UseMemorymap makes no difference; the bug appears either way.

using var model = LLamaWeights.LoadFromFile(parameters);
var ex = new StatelessExecutor(model, parameters);

foreach (var document in cleanedTokenized)
{
    using var grammarInstance = grammar.CreateInstance();
    var inferenceParams = new InferenceParams()
    {
        Temperature = 0.2f,
        AntiPrompts = new List<string> { "Question:", "#", "Question: ", ".\n" },
        MaxTokens = 500,
        Grammar = grammarInstance,
    };

    var prompt = instruction + " " + document.ExtractedText + " " + question + " Answer: "; // ExtractedText is just plain clean text.
    StringBuilder jsonCollection = new StringBuilder();
    try
    {
        await foreach (var text in ex.InferAsync(prompt, inferenceParams))
        {
            if (text == "")
            {
                Console.Write("\n");
                break;
            }
            jsonCollection.Append(text);
        }
    }
    catch (Exception except)
    {
        await Console.Out.WriteLineAsync(except.Message + "\n\n");
    }
}

Occasionally I will get a "Failed to eval." and this is the output of that error:

LLama.Exceptions.RuntimeError
  HResult=0x80131500
  Message=Failed to eval.
  Source=LLamaSharp
  StackTrace:
   at LLama.LLamaContext.Eval(ReadOnlySpan`1 tokens, Int32 pastTokensCount)
   at LLama.StatelessExecutor.<>c__DisplayClass8_0.<InferAsync>b__0()
   at System.Threading.Tasks.Task.InnerInvoke()
   at System.Threading.Tasks.Task.<>c.<.cctor>b__272_0(Object obj)
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location ---
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)
--- End of stack trace from previous location ---
   at LLama.StatelessExecutor.<InferAsync>d__8.MoveNext()
   at LLama.StatelessExecutor.<InferAsync>d__8.System.Threading.Tasks.Sources.IValueTaskSource<System.Boolean>.GetResult(Int16 token)
   at System.Runtime.CompilerServices.ValueTaskAwaiter`1.GetResult()
   at LlamaSharpVersion.Program.<Main>d__0.MoveNext() in \LlamaSharpVersion\Program.cs:line 62

Here is an example where I do not think the text has breached the limit, but I have still received the message. This is 2,809 tokens against a 4,096-token context window:

[screenshot: token count of the failing text]

@martindevans (Member)

To help debug this, I'd suggest adding something in here which logs out the value of ret if it is non-zero.

Unfortunately llama.cpp doesn't return very detailed error codes, but that might help us work out the root issue.

e.g.

var ret = NativeApi.llama_eval(this, pinned, tokens.Length, n_past);
if (ret != 0)
    Console.WriteLine($"[DEBUGGING] ret == {ret}");
return ret == 0;
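
(For reference, that llama_eval call is the one inside LLamaSharp's LLamaContext.Eval, the frame at the top of the stack trace above.)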

@Kaotic3 (Contributor, Author) commented Jan 8, 2024

That returned ret == 1.

Not sure if that is of any use to you.

I also added some more information, to see if it would reveal anything:

Console.WriteLine($"[DEBUGGING]: test ret == {ret} {tokens.Length} and {n_past}");

This is the output when it succeeds:

[DEBUGGING]: test ret == 0 512 and 1
[DEBUGGING]: test ret == 0 512 and 513
[DEBUGGING]: test ret == 0 321 and 1025
[DEBUGGING]: test ret == 0 1 and 1346
[DEBUGGING]: test ret == 0 1 and 1347
[DEBUGGING]: test ret == 0 1 and 1348
[DEBUGGING]: test ret == 0 1 and 1349
[DEBUGGING]: test ret == 0 1 and 1350
[DEBUGGING]: test ret == 0 1 and 1351
[DEBUGGING]: test ret == 0 1 and 1352
[DEBUGGING]: test ret == 0 1 and 1353
[DEBUGGING]: test ret == 0 1 and 1354
[DEBUGGING]: test ret == 0 1 and 1355
[DEBUGGING]: test ret == 0 1 and 1356
[DEBUGGING]: test ret == 0 1 and 1357
[DEBUGGING]: test ret == 0 1 and 1358
[DEBUGGING]: test ret == 0 1 and 1359
[DEBUGGING]: test ret == 0 1 and 1360
[DEBUGGING]: test ret == 0 1 and 1361
[DEBUGGING]: test ret == 0 1 and 1362
[DEBUGGING]: test ret == 0 1 and 1363
[DEBUGGING]: test ret == 0 1 and 1364
[DEBUGGING]: test ret == 0 1 and 1365
[DEBUGGING]: test ret == 0 1 and 1366
[DEBUGGING]: test ret == 0 1 and 1367
[DEBUGGING]: test ret == 0 1 and 1368
[DEBUGGING]: test ret == 0 1 and 1369
[DEBUGGING]: test ret == 0 1 and 1370
[DEBUGGING]: test ret == 0 1 and 1371
[DEBUGGING]: test ret == 0 1 and 1372
[DEBUGGING]: test ret == 0 1 and 1373
[DEBUGGING]: test ret == 0 1 and 1374
[DEBUGGING]: test ret == 0 1 and 1375
[DEBUGGING]: test ret == 0 1 and 1376
[DEBUGGING]: test ret == 0 1 and 1377
[DEBUGGING]: test ret == 0 1 and 1378
[DEBUGGING]: test ret == 0 1 and 1379
[DEBUGGING]: test ret == 0 1 and 1380
[DEBUGGING]: test ret == 0 1 and 1381

I do think it sometimes fails because of the token limit, but then I also get this failure:

[DEBUGGING]: test ret == 0 512 and 1
[DEBUGGING]: test ret == 0 512 and 513
[DEBUGGING]: test ret == 0 512 and 1025
[DEBUGGING]: test ret == 0 512 and 1537
[DEBUGGING]: test ret == 0 512 and 2049
[DEBUGGING]: test ret == 0 512 and 2561
[DEBUGGING]: test ret == 0 512 and 3073
[DEBUGGING]: test ret == 0 512 and 3585
[DEBUGGING]: test ret == 1 231 and 4097

On the 2,809-token text.

Looking at n_past:

n_past is the number of tokens to use from previous eval calls

I know that my prompt is 230 tokens, my question is about 90, and the response has a limit of 250. That is about 600 at most, which only gets us to around 3,400.

I know that my grammar is another 160 tokens, which still leaves us more than 400 tokens short of the limit.

Not sure why it is doing what it is doing, but hopefully the above helps.
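
For reference, here is that budget arithmetic written out (a minimal sketch; the numbers are the approximate figures quoted in this comment, not measured values):

const int contextSize    = 4096;
const int promptTokens   = 230;   // the fixed instruction
const int questionTokens = 90;    // the question appended to each document
const int responseLimit  = 250;   // the response cap quoted here
const int grammarTokens  = 160;   // grammar overhead
const int documentTokens = 2809;  // the document that still failed

const int total = promptTokens + questionTokens + responseLimit
                + grammarTokens + documentTokens;   // 3,539 of 4,096, so 557 spare

Note, though, that in the failing run above eight full 512-token batches had already been evaluated before the call that failed at n_past = 4097, which suggests the tokenized prompt was actually over 4,000 tokens, well above these estimates.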

@martindevans (Member)

The method to reference for these error codes is this one.

Interestingly the docs on that method say:

// return 0 on success
// return positive int on warning
// return negative int on error

So your return value of 1 is presumably a warning of some kind, not actually an error!

The code returning 1 is this:

if (!llama_kv_cache_find_slot(kv_self, batch)) {
    return 1;
}

So it looks like, for some reason, your cache is filling up and so eval fails. I'm not really sure how to debug that further at the moment :/
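
One possible mitigation while this is being investigated would be a pre-flight length check, so that a document whose prompt cannot fit in the context window is skipped instead of failing mid-eval. A minimal sketch, assuming a Tokenize method is available on the context (the exact CreateContext/Tokenize signatures may differ between LLamaSharp versions, so treat this as an assumption):

// Sketch only: skip documents whose prompt plus response budget
// exceeds the context window.
using var context = model.CreateContext(parameters);

foreach (var document in cleanedTokenized)
{
    var prompt = instruction + " " + document.ExtractedText + " " + question + " Answer: ";
    var promptTokens = context.Tokenize(prompt);

    var budget = 4096 - 500; // ContextSize minus MaxTokens from the params above
    if (promptTokens.Length > budget)
    {
        Console.WriteLine($"Skipping: {promptTokens.Length} prompt tokens > {budget} budget.");
        continue;
    }

    // ...run the grammar-constrained inference exactly as in the loop above...
}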

martindevans added the bug label on Jan 8, 2024
@Kaotic3 (Contributor, Author) commented Jan 9, 2024

I had a look at the code and implemented the NativeApi log that you provided in another thread, and I don't see anything being returned on a failure, under any level of logging. All you get back is the "Failed to eval" message. I did note that one of the code paths logs an error and the other does not, and these are the only two ways to get a false result from static bool llama_kv_cache_find_slot.

I am not sure whether my logging implementation would have caught the LLAMA_LOG_ERROR that is listed there; I would have hoped so, but nothing shows up. I should make clear that I do see Info messages when I don't filter them, so logging does appear to be working.

Which leads me to think the problem is perhaps with this branch, where the error message is commented out, and which therefore reports nothing even under the NativeApi logging:

if (n_tested >= n_ctx) {
    //LLAMA_LOG_ERROR("%s: failed to find a slot for %d tokens\n", __func__, n_tokens);
    return false;
}

I did a bit more work on reporting, totalling all the tokens, including the 250 for the response, and noting where we were at the point of failure:

[screenshot: token totals at the point of failure]

Which I think demonstrates that we are sometimes far short of the context length, so it probably isn't the first branch, the n_tokens > n_ctx check.

That is as far as I have got in my digging.

@Kaotic3 (Contributor, Author) commented Jan 9, 2024

Just to check:

I created the logging like this:

public static void LogMessage(LLamaLogLevel level, string message)
{
    Console.WriteLine($"[{level}] {message}");
}

private LLamaLogCallback? _logCallback;

public void InitializeLlamaLog()
{
    _logCallback = new LLamaLogCallback(LogMessage);
    NativeApi.llama_log_set(_logCallback);
}

And initialise it at the start with:

Program pc = new Program();
pc.InitializeLlamaLog();
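
(One note on this setup: holding the delegate in the _logCallback field, rather than passing a lambda straight to llama_log_set, is the safe choice, since the native side keeps a function pointer to it and a garbage-collected delegate would leave that pointer dangling.)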

This only returns INFO logging. I deliberately put in text that exceeded the context and still received no error logging. I assume I have implemented the logging incorrectly, but it isn't obvious to me what I have missed.

[screenshot: INFO-level log output]

@martindevans (Member)

if (n_tested >= n_ctx) {

Yep, I'm thinking it's this too; this is basically as far as I got in my own digging.

logging...

As far as I can see your logging setup looks fine, and you should be receiving error-level events. There's no filtering going on on the C# side of things there.


github-actions bot commented May 9, 2025

This issue has been automatically marked as stale due to inactivity. If no further activity occurs, it will be closed in 7 days.

github-actions bot added the stale label on May 9, 2025