Add a best practice example for RAG #648
Comments
I'm trying to build basically exactly this right now. I'd like to verify an assumption as well: "When combining text generation and RAG in one application, 3 model instances are needed: one for generating embeddings, one for retrieval generation, and one for text generation." I feel like those last two instances could be one, but I don't know whether that is possible, because creating a KernelMemory instance loads a separate model. This is what I currently have:

using LLama.Native;
using LLamaSharp.KernelMemory;
using Microsoft.KernelMemory;
using Microsoft.KernelMemory.FileSystem.DevTools;
using Microsoft.KernelMemory.MemoryStorage.DevTools;
string nativePath = "<path to native llama>";
NativeLibraryConfig.Instance.WithLibrary(nativePath, null);
string generationModelPath = "<path to any LLM in GGUF format>";
string embeddingModelPath = "<path to any embedding model in GGUF format>";
string storageFolder = "<path to storage folder>";
var llamaGenerationConfig = new LLamaSharpConfig(generationModelPath);
var llamaEmbeddingConfig = new LLamaSharpConfig(embeddingModelPath);
var vectorDbConfig = new SimpleVectorDbConfig() { Directory = storageFolder, StorageType = FileSystemTypes.Disk };
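// Wire everything together: one model for embeddings, one for answer generation,
// and an on-disk vector store for the ingested chunks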
var memory = new KernelMemoryBuilder()
.WithLLamaSharpTextGeneration(llamaGenerationConfig)
.WithLLamaSharpTextEmbeddingGeneration(llamaEmbeddingConfig)
.WithSimpleVectorDb(vectorDbConfig)
.Build();
Console.WriteLine("\n================== INGESTION ==================\n");
Console.WriteLine("Uploading text about E=mc^2");
await memory.ImportTextAsync("""
In physics, mass–energy equivalence is the relationship between mass and energy
in a system's rest frame, where the two quantities differ only by a multiplicative
constant and the units of measurement. The principle is described by the physicist
Albert Einstein's formula: E = m*c^2
""");
Console.WriteLine("Uploading article file about Carbon");
await memory.ImportDocumentAsync("wikipedia.txt");
Console.WriteLine("\n================== RETRIEVAL ==================\n");
var question = "What's E = m*c^2?";
Console.WriteLine($"Question: {question}");
var answer = await memory.AskAsync(question);
Console.WriteLine($"\nAnswer: {answer.Result}\n\n Sources:\n");
// Show sources / citations
foreach (var x in answer.RelevantSources)
{
Console.WriteLine(x.SourceUrl != null
? $" - {x.SourceUrl} [{x.Partitions.First().LastUpdate:D}]"
: $" - {x.SourceName} - {x.Link} [{x.Partitions.First().LastUpdate:D}]");
}

I adapted this from this example on KernelMemory from Microsoft, but its current answer to everything is the same.

Edit: I fixed this by removing the minRelevance parameter from AskAsync().
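For reference, the fix amounts to calling AskAsync without the relevance cutoff, so low-scoring partitions are not filtered out before the model ever sees them. A minimal sketch, assuming the AskAsync overload that takes a minRelevance parameter:

// Too strict: partitions scoring below 0.7 are discarded, so the model gets no context
// var answer = await memory.AskAsync(question, minRelevance: 0.7);

// Fixed: rely on the default threshold of 0
var answer = await memory.AskAsync(question);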
I agree that 3 models are needed; however, I think the second one does not actually have to be an LLM. It could be an algorithm that computes the similarity of embeddings, so the last two models are unlikely to be merged into one. TBH I'm not an expert on RAG either; I think you will get a much better answer if you ask this question in ... Thank you a lot for looking into this issue!
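To illustrate why that middle step needs no LLM: once the documents and the query are embedded, retrieval is plain vector math. A minimal sketch, where CosineSimilarity is a hypothetical helper rather than part of either library:

// Cosine similarity of two embedding vectors:
// dot product divided by the product of the magnitudes
static float CosineSimilarity(float[] a, float[] b)
{
    float dot = 0f, normA = 0f, normB = 0f;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (MathF.Sqrt(normA) * MathF.Sqrt(normB));
}

A vector store then just ranks the stored chunk embeddings by this score against the query embedding and returns the top matches.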
I did actually manage to figure this out with Semantic Kernel:
using LLama;
using LLama.Common;
using LLama.Native;
using LLamaSharp.SemanticKernel.TextEmbedding;
using Microsoft.SemanticKernel.Connectors.Sqlite;
using Microsoft.SemanticKernel.Memory;
using Microsoft.SemanticKernel.Text;
using System.Text;
// Initialize native library before anything else
string llamaPath = Path.GetFullPath("<path to local lib>/libllama.so");
NativeLibraryConfig.Instance.WithLibrary(llamaPath, null);
// Download a document and create embeddings for it
#pragma warning disable SKEXP0050, SKEXP0001, SKEXP0020
var embeddingModelPath = Path.GetFullPath("<path to embed model>/nomic-embed.gguf");
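// Embeddings = true loads the model for embedding generation rather than text completion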
var embeddingParameters = new ModelParams(embeddingModelPath) { ContextSize = 4096, GpuLayerCount = 13, Embeddings = true };
var embeddingWeights = LLamaWeights.LoadFromFile(embeddingParameters);
var embedder = new LLamaEmbedder(embeddingWeights, embeddingParameters);
var service = new LLamaSharpEmbeddingGeneration(embedder);
ISemanticTextMemory memory = new MemoryBuilder()
.WithMemoryStore(await SqliteMemoryStore.ConnectAsync("mydata.db"))
.WithTextEmbeddingGeneration(service)
.Build();
Console.WriteLine("===== INGESTING =====");
IList<string> collections = await memory.GetCollectionsAsync();
string folderPath = Path.GetFullPath("<path to folder>/Embeddings");
string[] files = Directory.GetFiles(folderPath);
string collectionName = "TestCollection";
if (collections.Contains(collectionName))
{
Console.WriteLine("Found database");
}
else
{
foreach (var item in files.Select((path, index) => new { path, index }))
{
Console.WriteLine($"Ingesting file #{item.index}");
string text = File.ReadAllText(item.path);
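// Chunk the document: split into lines of at most 128 tokens,
// then group those lines into paragraphs of at most 512 tokens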
var paragraphs = TextChunker.SplitPlainTextParagraphs(TextChunker.SplitPlainTextLines(text, 128), 512);
foreach (var para in paragraphs.Select((text, index) => new { text, index } ))
await memory.SaveInformationAsync(collectionName, para.text, $"Document {item.path}, Paragraph {para.index}");
}
Console.WriteLine("Generated database");
}
Console.WriteLine("===== DONE INGESTING =====");
StringBuilder builder = new();
Console.Write("Question: ");
string question = Console.ReadLine()!;
builder.Clear();
Console.WriteLine("===== RETRIEVING =====");
List<string> sources = [];
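// Fetch the single best-matching chunk; raise limit (and minRelevanceScore)
// to retrieve several candidate sources instead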
await foreach (var result in memory.SearchAsync(collectionName, question, limit: 1, minRelevanceScore: 0))
{
builder.AppendLine(result.Metadata.Text);
sources.Add(result.Metadata.Id);
}
builder.AppendLine("""
Sources:
""");
foreach (string source in sources)
{
builder.AppendLine($" {source}");
}
Console.WriteLine("===== DONE RETRIEVING =====");
Console.WriteLine(builder.ToString());
#pragma warning restore SKEXP0001, SKEXP0050, SKEXP0020

We have to suppress some warnings here because semantic memory is technically considered experimental. This just uses LLamaSharp to generate embeddings and allows us to search anything compatible with a Semantic Kernel memory store. One thing to consider is that this is generally only the first step of RAG, and there are a lot of steps you can add between this and adding the result to the prompt, such as returning multiple sources and reranking them, summarization, and so on. I'll leave some helpful resources as well:
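As a rough sketch of that next step, the retrieved chunks can simply be prepended to the prompt for the generation model. This continues the program above (which already imports LLama and LLama.Common) and assumes a generationModelPath variable pointing at any chat-capable GGUF model; the parameters are illustrative:

var generationParameters = new ModelParams(generationModelPath) { ContextSize = 4096 };
using var generationWeights = LLamaWeights.LoadFromFile(generationParameters);
using var generationContext = generationWeights.CreateContext(generationParameters);
var executor = new InteractiveExecutor(generationContext);

// Ground the model in the retrieved text, then ask the original question
string prompt = $"Answer the question using only the context below.\n\nContext:\n{builder}\n\nQuestion: {question}\nAnswer: ";
await foreach (var token in executor.InferAsync(prompt, new InferenceParams { MaxTokens = 256 }))
    Console.Write(token);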
The example looks good. @xbotter Do you have any ideas about improving it further?
This issue has been automatically marked as stale due to inactivity. If no further activity occurs, it will be closed in 7 days.
A better example with a guide is needed for RAG. It could be considered from the following aspects: