This is a modified Hello Wordsmith package that uses a local model instead of OpenAI. Since I don't have an OpenAI subscription, this fork lets you run the Hello Wordsmith project with a model installed locally through Ollama.
This is a simple wrapper around the llama-index CLI project with some opinionated defaults. We aim to provide a "Hello World" experience using Retrieval-Augmented Generation (RAG). For more context on what RAG is, its tradeoffs, and a detailed walkthrough of this project, see this article from The Pragmatic Engineer.
For detailed information about the `llamaindex-rag` project, visit the official documentation.
- Ollama running a model locally on your machine. The Ollama GitHub repo has all the information you need to run a model on your local machine (see the example after this list).
  - For better performance, `llama3.2` is recommended over `llama2` (which can be quite slow).
- Python 3.9 installed (e.g., `brew install python@3.9` on macOS)
- Hardware requirements:
  - Minimum: 8GB RAM, modern CPU
  - Recommended: 16GB+ RAM, recent CPU or GPU for faster inference. It worked fine on a 16-inch MacBook Pro (M3 Pro, 36GB RAM).
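To confirm Ollama is set up before installing anything else, you can pull and run the model from a terminal (a minimal check; adjust the model name to whichever one you plan to use):

```bash
# Download the model (one-time)
ollama pull llama3.2

# Start an interactive session to verify the model responds; exit with /bye
ollama run llama3.2
```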
Follow these steps to install and set up your environment:
```bash
pip install git+https://github.com/more-carlos/hello-wordsmith -q
```
Note: It's best practice to work in a virtual Python environment, as opposed to your system's default Python installation. Popular solutions include `venv`, `conda`, and `pipenv`. If you do use your system Python, make sure the bin dir is on your PATH, e.g. `export PATH="/Library/Frameworks/Python.framework/Versions/3.x/bin:${PATH}"`.
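For example, a minimal `venv` setup on macOS/Linux could look like this (the `.venv` directory name is just a convention):

```bash
# Create and activate an isolated environment with Python 3.9
python3.9 -m venv .venv
source .venv/bin/activate

# Install this fork into the environment
pip install git+https://github.com/more-carlos/hello-wordsmith -q
```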
If you prefer using Poetry for environment management:

- `poetry env use $(which python3.9)` to create an environment
- `poetry env activate` to activate the environment
- `poetry install` to install dependencies
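Put together, and assuming Poetry 2.x (where `poetry env activate` only prints the activation command, so its output is passed to `eval`), the flow looks like:

```bash
poetry env use $(which python3.9)   # create the environment
eval $(poetry env activate)         # activate it in the current shell
poetry install                      # install dependencies
```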
```bash
hello-wordsmith                                 # Launch an interactive chat
hello-wordsmith -q 'What is article III about?' # Single question and answer
hello-wordsmith -f "./my_directory/*" --chunk-size 256 --chunk-overlap 128  # Ingest and index your own data with custom document chunk sizes and overlaps
hello-wordsmith --clear                         # Clear stored data
```

- Checking if Ollama is running: Before using hello-wordsmith, verify that Ollama is running with `curl http://localhost:11434/api/tags` (if this fails, see the snippet after this list)
- Timeout errors: If you encounter timeout errors, try using a smaller model like `llama3.2` instead of larger models
- OpenAI errors: If you see errors related to the OpenAI API, ensure environment variables are correctly set (this should be handled automatically by this fork)
- Dependency issues: If you encounter dependency conflicts, try recreating your environment or using the exact versions specified in the `pyproject.toml` file
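If the Ollama check above fails, the usual fix (assuming Ollama is installed) is to start the server and confirm the model you need is present:

```bash
# Start the Ollama server (keep this running, or use the desktop app instead)
ollama serve

# In another terminal: list installed models, and pull the one you need if it's missing
ollama list
ollama pull llama3.2
```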
As you can see, this repo is an extremely simple first step towards building a RAG system on your own data. You can open up these files and explore how changing parameters like the chunk size or the embedding model can influence the performance of the system.
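For example, you can re-index the same documents with different chunking parameters, using the ingestion flags shown above, and compare the answers you get (the directory path is a placeholder):

```bash
# Re-ingest with larger chunks and less overlap, then ask the same question again
hello-wordsmith -f "./my_directory/*" --chunk-size 512 --chunk-overlap 64
hello-wordsmith -q 'What is article III about?'
```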
As I am a little rusty with Python, I used this project to test Cursor, an AI-powered code editor. Cursor significantly improved the development process by providing intelligent code suggestions and helping with debugging and fixing issues. It was a little messy with some dependency issues, but overall the experience was quite positive.