Starting a Llama Stack Server
You can run a Llama Stack server in one of the following ways:
As a Library:
This is the simplest way to get started. Using Llama Stack as a library means you do not need to start a server. It is especially useful when you are not running inference locally and are instead relying on an external inference service (e.g. fireworks, together, groq). See Using Llama Stack as a Library.
Container:
Another simple way to start interacting with Llama Stack is to just spin up a container (via Docker or Podman) which is pre-built with all the providers you need. We provide a number of pre-built images so you can start a Llama Stack server instantly. You can also build your own custom container. Which distribution to choose depends on the hardware you have. See Selection of a Distribution for more details.
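As a rough sketch, a pre-built distribution image can be started with Docker along the following lines. The image name, port, and mounted path here are illustrative assumptions; substitute the distribution and settings that match your hardware and inference provider.

```bash
# Illustrative only: image name, port, and volume path are assumptions.
# Mounting ~/.llama keeps model and configuration data outside the container.
docker run -it \
  -p 8321:8321 \
  -v ~/.llama:/root/.llama \
  llamastack/distribution-ollama \
  --port 8321
```

The same command works with `podman` in place of `docker`.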
Kubernetes:
If you have built a container image and want to deploy it in a Kubernetes cluster instead of starting the Llama Stack server locally, see the Kubernetes Deployment Guide for more details.
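As a hedged sketch of that flow (the manifest and service names below are hypothetical; the Kubernetes Deployment Guide has a complete, working configuration):

```bash
# Hypothetical manifest and service names, shown only to illustrate the flow.
kubectl apply -f llama-stack-deployment.yaml          # Deployment + Service for your image
kubectl port-forward service/llama-stack 8321:8321    # reach the server locally
```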
Configure logging
Control log output via environment variables before starting the server.
- `LLAMA_STACK_LOGGING` sets per-component levels, e.g. `LLAMA_STACK_LOGGING=server=debug;core=info`.
  - Supported categories: `all`, `core`, `server`, `router`, `inference`, `agents`, `safety`, `eval`, `tools`, `client`.
  - Levels: `debug`, `info`, `warning`, `error`, `critical` (default is `info`). Use `all=<level>` to apply globally.
- `LLAMA_STACK_LOG_FILE=/path/to/log` mirrors logs to a file while still printing to stdout.
Export these variables before running `llama stack run`, launching a container, or starting the server through any other pathway.
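For example, reusing the settings described above (the log file path and the `llama stack run` target are placeholders):

```bash
# Verbose server logs, info-level core logs, mirrored to a file and stdout.
export LLAMA_STACK_LOGGING="server=debug;core=info"
export LLAMA_STACK_LOG_FILE=/tmp/llama-stack.log   # example path
llama stack run <distribution-name-or-run.yaml>
```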
```{toctree}
:maxdepth: 1
:hidden:

importing_as_library
configuration
```