Tachikoma - Modern Swift AI SDK

A Modern Swift AI SDK that makes AI integration feel natural

Named after the spider-tank AI from Ghost in the Shell, Tachikoma provides an intelligent, adaptable interface for AI services with a completely modern Swift-native API.

Quick Start

Basic Setup

import Tachikoma  // Single unified module

// Simple text generation
let answer = try await generate("What is 2+2?", using: .openai(.gpt4o))
print(answer) // "4"

// With different models
let response1 = try await generate("Hello", using: .anthropic(.opus4))
let response2 = try await generate("Hello", using: .grok(.grok4))
let response3 = try await generate("Hello", using: .ollama(.llama33))

Conversation Management

// Multi-turn conversations
let conversation = Conversation()
conversation.addUserMessage("Hello!")
let response = try await conversation.continue(using: .claude)
print(response) // Assistant's response

conversation.addUserMessage("Tell me about Swift")
let nextResponse = try await conversation.continue()
// Uses same model as previous call

OpenAI Realtime API (Voice Conversations) 🎙️

COMPLETE: Full implementation of OpenAI's Realtime API for ultra-low-latency voice conversations!

The Realtime API enables natural, real-time voice conversations with GPT-4o, featuring:

  • ~500ms latency - Near-instantaneous voice responses
  • WebSocket streaming - Persistent bidirectional connection
  • Native voice I/O - No separate TTS/STT needed
  • Function calling - Voice-triggered tool execution

Quick Start

// Basic voice conversation (the `config` value is defined under "Advanced Features" below)
let conversation = try RealtimeConversation(configuration: config)
try await conversation.start(
    model: .gpt4oRealtime,
    voice: .nova,
    instructions: "You are a helpful voice assistant"
)

// Manual turn control
try await conversation.startListening()
// User speaks...
try await conversation.stopListening()

// Handle responses
for await transcript in conversation.transcriptUpdates {
    print("Assistant: \(transcript)")
}

// End conversation
await conversation.end()

Advanced Features

// Full-featured configuration
let config = EnhancedSessionConfiguration(
    model: "gpt-4o-realtime-preview",
    voice: .nova,
    instructions: "You are an expert assistant",
    inputAudioFormat: .pcm16,
    outputAudioFormat: .pcm16,
    inputAudioTranscription: .whisper,  // Transcribe user audio
    turnDetection: RealtimeTurnDetection(
        type: .serverVad,
        threshold: 0.5,
        silenceDurationMs: 200,
        prefixPaddingMs: 300
    ),
    modalities: .all,  // Text and audio
    temperature: 0.8,
    maxResponseOutputTokens: 4096
)

// Production-ready settings
let settings = ConversationSettings(
    autoReconnect: true,
    maxReconnectAttempts: 3,
    reconnectDelay: 2.0,
    bufferWhileDisconnected: true,  // Buffer audio during disconnection
    enableEchoCancellation: true,
    enableNoiseSuppression: true
)

// Create advanced conversation manager
let conversation = try AdvancedRealtimeConversation(
    apiKey: apiKey,
    configuration: config,
    settings: settings
)

// Dynamic modality switching
try await conversation.updateModalities(.text)   // Text-only mode
try await conversation.updateModalities(.audio)  // Audio-only mode
try await conversation.updateModalities(.all)    // Both modalities

// Update VAD settings on the fly
try await conversation.updateTurnDetection(
    RealtimeTurnDetection.serverVAD
)

Voice-Enabled Function Calling

// Register tools for voice interaction
await conversation.registerTools([
    createTool(
        name: "get_weather",
        description: "Get current weather for a location",
        parameters: [
            AgentToolParameterProperty(
                name: "location",
                type: .string,
                description: "City and state/country"
            )
        ]
    ) { args in
        let location = try args.stringValue("location")
        // Fetch weather data...
        return .string("Sunny, 22°C in \(location)")
    }
])

// Built-in tools available
await conversation.registerBuiltInTools()
// Includes: Weather, Time, Calculator, WebSearch, Translation

// Tools are automatically invoked during voice conversation
// User: "What's the weather in Tokyo?"
// Assistant calls get_weather("Tokyo") and responds with result

Audio Pipeline & Processing

// Custom audio pipeline configuration
let pipeline = try AudioStreamPipeline()
pipeline.delegate = self

// Handle audio events
extension MyController: AudioStreamPipelineDelegate {
    func audioStreamPipeline(_ pipeline: AudioStreamPipeline, 
                           didCaptureAudio data: Data) {
        // Process captured audio (24kHz PCM16)
    }
    
    func audioStreamPipeline(_ pipeline: AudioStreamPipeline, 
                           didUpdateAudioLevel level: Float) {
        // Update UI with audio levels
    }
    
    func audioStreamPipeline(_ pipeline: AudioStreamPipeline, 
                           didDetectSpeechStart: Bool) {
        // Handle speech detection
    }
}

SwiftUI Integration

// Observable conversation for SwiftUI
@StateObject private var conversation = AdvancedRealtimeConversation(
    apiKey: apiKey,
    configuration: .voiceConversation()
)

// Pre-built SwiftUI view
RealtimeConversationView(
    apiKey: apiKey,
    configuration: .voiceConversation(),
    onError: { error in
        print("Conversation error: \(error)")
    }
)

Event System

All 37 Realtime API events are fully supported:

// Client events (9 types)
case sessionUpdate(SessionUpdateEvent)
case inputAudioBufferAppend(InputAudioBufferAppendEvent)
case inputAudioBufferCommit
case inputAudioBufferClear
case conversationItemCreate(ConversationItemCreateEvent)
case conversationItemTruncate(ConversationItemTruncateEvent)
case conversationItemDelete(ConversationItemDeleteEvent)
case responseCreate(ResponseCreateEvent)
case responseCancel

// Server events (28 types)
case sessionCreated(SessionCreatedEvent)
case conversationItemCreated(ConversationItemCreatedEvent)
case inputAudioBufferSpeechStarted
case responseAudioDelta(ResponseAudioDeltaEvent)
case responseFunctionCallArgumentsDone(ResponseFunctionCallArgumentsDoneEvent)
case rateLimitsUpdated(RateLimitsUpdatedEvent)
case error(RealtimeErrorEvent)
// ... and more

Complete Implementation Status ✅

  • Phase 1: Core WebSocket infrastructure
  • Phase 2: Event system (37 events)
  • Phase 3: Audio pipeline (24kHz PCM16)
  • Phase 4: Function calling
  • Phase 5: Advanced features (VAD, modalities, auto-reconnect)
  • Phase 6: Testing & documentation

See docs/openai-harmony.md for complete implementation details.

Tool Integration

// Define tools using the ToolKit protocol
struct WeatherTools: ToolKit {
    var tools: [Tool<WeatherTools>] {
        [
            createTool(
                name: "get_weather",
                description: "Get current weather for a location"
            ) { input, context in
                let location = try input.stringValue("location")
                return try await context.getWeather(location: location)
            }
        ]
    }
    
    func getWeather(location: String) async throws -> String {
        // Your weather API integration here
        return "Sunny, 22°C in \(location)"
    }
}

// Use tools with AI models
let toolkit = WeatherTools()
let result = try await generateText(
    model: .claude,
    messages: [.user("What's the weather in Tokyo?")],
    tools: toolkit.tools.map { $0.toAgentTool() }
)
print(result.text)

Core Features

Type-Safe Model Selection

Compile-time safety with provider-specific enums and full autocomplete support:

// Provider-specific models with compile-time checking
.openai(.gpt4o, .gpt41, .o3, .o3Mini, .custom("ft:gpt-4o:org:abc"))
.anthropic(.opus4, .sonnet4, .haiku35, .opus4Thinking)
.grok(.grok4, .grok40709, .grok2Vision1212)
.ollama(.llama33, .llama32, .llava, .codellama)
.google(.gemini2Flash, .gemini15Pro)
.mistral(.large2, .nemo)
.groq(.llama3170b, .mixtral8x7b)

// Third-party aggregators
.openRouter(modelId: "anthropic/claude-3.5-sonnet")
.together(modelId: "meta-llama/Llama-2-70b-chat-hf")
.replicate(modelId: "meta/llama-2-70b-chat")

// Custom endpoints
.openaiCompatible(modelId: "gpt-4", baseURL: "https://api.azure.com")
.anthropicCompatible(modelId: "claude-3-opus", baseURL: "https://api.custom.ai")

Core Generation Functions

Simple, async-first API following modern Swift patterns:

// Generate text from a prompt
func generate(
    _ prompt: String,
    using model: LanguageModel = .default,
    system: String? = nil,
    maxTokens: Int? = nil,
    temperature: Double? = nil
) async throws -> String

// Stream responses in real-time
func stream(
    _ prompt: String,
    using model: LanguageModel = .default,
    system: String? = nil,
    maxTokens: Int? = nil,
    temperature: Double? = nil
) async throws -> AsyncThrowingStream<TextStreamDelta, Error>

// Analyze images with vision models
func analyze(
    image: ImageInput,
    prompt: String,
    using model: Model? = nil
) async throws -> String

// Advanced generation with full control
func generateText(
    model: LanguageModel,
    messages: [ModelMessage],
    tools: [AgentTool]? = nil,
    settings: GenerationSettings = .default,
    maxSteps: Int = 1
) async throws -> GenerateTextResult

// Generate embeddings (NEW)
func generateEmbedding(
    model: EmbeddingModel,
    input: EmbeddingInput,
    settings: EmbeddingSettings = .default
) async throws -> EmbeddingResult
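
For illustration, a minimal embedding call might look like the sketch below. The `.openai(.small3)` case, the `.text(...)` input case, and the `embeddings` result property are assumptions for illustration only; check the EmbeddingModel and EmbeddingResult definitions for the exact names.

// Hypothetical sketch: case and property names here are assumed, not confirmed API
let embeddingResult = try await generateEmbedding(
    model: .openai(.small3),
    input: .text("Tachikoma makes AI integration feel natural")
)
// A typical result exposes one vector per input string
if let vector = embeddingResult.embeddings.first {
    print("Vector dimensions: \(vector.count)")
}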

OpenAI Harmony-Inspired Features ✨

Enhanced capabilities inspired by OpenAI Harmony patterns:

  • 🎭 Multi-Channel Responses - Separate thinking, analysis, and final answers
  • 🧠 Reasoning Effort Levels - Control depth of reasoning (low/medium/high)
  • 🔄 Automatic Retry Handler - Exponential backoff with smart rate limit handling
  • 🔧 Enhanced Tool System - Namespace and recipient support for tool routing
  • 📊 Embeddings API - Unified interface for OpenAI, Cohere, Voyage embeddings
  • 💾 Response Caching - Intelligent caching to reduce API calls

// Multi-channel responses
let result = try await generateText(
    model: .openai(.o3),
    messages: messages,
    settings: GenerationSettings(reasoningEffort: .high)
)

for message in result.messages {
    switch message.channel {
    case .thinking: print("[Reasoning] \(message.content)")
    case .final: print("[Answer] \(message.content)")
    default: break
    }
}

// Automatic retry with exponential backoff
let retryHandler = RetryHandler(policy: .aggressive)
let response = try await retryHandler.execute {
    try await generateText(model: .openai(.gpt4o), messages: messages)
}

// Enhanced tools with namespaces
let tool = AgentTool(
    name: "search",
    description: "Search the web",
    parameters: params,
    namespace: "web",
    recipient: "search-engine",
    execute: { /* ... */ }
)

Realtime API Features

  • 🎙️ WebSocket Streaming - Persistent connection for low-latency (~500ms) voice conversations
  • 🔊 Bidirectional Audio - Real-time audio input and output with PCM16 format
  • 🎯 Server VAD - Automatic Voice Activity Detection for natural turn-taking
  • 🔄 Dynamic Modalities - Switch between text/audio/both during conversation
  • 🛠️ Voice-Triggered Tools - Execute functions via voice commands
  • 🔁 Auto-Reconnect - Automatic reconnection with exponential backoff
  • 💾 Audio Buffering - Buffer audio during disconnection for seamless recovery
  • 📱 SwiftUI Ready - Observable objects with @Published properties

See docs/openai-harmony.md for complete Realtime API documentation.

Conversation Management

Fluent conversation API with SwiftUI integration:

@available(macOS 13.0, iOS 16.0, watchOS 9.0, tvOS 16.0, *)
public final class Conversation: ObservableObject {
    @Published public private(set) var messages: [ConversationMessage] = []
    
    public func addUserMessage(_ content: String)
    public func addAssistantMessage(_ content: String)
    public func `continue`(using model: Model? = nil, tools: (any ToolKit)? = nil) async throws -> String
    public func clear()
}

// Usage in SwiftUI
struct ChatView: View {
    @StateObject private var conversation = Conversation()
    
    var body: some View {
        // Your chat UI here
    }
}

Tool System with ToolKit Protocol

Type-safe function calling with structured tool definitions:

// Define tools using the protocol
struct MathToolKit: ToolKit {
    var tools: [Tool<MathToolKit>] {
        [
            createTool(
                name: "calculate",
                description: "Perform mathematical calculations"
            ) { input, context in
                let expression = try input.stringValue("expression")
                return try context.calculate(expression)
            }
        ]
    }
    
    func calculate(_ expression: String) throws -> String {
        // Your calculation logic
        let expr = NSExpression(format: expression)
        let result = expr.expressionValue(with: nil, context: nil) as! NSNumber
        return "\(result.doubleValue)"
    }
}

// Use with AI models
let toolkit = MathToolKit()
let result = try await generateText(
    model: .claude,
    messages: [.user("What is 15 * 23?")],
    tools: toolkit.tools.map { $0.toAgentTool() }
)

Architecture

System Overview

                            Tachikoma Swift AI SDK
                                     |
  ┌─────────────────────────────────────────────────────────────────┐
  │                      Public API Layer                          │
  │  generate() • stream() • analyze() • Conversation • ToolKit    │
  └─────────────────────────────────────────────────────────────────┘
                                     |
  ┌─────────────────────────────────────────────────────────────────┐
  │                   Model Selection System                       │
  │    LanguageModel.openai(.gpt4o) • .anthropic(.opus4)          │
  │    .grok(.grok4) • .ollama(.llama33) • .google(.gemini2Flash) │
  └─────────────────────────────────────────────────────────────────┘
                                     |
  ┌─────────────────────────────────────────────────────────────────┐
  │                  Provider Implementations                      │
  │  OpenAI • Anthropic • Grok (xAI) • Ollama • Google • Mistral  │
  │  Groq • OpenRouter • Together • Replicate • Custom Endpoints  │
  └─────────────────────────────────────────────────────────────────┘

Package Structure

Tachikoma is a unified single module containing all functionality:

  • Tachikoma - Complete AI SDK with all features in one module:
    • Core generation functions and model system
    • Tool system with ToolKit protocol and examples
    • Command-line utilities and model parsing
    • Conversation management and provider interfaces
    • Audio functions and usage tracking

Core Components

1. Type-Safe Model System (Model.swift)

The LanguageModel enum provides compile-time safety and autocomplete for all supported AI providers:

public enum LanguageModel: Sendable, CustomStringConvertible {
    // Major providers with sub-enums
    case openai(OpenAI)      // .gpt4o, .gpt41, .o3, .o3Mini, .o4Mini
    case anthropic(Anthropic) // .opus4, .sonnet4, .haiku35, .opus4Thinking
    case grok(Grok)          // .grok4, .grok40709, .grok2Vision1212
    case ollama(Ollama)      // .llama33, .llama32, .llava, .codellama
    case google(Google)      // .gemini2Flash, .gemini15Pro
    case mistral(Mistral)    // .large2, .nemo, .codestral
    case groq(Groq)          // .llama3170b, .mixtral8x7b
    
    // Third-party aggregators
    case openRouter(modelId: String)
    case together(modelId: String)
    case replicate(modelId: String)
    
    // Custom endpoints
    case openaiCompatible(modelId: String, baseURL: String)
    case anthropicCompatible(modelId: String, baseURL: String)
    case custom(provider: any ModelProvider)
    
    public static let `default`: LanguageModel = .anthropic(.opus4)
    public static let claude: LanguageModel = .anthropic(.opus4)
    public static let gpt4o: LanguageModel = .openai(.gpt4o)
    public static let llama: LanguageModel = .ollama(.llama33)
}

Each model includes capabilities metadata:

// Automatic capability detection
let model = LanguageModel.openai(.gpt4o)
print(model.supportsVision)      // true
print(model.supportsTools)       // true  
print(model.supportsAudioInput)  // true
print(model.contextLength)       // 128,000

2. Generation Functions (Generation.swift)

Core async functions following Vercel AI SDK patterns:

// Simple convenience functions
public func generate(_ prompt: String, using model: LanguageModel = .default) async throws -> String
public func stream(_ prompt: String, using model: LanguageModel = .default) async throws -> AsyncThrowingStream<TextStreamDelta, Error>
public func analyze(image: ImageInput, prompt: String, using model: Model? = nil) async throws -> String

// Advanced functions with full control
public func generateText(
    model: LanguageModel,
    messages: [ModelMessage],
    tools: [AgentTool]? = nil,
    settings: GenerationSettings = .default,
    maxSteps: Int = 1
) async throws -> GenerateTextResult

public func streamText(
    model: LanguageModel,
    messages: [ModelMessage],
    tools: [AgentTool]? = nil,
    settings: GenerationSettings = .default,
    maxSteps: Int = 1
) async throws -> StreamTextResult

public func generateObject<T: Codable & Sendable>(
    model: LanguageModel,
    messages: [ModelMessage],
    schema: T.Type,
    settings: GenerationSettings = .default
) async throws -> GenerateObjectResult<T>
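
As a rough sketch, the advanced streaming call could be consumed as shown below; the `stream` property on StreamTextResult is an assumption here (the convenience `stream()` function above returns the delta stream directly).

// Sketch only: the `stream` property name on StreamTextResult is assumed
let streamed = try await streamText(
    model: .openai(.gpt4o),
    messages: [.user("Summarize Swift structured concurrency")]
)
for try await delta in streamed.stream {
    if case .textDelta = delta.type, let content = delta.content {
        print(content, terminator: "")
    }
}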

3. Tool System (Tool.swift, ToolKit.swift)

Type-safe function calling with structured tool definitions:

// Protocol for tool collections
public protocol ToolKit: Sendable {
    associatedtype Context = Self
    var tools: [Tool<Context>] { get }
}

// Individual tool definition
public struct Tool<Context>: Sendable {
    public let name: String
    public let description: String
    public let execute: @Sendable (ToolInput, Context) async throws -> ToolOutput
}

// Helper functions for tool creation
public func createTool<Context>(
    name: String,
    description: String,
    _ handler: @escaping @Sendable (ToolInput, Context) async throws -> String
) -> Tool<Context>

4. Conversation Management (Conversation.swift)

Multi-turn conversations with SwiftUI support:

@available(macOS 13.0, iOS 16.0, watchOS 9.0, tvOS 16.0, *)
public final class Conversation: ObservableObject, @unchecked Sendable {
    @Published public private(set) var messages: [ConversationMessage] = []
    
    public func addUserMessage(_ content: String)
    public func addAssistantMessage(_ content: String)
    public func `continue`(using model: Model? = nil, tools: (any ToolKit)? = nil) async throws -> String
    public func clear()
}

5. Provider System (ProviderFactory.swift)

Extensible provider architecture supporting multiple AI services:

// Provider protocol for extensibility
public protocol ModelProvider: Sendable {
    var modelId: String { get }
    var baseURL: String? { get }
    var apiKey: String? { get }
    var capabilities: ModelCapabilities { get }
    
    func generateText(request: ProviderRequest) async throws -> ProviderResponse
    func streamText(request: ProviderRequest) async throws -> AsyncThrowingStream<TextStreamDelta, Error>
}

// Factory for creating providers
public struct ProviderFactory {
    public static func createProvider(for model: LanguageModel) throws -> any ModelProvider
}
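
For example, a provider can be resolved directly from a model value and inspected before issuing requests (the capability field names mirror the ModelCapabilities examples elsewhere in this README):

// Resolve the concrete provider behind a model and check its capabilities
let provider = try ProviderFactory.createProvider(for: .anthropic(.opus4))
print(provider.modelId)
print(provider.capabilities.supportsTools)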

Key Implementation Files

  • Model System (Sources/Tachikoma/Model.swift, lines 1-875): Type-safe model enums, capabilities, provider selection
  • Generation (Sources/Tachikoma/Generation.swift, lines 18-569): Core async generation functions, streaming, image analysis
  • Tool System (Sources/Tachikoma/Tool.swift): Tool protocol, execution, input/output handling
  • ToolKit Builders (Sources/Tachikoma/ToolKit.swift, lines 1-285): ToolKit protocol, result builders, example implementations
  • Conversation (Sources/Tachikoma/Conversation.swift): Multi-turn conversation management, SwiftUI integration
  • Provider Factory (Sources/Tachikoma/ProviderFactory.swift): Provider instantiation, capability mapping
  • Usage Tracking (Sources/Tachikoma/UsageTracking.swift): Token usage, cost tracking, session management
  • Model Selection (Sources/Tachikoma/ModelSelection.swift): Command-line model parsing, string matching

Installation & Usage

Swift Package Manager

Add Tachikoma to your project using Swift Package Manager:

// Package.swift
dependencies: [
    .package(url: "https://github.com/steipete/Tachikoma.git", from: "1.0.0")
],
targets: [
    .target(
        name: "YourTarget",
        dependencies: [
            .product(name: "Tachikoma", package: "Tachikoma"),  // Single unified module
        ]
    )
]

Environment Setup

Set up API keys for the providers you want to use:

# OpenAI
export OPENAI_API_KEY="sk-..."

# Anthropic
export ANTHROPIC_API_KEY="sk-ant-..."

# Grok (xAI)
export X_AI_API_KEY="xai-..."
# or
export XAI_API_KEY="xai-..."

# Groq
export GROQ_API_KEY="gsk_..."

# Mistral
export MISTRAL_API_KEY="..."

# Google AI
export GOOGLE_API_KEY="AIza..."

# Ollama (runs locally)
export OLLAMA_API_KEY="optional-token"  # Usually not needed

# Custom base URLs (optional)
export OPENAI_BASE_URL="https://api.custom.com/v1"
export ANTHROPIC_BASE_URL="https://api.custom.com"
export OLLAMA_BASE_URL="http://localhost:11434"

Automatic Environment Variable Loading

Tachikoma automatically loads API keys from environment variables when the SDK initializes. The configuration system uses a hierarchical priority:

  1. Explicitly configured keys (via configuration.setAPIKey())
  2. Environment variables (loaded automatically on startup)
  3. Credentials file (~/.tachikoma/credentials)

import Tachikoma

// Keys are loaded automatically from environment variables
// No manual configuration needed if environment variables are set

// Check what's available
let config = TachikomaConfiguration() // Loads from environment by default

// These will return environment keys if available
print("OpenAI available: \(config.hasAPIKey(for: .openai))")
print("Anthropic available: \(config.hasAPIKey(for: .anthropic))")
print("Grok available: \(config.hasAPIKey(for: .grok))")

// Check specifically for environment vs configured keys
print("OpenAI from env: \(config.hasEnvironmentAPIKey(for: .openai))")
print("OpenAI configured: \(config.hasConfiguredAPIKey(for: .openai))")

Provider Type-Safety

The SDK now uses a type-safe Provider enum instead of strings:

// ✅ Type-safe provider API
let config = TachikomaConfiguration()
config.setAPIKey("sk-...", for: .openai)
config.setAPIKey("sk-ant-...", for: .anthropic)
config.setAPIKey("xai-...", for: .grok)

// ✅ All standard providers supported
let providers: [Provider] = [
    .openai,     // OPENAI_API_KEY
    .anthropic,  // ANTHROPIC_API_KEY
    .grok,       // X_AI_API_KEY or XAI_API_KEY
    .groq,       // GROQ_API_KEY
    .mistral,    // MISTRAL_API_KEY
    .google,     // GOOGLE_API_KEY
    .ollama,     // OLLAMA_API_KEY (optional)
    .custom("my-provider")  // Custom provider ID
]

Alternative Environment Variables

Some providers support multiple environment variable names:

# Grok supports both naming conventions
export X_AI_API_KEY="xai-..."    # Primary
export XAI_API_KEY="xai-..."     # Alternative

// The SDK automatically checks both
let provider = Provider.grok
print(provider.environmentVariable)            // "X_AI_API_KEY"
print(provider.alternativeEnvironmentVariables) // ["XAI_API_KEY"]

Basic Usage Examples

Text Generation

import Tachikoma

// Simple generation with default model (Claude Opus 4)
let response = try await generate("Explain Swift async/await")

// With specific model
let gptResponse = try await generate(
    "Write a haiku about programming",
    using: .openai(.gpt4o)
)

// With system prompt and parameters
let response = try await generate(
    "Tell me a joke",
    using: .anthropic(.sonnet4),
    system: "You are a friendly comedian",
    maxTokens: 100,
    temperature: 0.9
)

Streaming Responses

let stream = try await stream("Write a long story", using: .claude)

for try await delta in stream {
    switch delta.type {
    case .textDelta:
        if let content = delta.content {
            print(content, terminator: "")
        }
    case .done:
        print("\n[Stream complete]")
    default:
        break
    }
}

Image Analysis

// Analyze image from file path
let analysis = try await analyze(
    image: .filePath("/path/to/image.png"),
    prompt: "What do you see in this image?",
    using: .openai(.gpt4o)  // Vision-capable model required
)

// Analyze image from base64 data
let analysis = try await analyze(
    image: .base64(base64String),
    prompt: "Describe the contents of this screenshot"
)

Conversations

let conversation = Conversation()

// Add messages
conversation.addUserMessage("Hello! I'm learning Swift.")
let response1 = try await conversation.continue(using: .claude)
print(response1)

conversation.addUserMessage("Can you explain optionals?")
let response2 = try await conversation.continue()  // Uses same model
print(response2)

// In SwiftUI
struct ChatView: View {
    @StateObject private var conversation = Conversation()
    @State private var inputText = ""
    
    var body: some View {
        VStack {
            ScrollView {
                ForEach(conversation.messages) { message in
                    MessageView(message: message)
                }
            }
            
            HStack {
                TextField("Type a message...", text: $inputText)
                Button("Send") {
                    Task {
                        conversation.addUserMessage(inputText)
                        inputText = ""
                        let response = try await conversation.continue()
                        // Response automatically added to conversation
                    }
                }
            }
        }
    }
}

Tools and Function Calling

// Define a custom tool kit
struct CalculatorToolKit: ToolKit {
    var tools: [Tool<CalculatorToolKit>] {
        [
            createTool(
                name: "calculate",
                description: "Perform mathematical calculations"
            ) { input, context in
                let expression = try input.stringValue("expression")
                return try context.evaluate(expression: expression)
            },
            
            createTool(
                name: "convert_currency",
                description: "Convert between currencies"
            ) { input, context in
                let amount = try input.doubleValue("amount")
                let from = try input.stringValue("from_currency")
                let to = try input.stringValue("to_currency")
                return try await context.convertCurrency(amount: amount, from: from, to: to)
            }
        ]
    }
    
    func evaluate(expression: String) throws -> String {
        let expr = NSExpression(format: expression)
        let result = expr.expressionValue(with: nil, context: nil) as! NSNumber
        return "Result: \(result.doubleValue)"
    }
    
    func convertCurrency(amount: Double, from: String, to: String) async throws -> String {
        // Your currency conversion logic here
        return "\(amount) \(from) = \(amount * 1.1) \(to) (example rate)"
    }
}

// Use tools with AI
let calculator = CalculatorToolKit()
let result = try await generateText(
    model: .claude,
    messages: [.user("What is 15 * 23 + 100? Also convert 50 USD to EUR.")],
    tools: calculator.tools.map { $0.toAgentTool() }
)

print(result.text)  // AI will use the tools to calculate and convert

Build & Test

# Build the package
swift build

# Run tests
swift test

# Build in release mode
swift build -c release

Advanced Features

Usage Tracking

Tachikoma automatically tracks token usage and costs across all operations:

// Usage is tracked automatically during generation
let result = try await generateText(
    model: .openai(.gpt4o),
    messages: [.user("Long prompt here...")]
)

if let usage = result.usage {
    print("Input tokens: \(usage.inputTokens)")
    print("Output tokens: \(usage.outputTokens)")
    if let cost = usage.cost {
        print("Estimated cost: $\(cost)")
    }
}

// Access global usage statistics
let tracker = UsageTracker.shared
let todayUsage = tracker.getUsage(for: Date())
print("Today's total tokens: \(todayUsage.totalTokens)")

Multi-Step Tool Execution

Handle complex workflows with multiple tool calls:

let result = try await generateText(
    model: .claude,
    messages: [.user("Research Swift 6 features, then write documentation")],
    tools: researchTools.tools.map { $0.toAgentTool() },
    maxSteps: 5  // Allow multiple rounds of tool calling
)

// Access all execution steps
for (index, step) in result.steps.enumerated() {
    print("Step \(index + 1):")
    print("  Text: \(step.text)")
    print("  Tool calls: \(step.toolCalls.count)")
    print("  Tool results: \(step.toolResults.count)")
}

Structured Output Generation

Generate type-safe structured data:

struct PersonInfo: Codable {
    let name: String
    let age: Int
    let occupation: String
    let skills: [String]
}

let result = try await generateObject(
    model: .claude,
    messages: [.user("Create a person profile for a software engineer")],
    schema: PersonInfo.self
)

let person = result.object
print("Name: \(person.name)")
print("Skills: \(person.skills.joined(separator: ", "))")

Custom Providers

Extend Tachikoma with custom AI providers:

struct CustomProvider: ModelProvider {
    let modelId = "custom-model-v1"
    let baseURL: String? = "https://api.custom.ai"
    let apiKey: String? = ProcessInfo.processInfo.environment["CUSTOM_API_KEY"]
    
    let capabilities = ModelCapabilities(
        supportsVision: true,
        supportsTools: true,
        contextLength: 200_000
    )
    
    func generateText(request: ProviderRequest) async throws -> ProviderResponse {
        // Your custom implementation
    }
    
    func streamText(request: ProviderRequest) async throws -> AsyncThrowingStream<TextStreamDelta, Error> {
        // Your custom streaming implementation
    }
}

// Use custom provider
let customModel = LanguageModel.custom(provider: CustomProvider())
let response = try await generate("Hello", using: customModel)

Examples

See the following files for complete working examples:

  • Tests/TachikomaTests/MinimalModernAPITests.swift - Model construction, tool creation, conversation management
  • Sources/Tachikoma/ToolKit.swift - WeatherToolKit and MathToolKit implementations (lines 125-224)
  • Examples/TachikomaExamples.swift - Advanced usage patterns and real-world scenarios
  • Examples/DemoScript.swift - Interactive demo script

Documentation

Requirements

  • Swift 6.0+ with strict concurrency enabled
  • Platform Support:
    • macOS 13.0+ (Ventura)
    • iOS 16.0+
    • watchOS 9.0+
    • tvOS 16.0+
    • Linux (Ubuntu 20.04+)
    • Windows 10+
  • Concurrency: Full @Sendable compliance throughout
  • Dependencies:

Contributing

We welcome contributions! Please see our Contributing Guide for details.

Development Setup

git clone https://github.com/steipete/Tachikoma.git
cd Tachikoma

# Build and test
swift build
swift test

# Generate documentation
swift package generate-documentation

License

Tachikoma is available under the MIT License. See LICENSE for details.

Built with ❤️ for the Swift AI community

Intelligent • Adaptable • Reliable
