xRx

Build apps with "any modality input (x), reasoning (R), any modality output (x)."

Introduction

xRx is a framework for building AI-powered reasoning systems that interact with users across multiple modalities, where "x" represents the flexible integration of text, voice, and other interaction forms.

We believe that the future of software interactions lies in multimodal experiences, and xRx is at the forefront of this movement. It enables developers to build sophisticated AI systems that seamlessly integrate various input and output modalities, providing users with a truly immersive experience.

Overview

xRx is a set of building blocks for developers looking to build next-generation AI-powered user experiences. Whether you're developing voice-based assistants, text-based chatbots, or multimodal applications, xRx provides the building blocks you need.

Key Features

Multimodal Input and Output: Integrate audio, text, and other modalities effortlessly.
Advanced Reasoning: Utilize comprehensive reasoning systems to enhance user interactions.
Modular Architecture: Easily extend and customize components to fit your specific needs.

System Architecture

The xRx system is composed of several key components, each playing a crucial role in delivering a seamless multimodal experience.

flowchart TD
    A[Client] <-->|audio/text| B[Orchestrator]
    B -->|Send audio| C[STT]
    C -->|Return text| B
    B <-->|text| G[Guardrail Proxy]
    G <-->|text| D[Agent]
    D[Agent] <-->|text / API requests| F[External Services]
    B -->|Send text| E[TTS]
    E -->|Return audio| B

style A fill:#FFCDD2,stroke:#B71C1C,stroke-width:2px,color:#000000
style B fill:#BBDEFB,stroke:#0D47A1,stroke-width:2px,color:#000000
style C fill:#C8E6C9,stroke:#1B5E20,stroke-width:2px,color:#000000
style D fill:#FFF9C4,stroke:#F57F17,stroke-width:2px,color:#000000
style E fill:#D1C4E9,stroke:#4A148C,stroke-width:2px,color:#000000
style F fill:#FFECB3,stroke:#FF6F00,stroke-width:2px,color:#000000
style G fill:#E1BEE7,stroke:#4A148C,stroke-width:2px,color:#000000

High-Level Architecture

Client: Front end app experience which renders the UI and handles websocket communication with the Orchestrator. Client Directory
Orchestrator: Manages the flow of data between various AI and traditional software components. Orchestrator Directory
STT (Speech-to-Text): Converts audio input to text. STT Directory
TTS (Text-to-Speech): Converts text responses back to audio. TTS Directory
Agent: A collection reasoning agents responsible for the "reasoning" system of xRx. Reasoning Directory
Guardrails Proxy: A safety layer for the reasoning system. Guardrails Proxy Directory

These components then communicate via the following sequence diagram

sequenceDiagram
    participant Client
    participant Orchestrator
    participant STT
    participant Agent
    participant TTS

    Client->> Orchestrator: Send audio on websockets port 8000
    Orchestrator->>STT: Send audio on websockets port 8001
    STT ->>Orchestrator: Return text
    STT ->>Orchestrator: Return text
    Orchestrator->>Agent: Send text on port 8003
    Agent->>Orchestrator: Return text
    Orchestrator->>TTS: Send text on port 8002
    TTS ->>Orchestrator: Return audio
    Orchestrator->>Client: Return audio, text, and application widgets

Reasoning Systems

To showcase the capabilities of xRx, we've created multiple reasoning systems:

Simple Tool Calling Agent

We've created a simple tool calling agent that demonstrates basic functionality. This agent has access to tools like weather and time retrievers, and stock price lookup. It shows how any Python-based reasoning agent can be deployed into the xRx system. The code for this reasoning agent can be found here.

Shopify Interaction Agent

We have built a sophisticated reasoning system that interacts with a Shopify store. The shopify-agent allows users to interact with a reasoning system built on top of Shopify, handling tasks like product inquiries, order placement, and customer service.

Wolfram Assistant Agent

The wolfram-assistant-agent leverages Wolfram Alpha's conversational API to provide answers to user queries, particularly useful for mathematical and scientific questions. This agent enhances the dialogue with refined language processing to deliver a smooth and engaging user experience.

Patient Information Agent

The patient-information-agent is designed to collect and manage patient information before a doctor's visit. It demonstrates how xRx can be applied in healthcare scenarios, gathering essential medical data in a conversational manner.

Template Agent

For developers looking to create their own reasoning agents, we provide a template-agent. This serves as a starting point for developing new reasoning agents within the xRx framework, offering a basic structure that can be easily customized and extended.

Each of these agents showcases different aspects of the xRx system's capabilities, from simple tool integration to complex domain-specific interactions. They demonstrate the flexibility and power of the xRx framework in building sophisticated AI-powered user experiences across various domains.

Getting Started

Prerequisites

To deploy xRx locally, you need the following components:

Docker
Python (version 3.10)
Node (version 18)
Pip

brew cask install docker
brew install [email protected]
brew install node@18

External Services Configuration

xRx requires three external services: LLM, Text-to-Speech, and Speech-to-Text. Configure these services by setting the environment variables in the .env file at the root of the repository.

LLMs

We recommend Groq for high token throughput. Sign up at Groq and obtain an API key.

LLM_API_KEY="<your Api Key>"
LLM_BASE_URL="https://api.groq.com/openai/v1"
LLM_MODEL_ID="llama3-70b-8192"

We recommend the models in the variables above for our repository, but they can be changed to any model that is supported by the LLM provider.

Text to Speech

We use Elevenlabs for text-to-speech. Sign up at Elevenlabs and obtain an API key.

ELEVENLABS_API_KEY=<your elevenlabs api key>
ELEVENLABS_VOICE_ID=<your elevenlabs voice id>

Speech to Text

We support multiple transcription services. For ease, use Groq's Whisper, given that you already have an API key.

STT_PROVIDER="groq"
GROQ_STT_API_KEY="<your groq api key>"

How To Run with a Simple Agent

Create a .env file with content at the root (./) See env.quickstart for an example of what this environment file should look like.
Build and run the system using Docker:

docker-compose up --build

Visit the xRx Demo client at http://localhost:3000

Enjoy exploring and interacting with the xRx system!

Contributing

We welcome contributions from the community. Whether you're adding new features, fixing bugs, or improving documentation, your efforts are valued.

For more information on contributing, see our Contribution Guide.

Documentation

See our documentation here

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
.github/workflows		.github/workflows
agent_framework		agent_framework
docs		docs
guardrails-proxy		guardrails-proxy
llm-observability		llm-observability
orchestrator		orchestrator
react-xrx-client		react-xrx-client
stt		stt
tts		tts
.gitignore		.gitignore
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
LICENSE		LICENSE
README.md		README.md
SECURITY.md		SECURITY.md
contributing.md		contributing.md
package-lock.json		package-lock.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

xRx

Introduction

Overview

Key Features

High-Level Architecture

Simple Tool Calling Agent

Shopify Interaction Agent

Wolfram Assistant Agent

Patient Information Agent

Template Agent

Prerequisites

External Services Configuration

LLMs

Text to Speech

Speech to Text

How To Run with a Simple Agent

About

Uh oh!

Releases

Packages

Languages

License

sstrelnikov/xrx-core

Folders and files

Latest commit

History

Repository files navigation

xRx

Introduction

Overview

Key Features

High-Level Architecture

Simple Tool Calling Agent

Shopify Interaction Agent

Wolfram Assistant Agent

Patient Information Agent

Template Agent

Prerequisites

External Services Configuration

LLMs

Text to Speech

Speech to Text

How To Run with a Simple Agent

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages