GPUScheduler is a dynamic, AI-driven platform for GPU resource management. In a crowded market of reactive schedulers, GPUScheduler's unique value lies in its proactive, predictive allocation model.
Instead of simply managing a queue of existing requests, GPUScheduler anticipates the need for GPU resources and has them ready just in time. This is achieved through a dual-mode system:
- A Robust REST API: A classical, highly scalable distributed system that serves as the control plane for all resource operations (allocation, monitoring, de-provisioning).
- An AI-Powered Client Agent: A lightweight, open-source, and user-owned agent that runs entirely within the customer's environment. It uses the customer's own LLM API keys to locally monitor workloads and make intelligent, proactive allocation requests to our API. This architecture ensures we never access customers' private data, code, or logs, building a fundamental layer of trust.
This approach shifts the paradigm from resource management to resource anticipation, aiming to create a seamless, "zero-wait" experience for developers and researchers while optimizing for cost and efficiency.
- Proactive Scheduling: Moves beyond reactive queuing to predict and pre-warm resources.
- Trust-by-Design Agentic Workflow: Utilizes a user-owned, open-source agent to understand local context without compromising data privacy.
- Simplicity: Abstracts away the complexity of cloud infrastructure behind a clean API and an intelligent agent.
GPUScheduler can be used in two primary ways:
Users can integrate directly with the scheduler's REST API. This allows them to programmatically request, list, and terminate GPU resources on-demand as part of their existing scripts or infrastructure-as-code setups.
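As a rough sketch of what direct integration might look like in Python, the snippet below requests a GPU, lists active resources, and terminates the allocation. The endpoint paths, payload fields, and `GPUSCHEDULER_API_KEY` environment variable are illustrative assumptions, not the published API contract:

```python
import os
import requests

# Placeholder host and auth scheme -- substitute the real values from the API docs.
BASE_URL = "https://api.gpuscheduler.example.com/v1"
HEADERS = {"Authorization": f"Bearer {os.environ['GPUSCHEDULER_API_KEY']}"}

# Request a GPU resource on-demand.
resp = requests.post(
    f"{BASE_URL}/resources",
    headers=HEADERS,
    json={"gpu_type": "a100", "count": 1, "max_duration_minutes": 120},
)
resp.raise_for_status()
resource = resp.json()
print(f"Allocated resource {resource['id']}")

# List currently active resources.
active = requests.get(f"{BASE_URL}/resources", headers=HEADERS).json()

# Terminate the resource when the job is done.
requests.delete(f"{BASE_URL}/resources/{resource['id']}", headers=HEADERS).raise_for_status()
```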
For a more advanced, hands-off approach, users can deploy the client-side "smart agent." The user configures this agent to monitor their local systems (e.g., system logs, workload state, job submission history) and provides it with an LLM API key. The agent uses its AI capabilities to understand when a GPU-intensive task is about to begin and proactively calls the GPUScheduler API to have a resource ready just in time.
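The loop below is a simplified sketch of that agent behavior, assuming OpenAI as the user's LLM provider; the log path, prompt, and scheduler endpoint are hypothetical, and the real agent design is described in docs/AGENT_DESIGN.md:

```python
import os
import time
import requests
from openai import OpenAI

llm = OpenAI()  # uses the customer's own OPENAI_API_KEY from the environment
SCHEDULER_URL = "https://api.gpuscheduler.example.com/v1/resources"  # placeholder

def recent_activity() -> str:
    """Return recent local signals -- here, the tail of a hypothetical job log."""
    with open("/var/log/jobs/submissions.log") as f:
        return "".join(f.readlines()[-50:])

while True:
    # Ask the user's LLM whether local activity suggests an imminent GPU job.
    verdict = llm.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": "Does this activity suggest a GPU-intensive job will "
                       "start soon? Answer YES or NO.\n\n" + recent_activity(),
        }],
    ).choices[0].message.content

    if verdict and verdict.strip().upper().startswith("YES"):
        # Pre-warm a resource so it is ready when the job launches.
        requests.post(
            SCHEDULER_URL,
            headers={"Authorization": f"Bearer {os.environ['GPUSCHEDULER_API_KEY']}"},
            json={"gpu_type": "a100", "count": 1},
        )
    time.sleep(60)  # simple polling interval; a production agent would be event-driven
```

Note the data flow: local context goes only to the customer's own LLM provider under their own key, while the GPUScheduler API sees nothing but the resulting allocation request.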
The backend is designed as a distributed, cloud-native application intended for high-availability and scalability. Key components include a stateless API service (FastAPI), a PostgreSQL database, a Celery/RabbitMQ task queue, and a Redis cache.
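To illustrate how these components fit together, here is a minimal sketch of the API/task-queue split (route names, broker URL, and the `provision_gpu` task are illustrative; the production layout is documented in docs/DESIGN.md): the stateless FastAPI service accepts a request and immediately hands the slow provisioning work to a Celery worker over RabbitMQ.

```python
from celery import Celery
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
celery_app = Celery("gpuscheduler", broker="amqp://rabbitmq:5672//")

class AllocationRequest(BaseModel):
    gpu_type: str
    count: int = 1

@celery_app.task
def provision_gpu(gpu_type: str, count: int) -> None:
    """Long-running provisioning runs in a Celery worker, keeping the
    API service itself stateless and quick to respond."""
    ...  # e.g., talk to the cloud provider and persist state in PostgreSQL

@app.post("/v1/resources", status_code=202)
def allocate(req: AllocationRequest):
    # Enqueue the slow work and return 202 Accepted right away.
    task = provision_gpu.delay(req.gpu_type, req.count)
    return {"task_id": task.id, "status": "provisioning"}
```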
For a full technical breakdown, please see docs/DESIGN.md and docs/AGENT_DESIGN.md.