A comprehensive, open collection of Agent Skills focused on context engineering principles for building production-grade AI agent systems. These skills teach the art and science of curating context to maximize agent effectiveness across any agent platform.
Context engineering is the discipline of managing the language model's context window. Unlike prompt engineering, which focuses on crafting effective instructions, context engineering addresses the holistic curation of all information that enters the model's limited attention budget: system prompts, tool definitions, retrieved documents, message history, and tool outputs.
The fundamental challenge is that context windows are constrained not by raw token capacity but by attention mechanics. As context length increases, models exhibit predictable degradation patterns: the "lost-in-the-middle" phenomenon, U-shaped attention curves, and attention scarcity. Effective context engineering means finding the smallest possible set of high-signal tokens that maximize the likelihood of desired outcomes.
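As a minimal illustration (in the Python-pseudocode style the skills use throughout), the sketch below assembles a context window by keeping the highest-signal components that fit within a fixed token budget. The token estimate, signal scores, and component texts are illustrative assumptions, not part of any skill.

```python
def estimate_tokens(text: str) -> int:
    # Crude heuristic: roughly 4 characters per token (illustrative assumption).
    return max(1, len(text) // 4)


def assemble_context(components: list[dict], budget: int) -> list[str]:
    """Keep the highest-signal components that fit within the token budget.

    Each component is {"text": str, "signal": float}; "signal" is an
    application-defined relevance score, assumed here for illustration.
    """
    chosen, used = [], 0
    for item in sorted(components, key=lambda c: c["signal"], reverse=True):
        cost = estimate_tokens(item["text"])
        if used + cost <= budget:
            chosen.append(item["text"])
            used += cost
    return chosen


if __name__ == "__main__":
    components = [
        {"text": "You are a coding agent. Follow the house style.", "signal": 1.0},
        {"text": "tool: read_file(path: str) -> str", "signal": 0.9},
        {"text": "Retrieved doc: attention degrades as context grows...", "signal": 0.7},
        {"text": "Forty older conversation turns, verbatim...", "signal": 0.3},
    ]
    # With a tight budget, the low-signal verbatim history is the first thing dropped.
    print(assemble_context(components, budget=40))
```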
These skills establish the foundational understanding required for all subsequent context engineering work.
| Skill | Description |
|---|---|
| context-fundamentals | Understand what context is, why it matters, and the anatomy of context in agent systems |
| context-degradation | Recognize patterns of context failure: lost-in-middle, poisoning, distraction, and clash |
| context-compression | Design and evaluate compression strategies for long-running sessions |
These skills cover the patterns and structures for building effective agent systems.
| Skill | Description |
|---|---|
| multi-agent-patterns | Master orchestrator, peer-to-peer, and hierarchical multi-agent architectures |
| memory-systems | Design short-term, long-term, and graph-based memory architectures |
| tool-design | Build tools that agents can use effectively |
These skills address the ongoing operation and optimization of agent systems.
| Skill | Description |
|---|---|
| context-optimization | Apply compaction, masking, and caching strategies |
| evaluation | Build evaluation frameworks for agent systems |
| advanced-evaluation | **NEW** Master LLM-as-a-Judge techniques: direct scoring, pairwise comparison, rubric generation, and bias mitigation |
Each skill is structured for efficient context use. At startup, agents load only skill names and descriptions. Full content loads only when a skill is activated for relevant tasks.
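A minimal sketch of that progressive-disclosure flow, assuming each skill lives in its own folder with a SKILL.md whose first line carries the short description (the folder layout and parsing here are assumptions for illustration, not the Agent Skills specification itself):

```python
from pathlib import Path


def index_skills(skills_dir: str) -> dict[str, str]:
    """Startup: load only each skill's name and one-line description."""
    index = {}
    for skill_md in Path(skills_dir).glob("*/SKILL.md"):
        lines = skill_md.read_text(encoding="utf-8").splitlines()
        index[skill_md.parent.name] = lines[0] if lines else ""
    return index


def activate_skill(skills_dir: str, name: str) -> str:
    """Activation: only now does the full SKILL.md enter the context window."""
    return (Path(skills_dir) / name / "SKILL.md").read_text(encoding="utf-8")


# Usage sketch: index_skills("skills") keeps the startup footprint to a few
# lines per skill; activate_skill("skills", "context-fundamentals") pays the
# full token cost only when the task actually needs that skill.
```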
These skills focus on transferable principles rather than vendor-specific implementations. The patterns work across Claude Code, Cursor, and any agent platform that supports skills or allows custom instructions.
Scripts and examples demonstrate concepts using Python pseudocode that works across environments without requiring specific dependencies to be installed.
Install skills by referencing this repository or by copying skill folders into your configured skills directory. When working on context engineering tasks, activate relevant skills to load their instructions.
Copy skill content into `.rules` or create project-specific skills folders. The skills provide the context and guidelines an agent needs for effective context engineering and agent design.
Extract the principles and patterns from any skill and implement them in your agent framework. The skills are deliberately platform-agnostic.
The examples folder contains complete system designs that demonstrate how multiple skills work together in practice.
| Example | Description | Skills Applied |
|---|---|---|
| x-to-book-system | Multi-agent system that monitors X accounts and generates daily synthesized books | multi-agent-patterns, memory-systems, context-optimization, tool-design, evaluation |
| llm-as-judge-skills | **NEW** Production-ready LLM evaluation tools with a TypeScript implementation and 19 passing tests | advanced-evaluation, tool-design, context-fundamentals, evaluation |
Each example includes:
- Complete PRD with architecture decisions
- Skills mapping showing which concepts informed each decision
- Implementation guidance
The llm-as-judge-skills example is a complete TypeScript implementation demonstrating:
- Direct Scoring: Evaluate responses against weighted criteria with rubric support
- Pairwise Comparison: Compare responses with position bias mitigation (see the sketch after this list)
- Rubric Generation: Create domain-specific evaluation standards
- EvaluatorAgent: High-level agent combining all evaluation capabilities
Based on research from Eugene Yan's LLM-Evaluators and Vercel AI SDK 6.
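The example itself is written in TypeScript; as a language-agnostic illustration of the position-bias mitigation idea, the Python sketch below judges each pair twice with the candidate order swapped and accepts a winner only when the two passes agree. The `judge` callable is an assumed stand-in for an LLM call, not the example's actual API.

```python
from typing import Callable, Literal

Verdict = Literal["A", "B", "tie"]


def compare_with_swap(
    judge: Callable[[str, str, str], Verdict],  # (prompt, first, second) -> verdict
    prompt: str,
    response_a: str,
    response_b: str,
) -> Verdict:
    """Pairwise comparison with position-bias mitigation.

    Judge the pair twice with the candidate order swapped; accept a winner
    only if both passes agree, otherwise call it a tie.
    """
    first_pass = judge(prompt, response_a, response_b)   # A shown first
    second_pass = judge(prompt, response_b, response_a)  # B shown first
    normalized = {"A": "B", "B": "A", "tie": "tie"}[second_pass]
    return first_pass if first_pass == normalized else "tie"
```

The same idea generalizes to averaging scores across both orderings or randomizing presentation order across a batch of comparisons.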
```bash
# Quick start
cd examples/llm-as-judge-skills
npm install
cp env.example .env   # Add OPENAI_API_KEY
npm test              # Run 19 passing tests
```

Each skill follows the Agent Skills specification:

```
skill-name/
├── SKILL.md        # Required: instructions + metadata
├── scripts/        # Optional: executable code demonstrating concepts
└── references/     # Optional: additional documentation and resources
```
See the template folder for the canonical skill structure.
This repository follows the Agent Skills open development model. Contributions are welcome from the broader ecosystem. When contributing:
- Follow the skill template structure
- Provide clear, actionable instructions
- Include working examples where appropriate
- Document trade-offs and potential issues
- Keep SKILL.md under 500 lines for optimal performance
Feel free to reach out to Muratcan Koylan for collaboration or requests.
MIT License - see LICENSE file for details.
The principles in these skills are derived from research and production experience at leading AI labs and framework developers. Each skill includes references to the underlying research and case studies that inform its recommendations.