EvoClaw Developer Guide

Start with the Quick Start guide — you'll have a running agent in 5 minutes.
EvoClaw is a self-evolving AI agent wrapper that turns live OpenClaw conversations into continuous training data. It wraps your model behind an OpenAI-compatible API proxy, intercepts every turn, scores it, injects skills, trains via cloud LoRA on Tinker, and hot-swaps weights — all while your users are typing.
Installation
EvoClaw requires Python 3.9+ and a valid Tinker API key for cloud training. Install all dependencies with pip:
```shell
pip install evoclaw
```

Optional: Verify installation

```python
import evoclaw
print(evoclaw.__version__)  # 0.2.1
```
Quick Start
The fastest way to get EvoClaw running is with the included example scripts. Three commands and your agent is live and evolving.
Step 1 — Configure OpenClaw gateway
Run the setup script for your chosen model. This configures OpenClaw to route through the EvoClaw proxy on port 30000.
```shell
# Recommended: Kimi-2.5 (~200B MoE)
evoclaw init
```
Step 2 — Set your Tinker API key
```shell
export TINKER_API_KEY="your_tinker_api_key_here"
```
Get your Tinker API key at thinkingmachines.ai/tinker. The key is free for development use.
Step 3 — Start EvoClaw
```shell
# Basic RL training mode
evoclaw start
```
You should see output like:
```
EvoClaw v0.2.1 starting...
→ Model: moonshotai/Kimi-2.5
→ Proxy port: 30000
→ Tinker URL: http://localhost:8080
→ Skills: enabled (18 loaded)
→ Evolution: enabled
→ Ready. Start chatting — your agent will begin evolving!
```
Prerequisites
- Python 3.9+ — EvoClaw uses modern async/await and type hints throughout
- OpenClaw — The agent framework EvoClaw wraps. See openclaw.ai
- Tinker API key — For cloud LoRA training. Free tier available at thinkingmachines.ai/tinker
- OpenAI-compatible judge endpoint — For PRM scoring (e.g. Azure OpenAI, any OpenAI-compatible API)
- Network access — That's it. No GPU, no local model server, no cluster.
System Architecture Overview
EvoClaw consists of five main components that work together asynchronously. They are fully decoupled — the proxy serves users while training runs in the background without blocking.
| MODULE | FILE | ROLE |
|---|---|---|
| EvoClawProxy | proxy.py | OpenAI-compatible FastAPI server that intercepts all conversations |
| ConversationBuffer | buffer.py | Thread-safe async queue that accumulates turns until batch is full |
| RewardModel | reward_model.py | Calls a judge LLM to score each turn on a 0.0–1.0 scale |
| SkillManager | skill_manager.py | Loads, retrieves, injects, and evolves skills |
| EvoClawTrainer | trainer.py | Submits training jobs to Tinker and hot-swaps weights on completion |
EvoClawProxy
The proxy is a FastAPI application that implements the OpenAI chat completions API. It sits between OpenClaw and your model, transparently intercepting every request.
```python
import asyncio

from evoclaw import EvoClawConfig, EvoClawProxy
from evoclaw.buffer import ConversationBuffer

config = EvoClawConfig(proxy_port=30000)
buffer = ConversationBuffer(config)
proxy = EvoClawProxy(config=config, buffer=buffer)

# Start the proxy server
asyncio.run(proxy.serve())
```
The proxy supports both streaming and non-streaming completions. Point your OpenClaw config to http://localhost:30000/v1.
ConversationBuffer
A thread-safe async FIFO queue. Every intercepted turn is added to the buffer. When the buffer reaches batch_size, it triggers a training cycle automatically.
```python
from evoclaw.buffer import ConversationBuffer

buffer = ConversationBuffer(config)

# Add a turn manually (the proxy does this automatically)
await buffer.add_turn(messages=messages, response=response)

# Check current size
print(buffer.size)  # e.g. 14 / 32

# Register a callback for when the buffer is ready
buffer.on_batch_ready(callback=trainer.train_step)
```
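To make the batch-trigger behavior concrete, here is a minimal, self-contained stand-in — a hypothetical `MiniBuffer`, not the real `ConversationBuffer` — that fires its callback once `batch_size` turns have accumulated:

```python
import asyncio


class MiniBuffer:
    """Toy sketch of ConversationBuffer's batch trigger (illustrative only)."""

    def __init__(self, batch_size: int = 32):
        self.batch_size = batch_size
        self._turns: list[dict] = []
        self._callback = None

    def on_batch_ready(self, callback):
        self._callback = callback

    async def add_turn(self, messages, response):
        self._turns.append({"messages": messages, "response": response})
        # Fire the callback once batch_size turns have accumulated
        if self._callback and len(self._turns) >= self.batch_size:
            await self._callback(self.flush())

    def flush(self):
        batch, self._turns = self._turns, []
        return batch

    @property
    def size(self) -> int:
        return len(self._turns)
```

The real buffer is additionally thread-safe; this sketch only shows the accumulate-then-flush cycle that drives training.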
RewardModel (PRM)
The Process Reward Model scores every agent turn on a scale of 0.0 to 1.0. It calls a configurable judge endpoint — any OpenAI-compatible API works.
| SCORE RANGE | MEANING | ACTION |
|---|---|---|
| 0.7 – 1.0 | High quality | Strong positive gradient update |
| 0.3 – 0.7 | Acceptable | Moderate gradient update |
| 0.0 – 0.3 | Poor / failure | Weak update + triggers Skill Evolution |
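The score-to-action mapping in the table can be expressed as a small function (a hypothetical helper, with the ranges taken directly from the table above):

```python
def update_action(score: float) -> str:
    """Map a PRM score to its training action, per the score table."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("PRM scores are on a 0.0-1.0 scale")
    if score >= 0.7:
        return "strong_update"          # high quality
    if score >= 0.3:
        return "moderate_update"        # acceptable
    return "weak_update_and_evolve"     # poor: also triggers skill evolution
```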
```python
from evoclaw.reward_model import RewardModel

prm = RewardModel(config)

# Score a single turn
score = await prm.score(
    messages=conversation_history,
    response=agent_response,
)
print(score)  # 0.82
```
EvoClawTrainer
The trainer listens to the buffer and submits LoRA training jobs to Tinker cloud when a batch is ready. After training, it calls the hot-swap API to update the live sampling server.
```python
from evoclaw import EvoClawTrainer

trainer = EvoClawTrainer(config=config, buffer=buffer)

# Run the training loop (runs indefinitely)
await trainer.run()

# Or run a single step manually
await trainer.train_step(batch=buffer.flush())
```
The Skill System
Skills are short Markdown instructions (typically 2–5 sentences) that guide agent behavior. They are stored in a JSON skill bank and retrieved at inference time based on relevance to the current conversation.
The default skill bank (memory_data/conversation/conversation_skills.json) ships with 18 skills across 5 categories:
| CATEGORY | SKILLS | EXAMPLE |
|---|---|---|
| coding | 6 | "Write clean, documented, tested code. Prefer readability over cleverness." |
| security | 4 | "Validate all inputs. Never construct SQL queries with string interpolation." |
| agentic | 3 | "Plan before acting. List steps, identify dependencies, verify preconditions." |
| writing | 3 | "Be direct and concise. Lead with the answer, then provide supporting detail." |
| research | 2 | "Cite sources. Distinguish between facts and analysis clearly." |
Skill Injection
At every turn, EvoClaw retrieves the top-K most relevant skills from the bank and injects them into the system prompt as a ### Agent Skills block. This happens before the model responds.
```python
# Enable skill injection in config
config = EvoClawConfig(use_skills=True)

# The system prompt will be augmented like this:
"""
You are a helpful AI assistant.

### Agent Skills
- Write clean, documented, tested code.
- Validate all inputs before processing.
- Prefer iterative solutions over recursive ones.
"""
```
The injection is fully transparent — users never see the skill block, but the model does. The result is immediate behavior improvement without any retraining.
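EvoClaw's actual relevance scoring lives inside `SkillManager`; as a rough sketch under simplifying assumptions, top-K retrieval and injection might look like the following (naive word-overlap ranking, illustrative only):

```python
def top_k_skills(skills: list[dict], query: str, k: int = 3) -> list[dict]:
    """Rank skills by naive word overlap with the conversation text.
    (Illustrative only; the real SkillManager's scoring may differ.)"""
    query_words = set(query.lower().split())

    def overlap(skill: dict) -> int:
        text = (skill["content"] + " " + " ".join(skill.get("tags", []))).lower()
        return len(query_words & set(text.split()))

    ranked = sorted(skills, key=overlap, reverse=True)
    return [s for s in ranked if s.get("enabled", True)][:k]


def inject(system_prompt: str, skills: list[dict]) -> str:
    """Append the '### Agent Skills' block described above."""
    block = "\n".join("- " + s["content"] for s in skills)
    return f"{system_prompt}\n\n### Agent Skills\n{block}"
```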
Skill Evolution
When a turn receives a reward score below 0.3, EvoClaw automatically triggers skill evolution:
- The full conversation trajectory is sent to the evolution LLM (configured via azure_openai_deployment)
- The LLM analyzes what the agent did wrong and what it should have done differently
- A new, targeted skill is generated and appended to the skill bank as a JSON entry
- The new skill is immediately available for injection in future turns
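The steps above can be sketched as a single trigger function. `generate_skill` is a stand-in for the evolution LLM call (hypothetical signature), and the id format is assumed for illustration:

```python
def maybe_evolve(trajectory: list[dict], score: float, skill_bank: list[dict],
                 generate_skill, threshold: float = 0.3) -> bool:
    """Run the evolution steps above when a turn scores below the threshold."""
    if score >= threshold:
        return False
    # Ask the evolution LLM what the agent should have done differently
    new_skill = generate_skill(trajectory)
    # Append the generated skill to the bank; it is available on the next turn
    new_skill["id"] = f"skill_{len(skill_bank) + 1:03d}"
    new_skill["enabled"] = True
    skill_bank.append(new_skill)
    return True
```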
```python
# Enable evolution in config
config = EvoClawConfig(
    use_skills=True,
    enable_skill_evolution=True,
    azure_openai_deployment="gpt-5.2",
    azure_openai_endpoint="https://YOUR-RESOURCE.openai.azure.com/",
)
```
Note: skill evolution calls the azure_openai_deployment model to generate skills. This incurs API costs. You can set a min_reward_for_evolution threshold to control when evolution triggers.

Writing Custom Skills
You can add your own skills directly to the JSON bank. Each skill has a simple structure:
```json
{
  "id": "skill_019",
  "category": "coding",
  "title": "Always write type hints in Python",
  "content": "When writing Python functions, always include type hints for all parameters and the return value. Use Optional[T] for values that may be None. Example: def process(data: list[str]) -> dict[str, int]:",
  "tags": ["python", "typing", "best-practices"],
  "enabled": true
}
```

EvoClawConfig Reference
All settings are passed as a single EvoClawConfig dataclass instance. Both EvoClawProxy and EvoClawTrainer accept the same config object.
| FIELD | TYPE | DEFAULT | DESCRIPTION |
|---|---|---|---|
| model_name | str | "moonshotai/Kimi-2.5" | Base model. Kimi-2.5 recommended for best results. |
| lora_rank | int | 32 | LoRA rank for cloud fine-tuning. Higher = more capacity, slower training. |
| batch_size | int | 32 | Number of turns to accumulate before triggering a training step. |
| max_steps | int | 1000 | Total training steps before the training loop stops. |
| loss_fn | str | "importance_sampling" | Loss function: "importance_sampling", "ppo", or "cispo". |
| use_prm | bool | True | Enable Process Reward Model scoring for each turn. |
| prm_url | str | "https://api.openai.com/v1" | Base URL for the judge LLM endpoint (OpenAI-compatible). |
| prm_model | str | "gpt-5.2" | Model used for reward scoring / judging. |
| prm_api_key | str | None | API key for the PRM judge endpoint. Falls back to OPENAI_API_KEY env var. |
| use_skills | bool | False | Enable skill injection into system prompt at every turn. |
| skill_bank_path | str | "memory_data/..." | Path to the JSON skill bank file. |
| top_k_skills | int | 3 | Number of skills to inject per turn. |
| enable_skill_evolution | bool | False | Auto-generate new skills when reward score falls below threshold. |
| evolution_threshold | float | 0.3 | Reward score below which skill evolution is triggered. |
| proxy_port | int | 30000 | Port for the EvoClaw proxy server to listen on. |
| tinker_sampling_url | str | "http://localhost:8080" | Tinker sampling endpoint for model serving. |
| tinker_api_key | str | None | Tinker API key. Falls back to TINKER_API_KEY env var. |
| openclaw_env_data_dir | str | None | Optional path to JSONL tasks for programmatic (non-chat) rollout. |
| azure_openai_deployment | str | None | Azure OpenAI deployment name for skill evolution LLM. |
| azure_openai_endpoint | str | None | Azure OpenAI resource endpoint URL. |
| log_level | str | "INFO" | "DEBUG", "INFO", "WARNING", or "ERROR". |
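The shape of the config object can be approximated as a plain dataclass (a subset of the fields above with their table defaults; the real class ships with evoclaw and this sketch is for orientation only):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EvoClawConfig:
    """Sketch of the config object, using defaults from the table above."""
    model_name: str = "moonshotai/Kimi-2.5"
    lora_rank: int = 32
    batch_size: int = 32
    loss_fn: str = "importance_sampling"
    use_prm: bool = True
    prm_model: str = "gpt-5.2"
    use_skills: bool = False
    top_k_skills: int = 3
    enable_skill_evolution: bool = False
    evolution_threshold: float = 0.3
    proxy_port: int = 30000
    tinker_api_key: Optional[str] = None
```

Because it is a dataclass, any subset of fields can be overridden at construction time while the rest keep their defaults.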
Learning Modes
Mode 1: Reinforcement Learning (GRPO)
Uses Group Relative Policy Optimization to update the model policy based on scored conversation turns. Best when you have clear task completion signals from the environment.
```python
config = EvoClawConfig(
    loss_fn="importance_sampling",  # or "ppo", "cispo"
    use_prm=True,
)
```
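GRPO's defining step is normalizing each reward against its group rather than a learned value baseline. A minimal sketch of that advantage computation (standard GRPO arithmetic, not EvoClaw's internal code):

```python
def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """GRPO-style advantages: each reward minus the group mean,
    divided by the group standard deviation."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]
```

Turns scored above the group average get positive advantages (reinforced); below-average turns get negative ones.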
Mode 2: On-Policy Distillation (OPD)
Leverages richer natural-language supervision from a teacher model. Best with a strong teacher (e.g. GPT-5) providing detailed feedback. Faster convergence and denser signal than pure RL.
```python
config = EvoClawConfig(
    loss_fn="importance_sampling",
    use_prm=True,
    prm_model="gpt-5.2",                  # Strong teacher
    prm_url="https://api.openai.com/v1",
)
```
Supported Models
| MODEL | SIZE | USE CASE | SCRIPT |
|---|---|---|---|
| moonshotai/Kimi-2.5 | ~200B MoE | Recommended. Best quality, long context, strong reasoning. | evoclaw init |
| Qwen/Qwen3-4B | 4B | Lightweight. Fast iteration, lower API costs, constrained environments. | evoclaw init |
| Any OpenAI-compatible | — | Set model_name to any Tinker-supported model identifier. | — |
Deployment Guide
Production setup with systemd
```ini
# /etc/systemd/system/evoclaw.service
[Unit]
Description=EvoClaw Self-Evolving Agent
After=network.target

[Service]
User=ubuntu
WorkingDirectory=/opt/evoclaw
Environment="TINKER_API_KEY=your_key_here"
ExecStart=/usr/bin/evoclaw start
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
```
Environment variables
| VAR | REQUIRED | DESCRIPTION |
|---|---|---|
| TINKER_API_KEY | Yes | Tinker cloud training API key |
| OPENAI_API_KEY | If using OpenAI for PRM | Judge LLM API key |
| AZURE_OPENAI_KEY | If using Azure for evolution | Azure OpenAI API key |
| EVOCLAW_LOG_LEVEL | No | Defaults to INFO |
Troubleshooting
Proxy not starting
Make sure port 30000 is available. Change it with proxy_port in your config. Check that FastAPI and uvicorn are installed correctly.
Training not triggering
The trainer only fires when the buffer reaches batch_size (default 32). For testing, reduce it: EvoClawConfig(batch_size=4). Also verify your Tinker API key is set correctly.
PRM scoring failing
Verify prm_url is reachable and your API key (prm_api_key or OPENAI_API_KEY) is valid. You can test with use_prm=False for initial setup.
Skill evolution not triggering
Evolution only fires when enable_skill_evolution=True AND reward is below evolution_threshold (default 0.3). Check that azure_openai_deployment is set.
Frequently Asked Questions
Does EvoClaw work without a GPU?
Yes. The entire training pipeline runs on Tinker cloud. Your machine only needs network access. This is one of EvoClaw's core design principles.
How is EvoClaw different from MetaClaw?
EvoClaw builds on MetaClaw's concept but adds a proper Python package structure, full documentation, an interactive website, a streaming proxy, more default skills, better error handling, and a pip-installable setup. See the full comparison.
Can I use EvoClaw with models other than Kimi-2.5?
Yes. Any model supported by Tinker cloud can be used. Set model_name to the Tinker model identifier. Qwen3-4B is a good lightweight option.
Is my conversation data sent anywhere?
Training batches are sent to Tinker cloud for LoRA fine-tuning. The PRM judge endpoint receives conversation turns for scoring. No data is sent to EvoClaw's servers — EvoClaw is entirely open-source and runs on your infrastructure.
Can I disable training and just use skill injection?
Yes. Set use_skills=True and simply don't start the EvoClawTrainer. The proxy and skill injection will work independently.
Changelog
- Initial open-source release of EvoClaw
- EvoClawProxy with streaming + non-streaming support
- ConversationBuffer with async thread-safe queue
- RewardModel with configurable judge endpoint
- SkillManager with injection and evolution
- EvoClawTrainer with Tinker cloud LoRA integration
- 18 default skills across 5 categories
- Kimi-2.5 and Qwen3-4B setup scripts
- MIT license
- Web dashboard for real-time training metrics
- Skill marketplace integration
- Multi-agent support
- Export trained adapters as standalone LoRA files
Ready to start? Your agent evolves with every conversation.
LoRA Training
EvoClaw supports cloud-based LoRA fine-tuning via Tinker. No GPU required — training runs entirely on Tinker's infrastructure.
Supported Models
| Model | Size | Notes |
|---|---|---|
| moonshotai/Kimi-K2.5 | ~200B MoE | Best quality. Recommended. |
| Qwen/Qwen3-4B | 4B | Fast, low cost. |
| meta-llama/Llama-3.1-8B | 8B | Balanced speed and quality. |
Setup
Get a Tinker API key at thinkingmachines.ai/tinker, then run:
```shell
evoclaw init
```

Select your model and paste your Tinker API key when prompted. Config is saved to ~/.evoclaw/config.json.
How It Works
- Conversation turns are accumulated in the ConversationBuffer and scored by the PRM.
- Every train_every_n conversations, EvoClaw sends the dataset to Tinker.

Start Training
```shell
evoclaw start
```

Config Reference
```json
{
  "tinker_api_key": "your_key_here",
  "model": "moonshotai/Kimi-K2.5",
  "lora_rank": 32,
  "train_every_n": 10,
  "auto_train": true
}
```
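The train_every_n / auto_train cadence in this config can be sketched as a simple counter (illustrative; a hypothetical `TrainScheduler`, not an EvoClaw class):

```python
class TrainScheduler:
    """Fire a training run every `train_every_n` completed conversations (sketch)."""

    def __init__(self, train_every_n: int = 10, auto_train: bool = True):
        self.train_every_n = train_every_n
        self.auto_train = auto_train
        self._count = 0

    def record_conversation(self) -> bool:
        """Return True when a training run should be submitted to Tinker."""
        self._count += 1
        return self.auto_train and self._count % self.train_every_n == 0
```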