EvoClaw Developer Guide

EvoClaw is a self-evolving AI agent wrapper that turns live OpenClaw conversations into continuous training data. It wraps your model behind an OpenAI-compatible API proxy, intercepts every turn, scores it, injects skills, trains via cloud LoRA on Tinker, and hot-swaps weights — all while your users are typing.

Start with the Quick Start guide — you'll have a running agent in 5 minutes.
Installation
EvoClaw requires Python 3.9+ and a valid Tinker API key for cloud training. Install all dependencies with pip:
```bash
pip install fastapi uvicorn httpx openai transformers tinker tinker-cookbook
```

Optional: verify the installation:

```python
import evoclaw
print(evoclaw.__version__)  # 0.1.0
```
Quick Start
The fastest way to get EvoClaw running is with the included example scripts. Three commands and your agent is live and evolving.
Step 1 — Configure OpenClaw gateway
Run the setup script for your chosen model. This configures OpenClaw to route through the EvoClaw proxy on port 30000.
```bash
# Recommended: Kimi-2.5 (~200B MoE)
bash openclaw_model_kimi.sh

# Lightweight alternative: Qwen3-4B
bash openclaw_model_qwen.sh
```
Step 2 — Set your Tinker API key
```bash
export TINKER_API_KEY="your_tinker_api_key_here"
```
Get your Tinker API key at thinkingmachines.ai/tinker. The key is free for development use.
Step 3 — Start EvoClaw
```bash
# Basic RL training mode
python examples/run_conversation_rl.py

# With skills + evolution enabled
python examples/run_with_skills.py
```
You should see output like:
```
🦎 EvoClaw v0.1.0 starting...
→ Model: moonshotai/Kimi-2.5
→ Proxy port: 30000
→ Tinker URL: http://localhost:8080
→ Skills: enabled (18 loaded)
→ Evolution: enabled
→ Ready. Start chatting — your agent will begin evolving!
```
Prerequisites
- Python 3.9+ — EvoClaw uses modern async/await and type hints throughout
- OpenClaw — The agent framework EvoClaw wraps. See openclaw.ai
- Tinker API key — For cloud LoRA training. Free tier available at thinkingmachines.ai/tinker
- OpenAI-compatible judge endpoint — For PRM scoring (e.g. Azure OpenAI, any OpenAI-compatible API)
- Network access — That's it. No GPU, no local model server, no cluster.
System Architecture Overview
EvoClaw consists of five main components that work together asynchronously. They are fully decoupled — the proxy serves users while training runs in the background without blocking.
| MODULE | FILE | ROLE |
|---|---|---|
| EvoClawProxy | proxy.py | OpenAI-compatible FastAPI server that intercepts all conversations |
| ConversationBuffer | buffer.py | Thread-safe async queue that accumulates turns until batch is full |
| RewardModel | reward_model.py | Calls a judge LLM to score each turn on a 0.0–1.0 scale |
| SkillManager | skill_manager.py | Loads, retrieves, injects, and evolves skills |
| EvoClawTrainer | trainer.py | Submits training jobs to Tinker and hot-swaps weights on completion |
EvoClawProxy
The proxy is a FastAPI application that implements the OpenAI chat completions API. It sits between OpenClaw and your model, transparently intercepting every request.
```python
import asyncio

from evoclaw import EvoClawConfig, EvoClawProxy
from evoclaw.buffer import ConversationBuffer

config = EvoClawConfig(proxy_port=30000)
buffer = ConversationBuffer(config)
proxy = EvoClawProxy(config=config, buffer=buffer)

# Start the proxy server
asyncio.run(proxy.serve())
```
The proxy supports both streaming and non-streaming completions. Point your OpenClaw config to http://localhost:30000/v1.
ConversationBuffer
A thread-safe async FIFO queue. Every intercepted turn is added to the buffer. When the buffer reaches batch_size, it triggers a training cycle automatically.
```python
from evoclaw.buffer import ConversationBuffer

buffer = ConversationBuffer(config)

# Add a turn manually (the proxy does this automatically)
await buffer.add_turn(messages=messages, response=response)

# Check current size
print(buffer.size)  # e.g. 14 / 32

# Register a callback for when the buffer is ready
buffer.on_batch_ready(callback=trainer.train_step)
```
RewardModel (PRM)
The Process Reward Model scores every agent turn on a scale of 0.0 to 1.0. It calls a configurable judge endpoint — any OpenAI-compatible API works.
| SCORE RANGE | MEANING | ACTION |
|---|---|---|
| 0.7 – 1.0 | High quality | Strong positive gradient update |
| 0.3 – 0.7 | Acceptable | Moderate gradient update |
| 0.0 – 0.3 | Poor / failure | Weak update + triggers Skill Evolution |
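The mapping in the table can be sketched as a small dispatch function. This is illustrative only — `action_for_score` and its return shape are hypothetical, not part of the EvoClaw API:

```python
# Hypothetical sketch of how a PRM score maps to a training action,
# following the score-range table above. Names are illustrative.
def action_for_score(score: float, evolution_threshold: float = 0.3) -> dict:
    """Map a PRM score in [0.0, 1.0] to an update strength and flags."""
    if score >= 0.7:
        strength = "strong"    # high quality: strong positive gradient update
    elif score >= 0.3:
        strength = "moderate"  # acceptable: moderate gradient update
    else:
        strength = "weak"      # poor / failure: weak update
    return {
        "update": strength,
        "trigger_evolution": score < evolution_threshold,
    }

print(action_for_score(0.82))  # {'update': 'strong', 'trigger_evolution': False}
print(action_for_score(0.15))  # {'update': 'weak', 'trigger_evolution': True}
```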
```python
from evoclaw.reward_model import RewardModel

prm = RewardModel(config)

# Score a single turn
score = await prm.score(
    messages=conversation_history,
    response=agent_response,
)
print(score)  # 0.82
```
EvoClawTrainer
The trainer listens to the buffer and submits LoRA training jobs to Tinker cloud when a batch is ready. After training, it calls the hot-swap API to update the live sampling server.
```python
from evoclaw import EvoClawTrainer

trainer = EvoClawTrainer(config=config, buffer=buffer)

# Run the training loop (runs indefinitely)
await trainer.run()

# Or run a single step manually
await trainer.train_step(batch=buffer.flush())
```
The Skill System
Skills are short Markdown instructions (typically 2–5 sentences) that guide agent behavior. They are stored in a JSON skill bank and retrieved at inference time based on relevance to the current conversation.
The default skill bank (memory_data/conversation/conversation_skills.json) ships with 18 skills across 5 categories:
| CATEGORY | SKILLS | EXAMPLE |
|---|---|---|
| coding | 6 | "Write clean, documented, tested code. Prefer readability over cleverness." |
| security | 4 | "Validate all inputs. Never construct SQL queries with string interpolation." |
| agentic | 3 | "Plan before acting. List steps, identify dependencies, verify preconditions." |
| writing | 3 | "Be direct and concise. Lead with the answer, then provide supporting detail." |
| research | 2 | "Cite sources. Distinguish between facts and analysis clearly." |
Skill Injection
At every turn, EvoClaw retrieves the top-K most relevant skills from the bank and injects them into the system prompt as a ### Agent Skills block. This happens before the model responds.
```python
# Enable skill injection in config
config = EvoClawConfig(use_skills=True)

# The system prompt will be augmented like this:
"""
You are a helpful AI assistant.

### Agent Skills
- Write clean, documented, tested code.
- Validate all inputs before processing.
- Prefer iterative solutions over recursive ones.
"""
```
The injection is fully transparent — users never see the skill block, but the model does. The result is immediate behavior improvement without any retraining.
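The prompt augmentation itself can be sketched in a few lines. `build_system_prompt` is an illustrative helper, not part of the EvoClaw API — the real retrieval and injection logic lives in SkillManager:

```python
# Illustrative sketch of system-prompt augmentation with a skills block.
def build_system_prompt(base_prompt: str, skills: list[str]) -> str:
    """Append a '### Agent Skills' block listing the top-K retrieved skills."""
    if not skills:
        return base_prompt
    lines = [base_prompt, "", "### Agent Skills"]
    lines += [f"- {skill}" for skill in skills]
    return "\n".join(lines)

prompt = build_system_prompt(
    "You are a helpful AI assistant.",
    ["Write clean, documented, tested code.",
     "Validate all inputs before processing."],
)
print(prompt)
```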
Skill Evolution
When a turn receives a reward score below 0.3, EvoClaw automatically triggers skill evolution:
- The full conversation trajectory is sent to the evolution LLM (configured via azure_openai_deployment)
- The LLM analyzes what the agent did wrong and what it should have done differently
- A new, targeted skill is generated and appended to the skill bank as a JSON entry
- The new skill is immediately available for injection in future turns
```python
# Enable evolution in config
config = EvoClawConfig(
    use_skills=True,
    enable_skill_evolution=True,
    azure_openai_deployment="gpt-5.2",
    azure_openai_endpoint="https://YOUR-RESOURCE.openai.azure.com/",
)
```
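The append step of the evolution flow can be sketched with plain JSON handling. This is a hypothetical helper for illustration — EvoClaw's SkillManager performs this internally — and it assumes the bank file is a JSON list of skill objects:

```python
import json
from pathlib import Path

# Hypothetical sketch: append an evolved skill to the JSON skill bank.
# Assumes the bank is a JSON list of skill objects; not EvoClaw internals.
def append_skill(bank_path: str, content: str, category: str = "general") -> dict:
    path = Path(bank_path)
    bank = json.loads(path.read_text()) if path.exists() else []
    skill = {
        "id": f"skill_{len(bank) + 1:03d}",
        "category": category,
        "title": content.split(".")[0][:60],  # first sentence as a short title
        "content": content,
        "tags": [],
        "enabled": True,
    }
    bank.append(skill)
    path.write_text(json.dumps(bank, indent=2))
    return skill
```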
Note: skill evolution calls the azure_openai_deployment model to generate skills. This incurs API costs. You can set a min_reward_for_evolution threshold to control when evolution triggers.

Writing Custom Skills
You can add your own skills directly to the JSON bank. Each skill has a simple structure:
```json
{
  "id": "skill_019",
  "category": "coding",
  "title": "Always write type hints in Python",
  "content": "When writing Python functions, always include type hints for all parameters and the return value. Use Optional[T] for values that may be None. Example: def process(data: list[str]) -> dict[str, int]:",
  "tags": ["python", "typing", "best-practices"],
  "enabled": true
}
```

EvoClawConfig Reference
All settings are passed as a single EvoClawConfig dataclass instance. Both EvoClawProxy and EvoClawTrainer accept the same config object.
| FIELD | TYPE | DEFAULT | DESCRIPTION |
|---|---|---|---|
| model_name | str | "moonshotai/Kimi-2.5" | Base model. Kimi-2.5 recommended for best results. |
| lora_rank | int | 32 | LoRA rank for cloud fine-tuning. Higher = more capacity, slower training. |
| batch_size | int | 32 | Number of turns to accumulate before triggering a training step. |
| max_steps | int | 1000 | Total training steps before the training loop stops. |
| loss_fn | str | "importance_sampling" | Loss function: "importance_sampling", "ppo", or "cispo". |
| use_prm | bool | True | Enable Process Reward Model scoring for each turn. |
| prm_url | str | "https://api.openai.com/v1" | Base URL for the judge LLM endpoint (OpenAI-compatible). |
| prm_model | str | "gpt-5.2" | Model used for reward scoring / judging. |
| prm_api_key | str | None | API key for the PRM judge endpoint. Falls back to OPENAI_API_KEY env var. |
| use_skills | bool | False | Enable skill injection into system prompt at every turn. |
| skill_bank_path | str | "memory_data/..." | Path to the JSON skill bank file. |
| top_k_skills | int | 3 | Number of skills to inject per turn. |
| enable_skill_evolution | bool | False | Auto-generate new skills when reward score falls below threshold. |
| evolution_threshold | float | 0.3 | Reward score below which skill evolution is triggered. |
| proxy_port | int | 30000 | Port for the EvoClaw proxy server to listen on. |
| tinker_sampling_url | str | "http://localhost:8080" | Tinker sampling endpoint for model serving. |
| tinker_api_key | str | None | Tinker API key. Falls back to TINKER_API_KEY env var. |
| openclaw_env_data_dir | str | None | Optional path to JSONL tasks for programmatic (non-chat) rollout. |
| azure_openai_deployment | str | None | Azure OpenAI deployment name for skill evolution LLM. |
| azure_openai_endpoint | str | None | Azure OpenAI resource endpoint URL. |
| log_level | str | "INFO" | "DEBUG", "INFO", "WARNING", or "ERROR". |
Learning Modes
Mode 1: Reinforcement Learning (GRPO)
Uses Group Relative Policy Optimization to update the model policy based on scored conversation turns. Best when you have clear task completion signals from the environment.
```python
config = EvoClawConfig(
    loss_fn="importance_sampling",  # or "ppo", "cispo"
    use_prm=True,
)
```
Mode 2: On-Policy Distillation (OPD)
Leverages richer natural-language supervision from a teacher model. Best with a strong teacher (e.g. GPT-5) providing detailed feedback. Faster convergence and denser signal than pure RL.
```python
config = EvoClawConfig(
    loss_fn="importance_sampling",
    use_prm=True,
    prm_model="gpt-5.2",  # Strong teacher
    prm_url="https://api.openai.com/v1",
)
```
Supported Models
| MODEL | SIZE | USE CASE | SCRIPT |
|---|---|---|---|
| moonshotai/Kimi-2.5 | ~200B MoE | Recommended. Best quality, long context, strong reasoning. | openclaw_model_kimi.sh |
| Qwen/Qwen3-4B | 4B | Lightweight. Fast iteration, lower API costs, constrained environments. | openclaw_model_qwen.sh |
| Any OpenAI-compatible | — | Set model_name to any Tinker-supported model identifier. | — |
Discord Bot Integration NEW
Deploy your EvoClaw self-evolving agent directly to Discord. Every message from your community automatically becomes training data — the agent improves in real time.
1. Create a Discord Bot
- Go to discord.com/developers → New Application → Bot
- Copy the bot token
- Enable Message Content Intent under Bot → Privileged Gateway Intents
- Invite the bot to your server with Read Messages + Send Messages permissions
2. Install dependencies
pip install discord.py
3. Run the bot
```bash
# One command:
python -m evoclaw.bot discord \
  --token YOUR_DISCORD_TOKEN \
  --proxy http://localhost:8000 \
  --channel-id 123456789  # optional: restrict to specific channels
```

```python
# Or via Python:
from evoclaw.bot import EvoBotDiscord

bot = EvoBotDiscord(
    discord_token="YOUR_DISCORD_TOKEN",
    evoclaw_url="http://localhost:8000",
    channel_ids=[123456789],  # empty = all channels
    system_prompt="You are a crypto trading assistant powered by EvoClaw.",
)
bot.run()
```
The bot responds to @mention or messages starting with !ask. All conversations are automatically scored and fed into the EvoClaw training loop.
Telegram Bot Integration NEW
Same self-evolving agent, deployed to Telegram. Ideal for crypto communities, trading groups, and DeFi support bots.
1. Create a Telegram Bot
- Open Telegram → search @BotFather → send /newbot
- Follow the prompts → copy the token
2. Install dependencies
pip install python-telegram-bot
3. Run the bot
```bash
# One command:
python -m evoclaw.bot telegram \
  --token YOUR_TELEGRAM_TOKEN \
  --proxy http://localhost:8000
```

```python
# Or via Python:
from evoclaw.bot import EvoBotTelegram

bot = EvoBotTelegram(
    telegram_token="YOUR_TELEGRAM_TOKEN",
    evoclaw_url="http://localhost:8000",
    system_prompt="You are a DeFi research assistant powered by EvoClaw.",
    allowed_chats=[],  # empty = all chats
)
bot.run()
```
Skill Auto-Tag NEW
Automatically categorize and tag every skill in your skill bank using an LLM. No manual labeling required. Skills become searchable and composable across agent instances.
Domains
Each skill is tagged with one of: crypto, coding, research, agentic, security, communication, general
Usage
```python
from evoclaw.skill_autotag import SkillAutoTagger

# Tag all skills in the skill bank
tagger = SkillAutoTagger(
    api_key="YOUR_OPENAI_KEY",  # or set OPENAI_API_KEY env var
    model="gpt-4o-mini",        # cheap + fast
)
tagger.tag_all("memory_data/conversation/conversation_skills.json")

# Search tagged skills by domain
crypto_skills = tagger.search(
    "memory_data/conversation/conversation_skills.json", domain="crypto"
)
for skill in crypto_skills:
    print(skill["text"], "→", skill["tags"])
```
CLI
```bash
# Tag all skills
python -m evoclaw.skill_autotag memory_data/conversation/conversation_skills.json

# Re-tag existing (overwrite)
python -m evoclaw.skill_autotag memory_data/conversation/conversation_skills.json --overwrite

# Tag + search by domain
python -m evoclaw.skill_autotag memory_data/conversation/conversation_skills.json --search-domain crypto
```
Output format
Each skill in the JSON becomes:
```json
{
  "text": "When analyzing DeFi protocols, always check TVL, audit history, and token distribution.",
  "domain": "crypto",
  "tags": ["defi", "risk-analysis", "protocol-audit"],
  "complexity": "intermediate",
  "use_case": "Use when evaluating DeFi investment opportunities or reviewing smart contract safety."
}
```
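Since the tagged bank is plain JSON, you can also filter it without the tagger class. A standard-library sketch, assuming the file is a JSON list of entries shaped like the example above:

```python
import json

# Plain-stdlib sketch: filter a tagged skill bank by domain.
# Assumes the file holds a JSON list of entries with a "domain" field.
def skills_in_domain(bank_path: str, domain: str) -> list[dict]:
    with open(bank_path) as f:
        bank = json.load(f)
    return [s for s in bank if s.get("domain") == domain]
```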
Deployment Guide
Production setup with systemd
```ini
# /etc/systemd/system/evoclaw.service
[Unit]
Description=EvoClaw Self-Evolving Agent
After=network.target

[Service]
User=ubuntu
WorkingDirectory=/opt/evoclaw
Environment="TINKER_API_KEY=tk_xxxxxxxxxxxx"
ExecStart=/usr/bin/python3 examples/run_with_skills.py
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
```
Environment variables
| VAR | REQUIRED | DESCRIPTION |
|---|---|---|
| TINKER_API_KEY | Yes | Tinker cloud training API key |
| OPENAI_API_KEY | If using OpenAI for PRM | Judge LLM API key |
| AZURE_OPENAI_KEY | If using Azure for evolution | Azure OpenAI API key |
| EVOCLAW_LOG_LEVEL | No | Defaults to INFO |
Troubleshooting
Proxy not starting
Make sure port 30000 is available. Change it with proxy_port in your config. Check that FastAPI and uvicorn are installed correctly.
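A quick standard-library check for whether the port is free before starting the proxy:

```python
import socket

def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing is accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # connect_ex returns 0 on success, i.e. something is listening
        return sock.connect_ex((host, port)) != 0

if not port_is_free(30000):
    print("Port 30000 is in use — pick another proxy_port.")
```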
Training not triggering
The trainer only fires when the buffer reaches batch_size (default 32). For testing, reduce it: EvoClawConfig(batch_size=4). Also verify your Tinker API key is set correctly.
PRM scoring failing
Verify prm_url is reachable and your API key (prm_api_key or OPENAI_API_KEY) is valid. You can test with use_prm=False for initial setup.
Skill evolution not triggering
Evolution only fires when enable_skill_evolution=True AND reward is below evolution_threshold (default 0.3). Check that azure_openai_deployment is set.
Frequently Asked Questions
Does EvoClaw work without a GPU?
Yes. The entire training pipeline runs on Tinker cloud. Your machine only needs network access. This is one of EvoClaw's core design principles.
How is EvoClaw different from MetaClaw?
EvoClaw builds on MetaClaw's concept but adds a proper Python package structure, full documentation, an interactive website, a streaming proxy, more default skills, better error handling, and a pip-installable setup. See the full comparison.
Can I use EvoClaw with models other than Kimi-2.5?
Yes. Any model supported by Tinker cloud can be used. Set model_name to the Tinker model identifier. Qwen3-4B is a good lightweight option.
Is my conversation data sent anywhere?
Training batches are sent to Tinker cloud for LoRA fine-tuning. The PRM judge endpoint receives conversation turns for scoring. No data is sent to EvoClaw's servers — EvoClaw is entirely open-source and runs on your infrastructure.
Can I disable training and just use skill injection?
Yes. Set use_skills=True and simply don't start the EvoClawTrainer. The proxy and skill injection will work independently.
Changelog

v0.1.0
- Initial open-source release of EvoClaw
- EvoClawProxy with streaming + non-streaming support
- ConversationBuffer with async thread-safe queue
- RewardModel with configurable judge endpoint
- SkillManager with injection and evolution
- EvoClawTrainer with Tinker cloud LoRA integration
- 18 default skills across 5 categories
- Kimi-2.5 and Qwen3-4B setup scripts
- MIT license

Planned
- Web dashboard for real-time training metrics
- Skill marketplace integration
- Multi-agent support
- Export trained adapters as standalone LoRA files
Ready to start? Your agent evolves with every conversation.