EvoClaw Developer Guide

EvoClaw is a self-evolving AI agent wrapper that turns live OpenClaw conversations into continuous training data. It wraps your model behind an OpenAI-compatible API proxy, intercepts every turn, scores it, injects skills, trains via cloud LoRA on Tinker, and hot-swaps weights — all while your users are typing.

📦 Version 0.1.0 — EvoClaw is in active development. The core API is usable today, but minor breaking changes may land before v1.0. See the changelog for updates.
🚀 New to EvoClaw?

Start with the Quick Start guide — you'll have a running agent in 5 minutes.

⚙️ Already set up?

Jump to the Config Reference or Skills Guide.

Installation

EvoClaw requires Python 3.9+ and a valid Tinker API key for cloud training. Install all dependencies with pip:

bash
pip install fastapi uvicorn httpx openai transformers tinker tinker-cookbook

Optional: Verify installation

python
import evoclaw
print(evoclaw.__version__)  # 0.1.0

Quick Start

The fastest way to get EvoClaw running is with the included example scripts. Three commands and your agent is live and evolving.

Step 1 — Configure OpenClaw gateway

Run the setup script for your chosen model. This configures OpenClaw to route through the EvoClaw proxy on port 30000.

bash
# Recommended: Kimi-2.5 (~200B MoE)
bash openclaw_model_kimi.sh

# Lightweight alternative: Qwen3-4B
bash openclaw_model_qwen.sh

Step 2 — Set your Tinker API key

bash
export TINKER_API_KEY="your_tinker_api_key_here"

Get your Tinker API key at thinkingmachines.ai/tinker. The key is free for development use.

Step 3 — Start EvoClaw

bash
# Basic RL training mode
python examples/run_conversation_rl.py

# With skills + evolution enabled
python examples/run_with_skills.py

You should see output like:

output
🦎 EvoClaw v0.1.0 starting...
   → Model:       moonshotai/Kimi-2.5
   → Proxy port:  30000
   → Tinker URL:  http://localhost:8080
   → Skills:      enabled (18 loaded)
   → Evolution:   enabled
   → Ready. Start chatting — your agent will begin evolving!
💡 Tip: Open a second terminal and start your OpenClaw interface. As you chat, watch the EvoClaw terminal — you'll see turns being scored, skills injected, and training happening in real time.

System Architecture Overview

EvoClaw consists of five main components that work together asynchronously. They are fully decoupled — the proxy serves users while training runs in the background without blocking.

| MODULE | FILE | ROLE |
|---|---|---|
| EvoClawProxy | proxy.py | OpenAI-compatible FastAPI server that intercepts all conversations |
| ConversationBuffer | buffer.py | Thread-safe async queue that accumulates turns until the batch is full |
| RewardModel | reward_model.py | Calls a judge LLM to score each turn on a 0.0–1.0 scale |
| SkillManager | skill_manager.py | Loads, retrieves, injects, and evolves skills |
| EvoClawTrainer | trainer.py | Submits training jobs to Tinker and hot-swaps weights on completion |

EvoClawProxy

The proxy is a FastAPI application that implements the OpenAI chat completions API. It sits between OpenClaw and your model, transparently intercepting every request.

python
from evoclaw import EvoClawConfig, EvoClawProxy
from evoclaw.buffer import ConversationBuffer
import asyncio

config = EvoClawConfig(proxy_port=30000)
buffer = ConversationBuffer(config)
proxy  = EvoClawProxy(config=config, buffer=buffer)

# Start the proxy server
asyncio.run(proxy.serve())

The proxy supports both streaming and non-streaming completions. Point your OpenClaw config to http://localhost:30000/v1.
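Any OpenAI-compatible client can talk to the proxy. As an illustrative sketch (not EvoClaw internals), the request body follows the standard chat completions schema; the endpoint and port assume the defaults above:

```python
import json

# Minimal chat completions payload, as sent to
# POST http://localhost:30000/v1/chat/completions (default proxy_port).
payload = {
    "model": "moonshotai/Kimi-2.5",
    "messages": [
        {"role": "system", "content": "You are a helpful AI assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    "stream": False,  # set True for server-sent-event streaming
}
print(json.dumps(payload, indent=2))
```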

ConversationBuffer

A thread-safe async FIFO queue. Every intercepted turn is added to the buffer. When the buffer reaches batch_size, it triggers a training cycle automatically.

python
from evoclaw.buffer import ConversationBuffer

buffer = ConversationBuffer(config)

# Add a turn manually (proxy does this automatically)
await buffer.add_turn(messages=messages, response=response)

# Check current size
print(buffer.size)   # e.g. 14 (batch_size is 32)

# Register a callback when buffer is ready
buffer.on_batch_ready(callback=trainer.train_step)
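The batching behavior can be pictured as a queue that fires its registered callback once `batch_size` turns have accumulated. The following is a simplified stand-in for illustration only (`MiniBuffer` and `fake_train_step` are hypothetical names, not the real ConversationBuffer):

```python
import asyncio

class MiniBuffer:
    """Toy stand-in for ConversationBuffer: accumulate turns, then fire
    the registered callback once batch_size turns are queued."""

    def __init__(self, batch_size=4):
        self.batch_size = batch_size
        self._turns = []
        self._callback = None

    def on_batch_ready(self, callback):
        self._callback = callback

    @property
    def size(self):
        return len(self._turns)

    async def add_turn(self, messages, response):
        self._turns.append({"messages": messages, "response": response})
        if self.size >= self.batch_size and self._callback is not None:
            # Hand off the full batch and reset, like a training trigger.
            batch, self._turns = self._turns, []
            await self._callback(batch)

trained_batches = []

async def fake_train_step(batch):
    trained_batches.append(batch)

async def demo():
    buf = MiniBuffer(batch_size=2)
    buf.on_batch_ready(fake_train_step)
    await buf.add_turn([{"role": "user", "content": "hi"}], "hello")
    await buf.add_turn([{"role": "user", "content": "bye"}], "goodbye")

asyncio.run(demo())
print(len(trained_batches))  # 1
```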

RewardModel (PRM)

The Process Reward Model scores every agent turn on a scale of 0.0 to 1.0. It calls a configurable judge endpoint — any OpenAI-compatible API works.

| SCORE RANGE | MEANING | ACTION |
|---|---|---|
| 0.7 – 1.0 | High quality | Strong positive gradient update |
| 0.3 – 0.7 | Acceptable | Moderate gradient update |
| 0.0 – 0.3 | Poor / failure | Weak update + triggers Skill Evolution |
python
from evoclaw.reward_model import RewardModel

prm = RewardModel(config)

# Score a single turn
score = await prm.score(
    messages=conversation_history,
    response=agent_response
)
print(score)  # 0.82
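The score bands above can be expressed as a simple mapping. This is an illustrative sketch of the banding logic, not EvoClaw's internal code:

```python
def band_for_score(score: float) -> str:
    """Map a PRM score in [0.0, 1.0] to the action band from the table."""
    if score >= 0.7:
        return "strong_update"
    if score >= 0.3:
        return "moderate_update"
    return "weak_update_and_evolve"  # low scores also trigger Skill Evolution

print(band_for_score(0.82))  # strong_update
print(band_for_score(0.25))  # weak_update_and_evolve
```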

EvoClawTrainer

The trainer listens to the buffer and submits LoRA training jobs to Tinker cloud when a batch is ready. After training, it calls the hot-swap API to update the live sampling server.

python
from evoclaw import EvoClawTrainer

trainer = EvoClawTrainer(config=config, buffer=buffer)

# Run the training loop (runs indefinitely)
await trainer.run()

# Or run a single step manually
await trainer.train_step(batch=buffer.flush())

The Skill System

Skills are short Markdown instructions (typically 2–5 sentences) that guide agent behavior. They are stored in a JSON skill bank and retrieved at inference time based on relevance to the current conversation.

The default skill bank (memory_data/conversation/conversation_skills.json) ships with 18 skills across 5 categories:

| CATEGORY | SKILLS | EXAMPLE |
|---|---|---|
| coding | 6 | "Write clean, documented, tested code. Prefer readability over cleverness." |
| security | 4 | "Validate all inputs. Never construct SQL queries with string interpolation." |
| agentic | 3 | "Plan before acting. List steps, identify dependencies, verify preconditions." |
| writing | 3 | "Be direct and concise. Lead with the answer, then provide supporting detail." |
| research | 2 | "Cite sources. Distinguish between facts and analysis clearly." |

Skill Injection

At every turn, EvoClaw retrieves the top-K most relevant skills from the bank and injects them into the system prompt as a ### Agent Skills block. This happens before the model responds.

python
# Enable skill injection in config
config = EvoClawConfig(use_skills=True)

# The system prompt will be augmented like this:
"""
You are a helpful AI assistant.

### Agent Skills
- Write clean, documented, tested code.
- Validate all inputs before processing.
- Prefer iterative solutions over recursive ones.
"""

The injection is fully transparent — users never see the skill block, but the model does. The result is immediate behavior improvement without any retraining.
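The prompt augmentation itself is straightforward string assembly. A minimal sketch (illustrative only; the real SkillManager also handles relevance ranking and retrieval):

```python
def inject_skills(system_prompt: str, skills: list) -> str:
    """Append a '### Agent Skills' block to the system prompt."""
    if not skills:
        return system_prompt
    bullet_lines = "\n".join(f"- {s}" for s in skills)
    return f"{system_prompt}\n\n### Agent Skills\n{bullet_lines}"

prompt = inject_skills(
    "You are a helpful AI assistant.",
    [
        "Write clean, documented, tested code.",
        "Validate all inputs before processing.",
    ],
)
print(prompt)
```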

Skill Evolution

When a turn receives a reward score below 0.3, EvoClaw automatically triggers skill evolution:

  1. The full conversation trajectory is sent to the evolution LLM (configured via azure_openai_deployment)
  2. The LLM analyzes what the agent did wrong and what it should have done differently
  3. A new, targeted skill is generated and appended to the skill bank as a JSON entry
  4. The new skill is immediately available for injection in future turns
python
# Enable evolution in config
config = EvoClawConfig(
    use_skills=True,
    enable_skill_evolution=True,
    azure_openai_deployment="gpt-5.2",
    azure_openai_endpoint="https://YOUR-RESOURCE.openai.azure.com/",
)
⚠ Note: Skill evolution uses the azure_openai_deployment model to generate skills. This incurs API costs. You can set a min_reward_for_evolution threshold to control when evolution triggers.

Writing Custom Skills

You can add your own skills directly to the JSON bank. Each skill has a simple structure:

json
{
  "id": "skill_019",
  "category": "coding",
  "title": "Always write type hints in Python",
  "content": "When writing Python functions, always include type hints for all parameters and the return value. Use Optional[T] for values that may be None. Example: def process(data: list[str]) -> dict[str, int]:",
  "tags": ["python", "typing", "best-practices"],
  "enabled": true
}
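Adding an entry can be scripted with the standard library. This sketch assumes the bank file is a flat JSON list of skill objects; the shipped bank's exact layout may differ, so adapt the load/save logic to what you find in the file:

```python
import json
from pathlib import Path

def add_skill(bank_path: str, skill: dict) -> None:
    """Append a skill entry to a JSON skill bank (assumed: a JSON list)."""
    path = Path(bank_path)
    bank = json.loads(path.read_text()) if path.exists() else []
    bank.append(skill)
    path.write_text(json.dumps(bank, indent=2))

add_skill("my_skills.json", {
    "id": "skill_019",
    "category": "coding",
    "title": "Always write type hints in Python",
    "content": "When writing Python functions, always include type hints.",
    "tags": ["python", "typing", "best-practices"],
    "enabled": True,
})
```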

EvoClawConfig Reference

All settings are passed as a single EvoClawConfig dataclass instance. Both EvoClawProxy and EvoClawTrainer accept the same config object.

| FIELD | TYPE | DEFAULT | DESCRIPTION |
|---|---|---|---|
| model_name | str | "moonshotai/Kimi-2.5" | Base model. Kimi-2.5 recommended for best results. |
| lora_rank | int | 32 | LoRA rank for cloud fine-tuning. Higher = more capacity, slower training. |
| batch_size | int | 32 | Number of turns to accumulate before triggering a training step. |
| max_steps | int | 1000 | Total training steps before the training loop stops. |
| loss_fn | str | "importance_sampling" | Loss function: "importance_sampling", "ppo", or "cispo". |
| use_prm | bool | True | Enable Process Reward Model scoring for each turn. |
| prm_url | str | "https://api.openai.com/v1" | Base URL for the judge LLM endpoint (OpenAI-compatible). |
| prm_model | str | "gpt-5.2" | Model used for reward scoring / judging. |
| prm_api_key | str | None | API key for the PRM judge endpoint. Falls back to OPENAI_API_KEY env var. |
| use_skills | bool | False | Enable skill injection into the system prompt at every turn. |
| skill_bank_path | str | "memory_data/..." | Path to the JSON skill bank file. |
| top_k_skills | int | 3 | Number of skills to inject per turn. |
| enable_skill_evolution | bool | False | Auto-generate new skills when the reward score falls below threshold. |
| evolution_threshold | float | 0.3 | Reward score below which skill evolution is triggered. |
| proxy_port | int | 30000 | Port for the EvoClaw proxy server to listen on. |
| tinker_sampling_url | str | "http://localhost:8080" | Tinker sampling endpoint for model serving. |
| tinker_api_key | str | None | Tinker API key. Falls back to TINKER_API_KEY env var. |
| openclaw_env_data_dir | str | None | Optional path to JSONL tasks for programmatic (non-chat) rollout. |
| azure_openai_deployment | str | None | Azure OpenAI deployment name for the skill evolution LLM. |
| azure_openai_endpoint | str | None | Azure OpenAI resource endpoint URL. |
| log_level | str | "INFO" | "DEBUG", "INFO", "WARNING", or "ERROR". |
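Putting the table together, a fully specified config might look like this (field names and defaults taken from the table above):

```python
from evoclaw import EvoClawConfig

config = EvoClawConfig(
    model_name="moonshotai/Kimi-2.5",
    lora_rank=32,
    batch_size=32,
    loss_fn="importance_sampling",
    use_prm=True,
    prm_model="gpt-5.2",
    use_skills=True,
    top_k_skills=3,
    enable_skill_evolution=False,  # opt in once PRM scoring works
    proxy_port=30000,
    log_level="INFO",
)
```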

Learning Modes

Mode 1: Reinforcement Learning (GRPO)

Uses Group Relative Policy Optimization to update the model policy based on scored conversation turns. Best when you have clear task completion signals from the environment.

python
config = EvoClawConfig(
    loss_fn="importance_sampling",  # or "ppo", "cispo"
    use_prm=True,
)

Mode 2: On-Policy Distillation (OPD)

Leverages richer natural-language supervision from a teacher model. Best with a strong teacher (e.g. GPT-5) providing detailed feedback. Faster convergence and denser signal than pure RL.

python
config = EvoClawConfig(
    loss_fn="importance_sampling",
    use_prm=True,
    prm_model="gpt-5.2",  # Strong teacher
    prm_url="https://api.openai.com/v1",
)

Supported Models

| MODEL | SIZE | USE CASE | SCRIPT |
|---|---|---|---|
| moonshotai/Kimi-2.5 | ~200B MoE | Recommended. Best quality, long context, strong reasoning. | openclaw_model_kimi.sh |
| Qwen/Qwen3-4B | 4B | Lightweight. Fast iteration, lower API costs, constrained environments. | openclaw_model_qwen.sh |
| Any OpenAI-compatible | - | Set model_name to any Tinker-supported model identifier. | - |

Discord Bot Integration NEW

Deploy your EvoClaw self-evolving agent directly to Discord. Every message from your community automatically becomes training data — the agent improves in real time.

1. Create a Discord Bot

  1. Go to discord.com/developers → New Application → Bot
  2. Copy the bot token
  3. Enable Message Content Intent under Bot → Privileged Gateway Intents
  4. Invite bot to server with Read Messages + Send Messages permissions

2. Install dependencies

bash
pip install discord.py

3. Run the bot

bash
# One command:
python -m evoclaw.bot discord \
  --token YOUR_DISCORD_TOKEN \
  --proxy http://localhost:30000 \
  --channel-id 123456789  # optional: restrict to specific channels

python
# Or via Python:
from evoclaw.bot import EvoBotDiscord

bot = EvoBotDiscord(
    discord_token="YOUR_DISCORD_TOKEN",
    evoclaw_url="http://localhost:30000",
    channel_ids=[123456789],  # empty = all channels
    system_prompt="You are a crypto trading assistant powered by EvoClaw.",
)
bot.run()

The bot responds to @mention or messages starting with !ask. All conversations are automatically scored and fed into the EvoClaw training loop.

Telegram Bot Integration NEW

Same self-evolving agent, deployed to Telegram. Ideal for crypto communities, trading groups, and DeFi support bots.

1. Create a Telegram Bot

  1. Open Telegram → search @BotFather → send /newbot
  2. Follow the prompts → copy the token

2. Install dependencies

bash
pip install python-telegram-bot

3. Run the bot

bash
# One command:
python -m evoclaw.bot telegram \
  --token YOUR_TELEGRAM_TOKEN \
  --proxy http://localhost:30000

python
# Or via Python:
from evoclaw.bot import EvoBotTelegram

bot = EvoBotTelegram(
    telegram_token="YOUR_TELEGRAM_TOKEN",
    evoclaw_url="http://localhost:30000",
    system_prompt="You are a DeFi research assistant powered by EvoClaw.",
    allowed_chats=[],  # empty = all chats
)
bot.run()

Skill Auto-Tag NEW

Automatically categorize and tag every skill in your skill bank using an LLM. No manual labeling required. Skills become searchable and composable across agent instances.

Domains

Each skill is tagged with one of: crypto, coding, research, agentic, security, communication, general

Usage

python
from evoclaw.skill_autotag import SkillAutoTagger

# Tag all skills in the skill bank
tagger = SkillAutoTagger(
    api_key="YOUR_OPENAI_KEY",  # or set OPENAI_API_KEY env var
    model="gpt-4o-mini",        # cheap + fast
)
tagger.tag_all("memory_data/conversation/conversation_skills.json")

# Search tagged skills by domain
crypto_skills = tagger.search("memory_data/conversation/conversation_skills.json", domain="crypto")
for skill in crypto_skills:
    print(skill["text"], "→", skill["tags"])

CLI

bash
# Tag all skills
python -m evoclaw.skill_autotag memory_data/conversation/conversation_skills.json

# Re-tag existing (overwrite)
python -m evoclaw.skill_autotag memory_data/conversation/conversation_skills.json --overwrite

# Tag + search by domain
python -m evoclaw.skill_autotag memory_data/conversation/conversation_skills.json --search-domain crypto

Output format

Each skill in the JSON becomes:

json
{
  "text": "When analyzing DeFi protocols, always check TVL, audit history, and token distribution.",
  "domain": "crypto",
  "tags": ["defi", "risk-analysis", "protocol-audit"],
  "complexity": "intermediate",
  "use_case": "Use when evaluating DeFi investment opportunities or reviewing smart contract safety."
}
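Once tagged, the bank can be filtered with plain JSON tooling. An illustrative sketch, assuming the bank is a JSON list of entries in the shape above:

```python
# Two sample tagged entries in the auto-tag output shape.
skills = [
    {"text": "When analyzing DeFi protocols, always check TVL, audit history, "
             "and token distribution.",
     "domain": "crypto", "tags": ["defi", "risk-analysis"]},
    {"text": "Plan before acting. List steps and verify preconditions.",
     "domain": "agentic", "tags": ["planning"]},
]

# Filter by domain, mirroring what --search-domain does on the CLI.
crypto_skills = [s for s in skills if s["domain"] == "crypto"]
for skill in crypto_skills:
    print(skill["text"], "->", skill["tags"])
```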

Deployment Guide

Production setup with systemd

ini
# /etc/systemd/system/evoclaw.service
[Unit]
Description=EvoClaw Self-Evolving Agent
After=network.target

[Service]
User=ubuntu
WorkingDirectory=/opt/evoclaw
Environment="TINKER_API_KEY=tk_xxxxxxxxxxxx"
ExecStart=/usr/bin/python3 examples/run_with_skills.py
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target

Environment variables

| VAR | REQUIRED | DESCRIPTION |
|---|---|---|
| TINKER_API_KEY | Yes | Tinker cloud training API key |
| OPENAI_API_KEY | If using OpenAI for PRM | Judge LLM API key |
| AZURE_OPENAI_KEY | If using Azure for evolution | Azure OpenAI API key |
| EVOCLAW_LOG_LEVEL | No | Defaults to INFO |

Troubleshooting

Proxy not starting

Make sure port 30000 is available. Change it with proxy_port in your config. Check that FastAPI and uvicorn are installed correctly.

Training not triggering

The trainer only fires when the buffer reaches batch_size (default 32). For testing, reduce it: EvoClawConfig(batch_size=4). Also verify your Tinker API key is set correctly.

PRM scoring failing

Verify prm_url is reachable and your API key (prm_api_key or OPENAI_API_KEY) is valid. You can test with use_prm=False for initial setup.

Skill evolution not triggering

Evolution only fires when enable_skill_evolution=True AND reward is below evolution_threshold (default 0.3). Check that azure_openai_deployment is set.

Frequently Asked Questions

Does EvoClaw work without a GPU?

Yes. The entire training pipeline runs on Tinker cloud. Your machine only needs network access. This is one of EvoClaw's core design principles.

How is EvoClaw different from MetaClaw?

EvoClaw builds on MetaClaw's concept but adds a proper Python package structure, full documentation, an interactive website, a streaming proxy, more default skills, better error handling, and a pip-installable setup. See the full comparison.

Can I use EvoClaw with models other than Kimi-2.5?

Yes. Any model supported by Tinker cloud can be used. Set model_name to the Tinker model identifier. Qwen3-4B is a good lightweight option.

Is my conversation data sent anywhere?

Training batches are sent to Tinker cloud for LoRA fine-tuning. The PRM judge endpoint receives conversation turns for scoring. No data is sent to EvoClaw's servers — EvoClaw is entirely open-source and runs on your infrastructure.

Can I disable training and just use skill injection?

Yes. Set use_skills=True and simply don't start the EvoClawTrainer. The proxy and skill injection will work independently.

Changelog

v0.1.0 — Initial Release
March 2026
  • Initial open-source release of EvoClaw
  • EvoClawProxy with streaming + non-streaming support
  • ConversationBuffer with async thread-safe queue
  • RewardModel with configurable judge endpoint
  • SkillManager with injection and evolution
  • EvoClawTrainer with Tinker cloud LoRA integration
  • 18 default skills across 5 categories
  • Kimi-2.5 and Qwen3-4B setup scripts
  • MIT license
v0.2.0 — Coming Soon
Planned
  • Web dashboard for real-time training metrics
  • Skill marketplace integration
  • Multi-agent support
  • Export trained adapters as standalone LoRA files

Ready to start? Your agent evolves with every conversation.