EvoClaw Developer Guide

Start with the Quick Start guide — you'll have a running agent in 5 minutes.
EvoClaw is a self-evolving AI agent wrapper that turns live OpenClaw conversations into continuous training data. It wraps your model behind an OpenAI-compatible API proxy, intercepts every turn, scores it, injects skills, trains via cloud LoRA on Tinker, and hot-swaps weights — all while your users are typing.
Installation
EvoClaw requires Python 3.9+ and a valid Tinker API key for cloud training. Install all dependencies with pip:
```shell
pip install evoclaw
```

Optional: Verify installation

```python
import evoclaw
print(evoclaw.__version__)  # 0.2.1
```
Quick Start
The fastest way to get EvoClaw running is with the included example scripts. Three commands and your agent is live and evolving.
Step 1 — Configure OpenClaw gateway
Run the setup script for your chosen model. This configures OpenClaw to route through the EvoClaw proxy on port 30000.
```shell
# Recommended: Kimi-2.5 (~200B MoE)
evoclaw init
```
Step 2 — Set your Tinker API key
```shell
export TINKER_API_KEY="your_tinker_api_key_here"
```
Get your Tinker API key at thinkingmachines.ai/tinker. The key is free for development use.
Step 3 — Start EvoClaw
```shell
# Basic RL training mode
evoclaw start
```
You should see output like:
```
EvoClaw v0.2.1 starting...
→ Model: moonshotai/Kimi-2.5
→ Proxy port: 30000
→ Tinker URL: http://localhost:8080
→ Skills: enabled (18 loaded)
→ Evolution: enabled
→ Ready. Start chatting — your agent will begin evolving!
```
Prerequisites
- Python 3.9+ — EvoClaw uses modern async/await and type hints throughout
- OpenClaw — The agent framework EvoClaw wraps. See openclaw.ai
- Tinker API key — For cloud LoRA training. Free tier available at thinkingmachines.ai/tinker
- OpenAI-compatible judge endpoint — For PRM scoring (e.g. Azure OpenAI, any OpenAI-compatible API)
- Network access — That's it. No GPU, no local model server, no cluster.
System Architecture Overview
EvoClaw consists of five main components that work together asynchronously. They are fully decoupled — the proxy serves users while training runs in the background without blocking.
| MODULE | FILE | ROLE |
|---|---|---|
| EvoClawProxy | proxy.py | OpenAI-compatible FastAPI server that intercepts all conversations |
| ConversationBuffer | buffer.py | Thread-safe async queue that accumulates turns until batch is full |
| RewardModel | reward_model.py | Calls a judge LLM to score each turn on a 0.0–1.0 scale |
| SkillManager | skill_manager.py | Loads, retrieves, injects, and evolves skills |
| EvoClawTrainer | trainer.py | Submits training jobs to Tinker and hot-swaps weights on completion |
EvoClawProxy
The proxy is a FastAPI application that implements the OpenAI chat completions API. It sits between OpenClaw and your model, transparently intercepting every request.
```python
import asyncio

from evoclaw import EvoClawConfig, EvoClawProxy
from evoclaw.buffer import ConversationBuffer

config = EvoClawConfig(proxy_port=30000)
buffer = ConversationBuffer(config)
proxy = EvoClawProxy(config=config, buffer=buffer)

# Start the proxy server
asyncio.run(proxy.serve())
```
The proxy supports both streaming and non-streaming completions. Point your OpenClaw config to http://localhost:30000/v1.
ConversationBuffer
A thread-safe async FIFO queue. Every intercepted turn is added to the buffer. When the buffer reaches batch_size, it triggers a training cycle automatically.
```python
from evoclaw.buffer import ConversationBuffer

buffer = ConversationBuffer(config)

# Add a turn manually (the proxy does this automatically)
await buffer.add_turn(messages=messages, response=response)

# Check current size
print(buffer.size)  # e.g. 14 / 32

# Register a callback for when the buffer is ready
buffer.on_batch_ready(callback=trainer.train_step)
```
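To make the batch-trigger behavior concrete, here is a minimal, self-contained stand-in — a hypothetical `MiniBuffer`, not the real `ConversationBuffer` — that fires its callback once `batch_size` turns have accumulated:

```python
import asyncio


class MiniBuffer:
    """Toy sketch of ConversationBuffer's batch trigger (illustrative only)."""

    def __init__(self, batch_size: int = 32):
        self.batch_size = batch_size
        self._turns: list[dict] = []
        self._callback = None

    def on_batch_ready(self, callback):
        self._callback = callback

    async def add_turn(self, messages, response):
        self._turns.append({"messages": messages, "response": response})
        # Fire the callback once batch_size turns have accumulated
        if self._callback and len(self._turns) >= self.batch_size:
            await self._callback(self.flush())

    def flush(self):
        batch, self._turns = self._turns, []
        return batch

    @property
    def size(self) -> int:
        return len(self._turns)
```

The real buffer is additionally thread-safe; this sketch only shows the accumulate-then-flush cycle that drives training.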
RewardModel (PRM)
The Process Reward Model scores every agent turn on a scale of 0.0 to 1.0. It calls a configurable judge endpoint — any OpenAI-compatible API works.
| SCORE RANGE | MEANING | ACTION |
|---|---|---|
| 0.7 – 1.0 | High quality | Strong positive gradient update |
| 0.3 – 0.7 | Acceptable | Moderate gradient update |
| 0.0 – 0.3 | Poor / failure | Weak update + triggers Skill Evolution |
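The score-to-action mapping in the table can be expressed as a small function (a hypothetical helper, with the ranges taken directly from the table above):

```python
def update_action(score: float) -> str:
    """Map a PRM score to its training action, per the score table."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("PRM scores are on a 0.0-1.0 scale")
    if score >= 0.7:
        return "strong_update"          # high quality
    if score >= 0.3:
        return "moderate_update"        # acceptable
    return "weak_update_and_evolve"     # poor: also triggers skill evolution
```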
```python
from evoclaw.reward_model import RewardModel

prm = RewardModel(config)

# Score a single turn
score = await prm.score(
    messages=conversation_history,
    response=agent_response,
)
print(score)  # 0.82
```
EvoClawTrainer
The trainer listens to the buffer and submits LoRA training jobs to Tinker cloud when a batch is ready. After training, it calls the hot-swap API to update the live sampling server.
```python
from evoclaw import EvoClawTrainer

trainer = EvoClawTrainer(config=config, buffer=buffer)

# Run the training loop (runs indefinitely)
await trainer.run()

# Or run a single step manually
await trainer.train_step(batch=buffer.flush())
```
The Skill System
Skills are short Markdown instructions (typically 2–5 sentences) that guide agent behavior. They are stored in a JSON skill bank and retrieved at inference time based on relevance to the current conversation.
The default skill bank (memory_data/conversation/conversation_skills.json) ships with 18 skills across 5 categories:
| CATEGORY | SKILLS | EXAMPLE |
|---|---|---|
| coding | 6 | "Write clean, documented, tested code. Prefer readability over cleverness." |
| security | 4 | "Validate all inputs. Never construct SQL queries with string interpolation." |
| agentic | 3 | "Plan before acting. List steps, identify dependencies, verify preconditions." |
| writing | 3 | "Be direct and concise. Lead with the answer, then provide supporting detail." |
| research | 2 | "Cite sources. Distinguish between facts and analysis clearly." |
Skill Injection
At every turn, EvoClaw retrieves the top-K most relevant skills from the bank and injects them into the system prompt as a ### Agent Skills block. This happens before the model responds.
```python
# Enable skill injection in config
config = EvoClawConfig(use_skills=True)

# The system prompt will be augmented like this:
"""
You are a helpful AI assistant.

### Agent Skills
- Write clean, documented, tested code.
- Validate all inputs before processing.
- Prefer iterative solutions over recursive ones.
"""
```
The injection is fully transparent — users never see the skill block, but the model does. The result is immediate behavior improvement without any retraining.
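EvoClaw's actual relevance scoring lives inside `SkillManager`; as a rough sketch under simplifying assumptions, top-K retrieval and injection might look like the following (naive word-overlap ranking, illustrative only):

```python
def top_k_skills(skills: list[dict], query: str, k: int = 3) -> list[dict]:
    """Rank skills by naive word overlap with the conversation text.
    (Illustrative only; the real SkillManager's scoring may differ.)"""
    query_words = set(query.lower().split())

    def overlap(skill: dict) -> int:
        text = (skill["content"] + " " + " ".join(skill.get("tags", []))).lower()
        return len(query_words & set(text.split()))

    ranked = sorted(skills, key=overlap, reverse=True)
    return [s for s in ranked if s.get("enabled", True)][:k]


def inject(system_prompt: str, skills: list[dict]) -> str:
    """Append the '### Agent Skills' block described above."""
    block = "\n".join("- " + s["content"] for s in skills)
    return f"{system_prompt}\n\n### Agent Skills\n{block}"
```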
Skill Evolution
When a turn receives a reward score below 0.3, EvoClaw automatically triggers skill evolution:
- The full conversation trajectory is sent to the evolution LLM (configured via azure_openai_deployment)
- The LLM analyzes what the agent did wrong and what it should have done differently
- A new, targeted skill is generated and appended to the skill bank as a JSON entry
- The new skill is immediately available for injection in future turns
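The steps above can be sketched as a single trigger function. `generate_skill` is a stand-in for the evolution LLM call (hypothetical signature), and the id format is assumed for illustration:

```python
def maybe_evolve(trajectory: list[dict], score: float, skill_bank: list[dict],
                 generate_skill, threshold: float = 0.3) -> bool:
    """Run the evolution steps above when a turn scores below the threshold."""
    if score >= threshold:
        return False
    # Ask the evolution LLM what the agent should have done differently
    new_skill = generate_skill(trajectory)
    # Append the generated skill to the bank; it is available on the next turn
    new_skill["id"] = f"skill_{len(skill_bank) + 1:03d}"
    new_skill["enabled"] = True
    skill_bank.append(new_skill)
    return True
```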
```python
# Enable evolution in config
config = EvoClawConfig(
    use_skills=True,
    enable_skill_evolution=True,
    azure_openai_deployment="gpt-5.2",
    azure_openai_endpoint="https://YOUR-RESOURCE.openai.azure.com/",
)
```
Note: skill evolution calls the azure_openai_deployment model to generate skills. This incurs API costs. You can set a min_reward_for_evolution threshold to control when evolution triggers.

Writing Custom Skills
You can add your own skills directly to the JSON bank. Each skill has a simple structure:
```json
{
  "id": "skill_019",
  "category": "coding",
  "title": "Always write type hints in Python",
  "content": "When writing Python functions, always include type hints for all parameters and the return value. Use Optional[T] for values that may be None. Example: def process(data: list[str]) -> dict[str, int]:",
  "tags": ["python", "typing", "best-practices"],
  "enabled": true
}
```

EvoClawConfig Reference
All settings are passed as a single EvoClawConfig dataclass instance. Both EvoClawProxy and EvoClawTrainer accept the same config object.
| FIELD | TYPE | DEFAULT | DESCRIPTION |
|---|---|---|---|
| model_name | str | "moonshotai/Kimi-2.5" | Base model. Kimi-2.5 recommended for best results. |
| lora_rank | int | 32 | LoRA rank for cloud fine-tuning. Higher = more capacity, slower training. |
| batch_size | int | 32 | Number of turns to accumulate before triggering a training step. |
| max_steps | int | 1000 | Total training steps before the training loop stops. |
| loss_fn | str | "importance_sampling" | Loss function: "importance_sampling", "ppo", or "cispo". |
| use_prm | bool | True | Enable Process Reward Model scoring for each turn. |
| prm_url | str | "https://api.openai.com/v1" | Base URL for the judge LLM endpoint (OpenAI-compatible). |
| prm_model | str | "gpt-5.2" | Model used for reward scoring / judging. |
| prm_api_key | str | None | API key for the PRM judge endpoint. Falls back to OPENAI_API_KEY env var. |
| use_skills | bool | False | Enable skill injection into system prompt at every turn. |
| skill_bank_path | str | "memory_data/..." | Path to the JSON skill bank file. |
| top_k_skills | int | 3 | Number of skills to inject per turn. |
| enable_skill_evolution | bool | False | Auto-generate new skills when reward score falls below threshold. |
| evolution_threshold | float | 0.3 | Reward score below which skill evolution is triggered. |
| proxy_port | int | 30000 | Port for the EvoClaw proxy server to listen on. |
| tinker_sampling_url | str | "http://localhost:8080" | Tinker sampling endpoint for model serving. |
| tinker_api_key | str | None | Tinker API key. Falls back to TINKER_API_KEY env var. |
| openclaw_env_data_dir | str | None | Optional path to JSONL tasks for programmatic (non-chat) rollout. |
| azure_openai_deployment | str | None | Azure OpenAI deployment name for skill evolution LLM. |
| azure_openai_endpoint | str | None | Azure OpenAI resource endpoint URL. |
| log_level | str | "INFO" | "DEBUG", "INFO", "WARNING", or "ERROR". |
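The shape of the config object can be approximated as a plain dataclass (a subset of the fields above with their table defaults; the real class ships with evoclaw and this sketch is for orientation only):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EvoClawConfig:
    """Sketch of the config object, using defaults from the table above."""
    model_name: str = "moonshotai/Kimi-2.5"
    lora_rank: int = 32
    batch_size: int = 32
    loss_fn: str = "importance_sampling"
    use_prm: bool = True
    prm_model: str = "gpt-5.2"
    use_skills: bool = False
    top_k_skills: int = 3
    enable_skill_evolution: bool = False
    evolution_threshold: float = 0.3
    proxy_port: int = 30000
    tinker_api_key: Optional[str] = None
```

Because it is a dataclass, any subset of fields can be overridden at construction time while the rest keep their defaults.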
Learning Modes
Mode 1: Reinforcement Learning (GRPO)
Uses Group Relative Policy Optimization to update the model policy based on scored conversation turns. Best when you have clear task completion signals from the environment.
```python
config = EvoClawConfig(
    loss_fn="importance_sampling",  # or "ppo", "cispo"
    use_prm=True,
)
```
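GRPO's defining step is normalizing each reward against its group rather than a learned value baseline. A minimal sketch of that advantage computation (standard GRPO arithmetic, not EvoClaw's internal code):

```python
def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """GRPO-style advantages: each reward minus the group mean,
    divided by the group standard deviation."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]
```

Turns scored above the group average get positive advantages (reinforced); below-average turns get negative ones.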
Mode 2: On-Policy Distillation (OPD)
Leverages richer natural-language supervision from a teacher model. Best with a strong teacher (e.g. GPT-5) providing detailed feedback. Faster convergence and denser signal than pure RL.
```python
config = EvoClawConfig(
    loss_fn="importance_sampling",
    use_prm=True,
    prm_model="gpt-5.2",                  # Strong teacher
    prm_url="https://api.openai.com/v1",
)
```
Supported Models
| MODEL | SIZE | USE CASE | SCRIPT |
|---|---|---|---|
| moonshotai/Kimi-2.5 | ~200B MoE | Recommended. Best quality, long context, strong reasoning. | evoclaw init |
| Qwen/Qwen3-4B | 4B | Lightweight. Fast iteration, lower API costs, constrained environments. | evoclaw init |
| Any OpenAI-compatible | — | Set model_name to any Tinker-supported model identifier. | — |
Deployment Guide
Production setup with systemd
```ini
# /etc/systemd/system/evoclaw.service
[Unit]
Description=EvoClaw Self-Evolving Agent
After=network.target

[Service]
User=ubuntu
WorkingDirectory=/opt/evoclaw
Environment="TINKER_API_KEY=your_key_here"
ExecStart=/usr/bin/evoclaw start
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
```
Environment variables
| VAR | REQUIRED | DESCRIPTION |
|---|---|---|
| TINKER_API_KEY | Yes | Tinker cloud training API key |
| OPENAI_API_KEY | If using OpenAI for PRM | Judge LLM API key |
| AZURE_OPENAI_KEY | If using Azure for evolution | Azure OpenAI API key |
| EVOCLAW_LOG_LEVEL | No | Defaults to INFO |
Troubleshooting
Proxy not starting
Make sure port 30000 is available. Change it with proxy_port in your config. Check that FastAPI and uvicorn are installed correctly.
Training not triggering
The trainer only fires when the buffer reaches batch_size (default 32). For testing, reduce it: EvoClawConfig(batch_size=4). Also verify your Tinker API key is set correctly.
PRM scoring failing
Verify prm_url is reachable and your API key (prm_api_key or OPENAI_API_KEY) is valid. You can test with use_prm=False for initial setup.
Skill evolution not triggering
Evolution only fires when enable_skill_evolution=True AND reward is below evolution_threshold (default 0.3). Check that azure_openai_deployment is set.
Frequently Asked Questions
Does EvoClaw work without a GPU?
Yes. The entire training pipeline runs on Tinker cloud. Your machine only needs network access. This is one of EvoClaw's core design principles.
How is EvoClaw different from MetaClaw?
EvoClaw builds on MetaClaw's concept but adds a proper Python package structure, full documentation, an interactive website, a streaming proxy, more default skills, better error handling, and a pip-installable setup. See the full comparison.
Can I use EvoClaw with models other than Kimi-2.5?
Yes. Any model supported by Tinker cloud can be used. Set model_name to the Tinker model identifier. Qwen3-4B is a good lightweight option.
Is my conversation data sent anywhere?
Training batches are sent to Tinker cloud for LoRA fine-tuning. The PRM judge endpoint receives conversation turns for scoring. No data is sent to EvoClaw's servers — EvoClaw is entirely open-source and runs on your infrastructure.
Can I disable training and just use skill injection?
Yes. Set use_skills=True and simply don't start the EvoClawTrainer. The proxy and skill injection will work independently.
Changelog
- Initial open-source release of EvoClaw
- EvoClawProxy with streaming + non-streaming support
- ConversationBuffer with async thread-safe queue
- RewardModel with configurable judge endpoint
- SkillManager with injection and evolution
- EvoClawTrainer with Tinker cloud LoRA integration
- 18 default skills across 5 categories
- Kimi-2.5 and Qwen3-4B setup scripts
- MIT license
- Web dashboard for real-time training metrics
- Skill marketplace integration
- Multi-agent support
- Export trained adapters as standalone LoRA files
Ready to start? Your agent evolves with every conversation.
LoRA Training
EvoClaw supports cloud-based LoRA fine-tuning via Tinker. No GPU required — training runs entirely on Tinker's infrastructure.
Supported Models
| Model | Size | Notes |
|---|---|---|
| moonshotai/Kimi-K2.5 | ~200B MoE | Best quality. Recommended. |
| Qwen/Qwen3-4B | 4B | Fast, low cost. |
| meta-llama/Llama-3.1-8B | 8B | Balanced speed and quality. |
Setup
Get a Tinker API key at thinkingmachines.ai/tinker, then run:
```shell
evoclaw init
```

Select your model and paste your Tinker API key when prompted. Config is saved to ~/.evoclaw/config.json.
How It Works
- Conversation turns are accumulated in the ConversationBuffer and scored by the PRM.
- Every train_every_n conversations, EvoClaw sends the dataset to Tinker.

Start Training
```shell
evoclaw start
```

Config Reference
```json
{
  "tinker_api_key": "your_key_here",
  "model": "moonshotai/Kimi-K2.5",
  "lora_rank": 32,
  "train_every_n": 10,
  "auto_train": true
}
```
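The train_every_n / auto_train cadence in this config can be sketched as a simple counter (illustrative; a hypothetical `TrainScheduler`, not an EvoClaw class):

```python
class TrainScheduler:
    """Fire a training run every `train_every_n` completed conversations (sketch)."""

    def __init__(self, train_every_n: int = 10, auto_train: bool = True):
        self.train_every_n = train_every_n
        self.auto_train = auto_train
        self._count = 0

    def record_conversation(self) -> bool:
        """Return True when a training run should be submitted to Tinker."""
        self._count += 1
        return self.auto_train and self._count % self.train_every_n == 0
```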