v0.2.1  ·  MIT LICENSE  ·  OPEN SOURCE
SELF-EVOLVING AI AGENT WRAPPER

Your AI agent, continuously learning.

EvoClaw wraps your model, scores every conversation with a reward model, injects skills in real time, and trains via cloud LoRA — automatically. No GPU. No data team. No downtime.

0 GPUs Needed
100% Async
5 min Setup
2 Learn Modes
MIT License

HOW IT WORKS

From conversation to trained model.

Five automated steps run in the background — no manual intervention, no restarts, no service interruption.

01
Intercept
Transparent OpenAI-compatible proxy captures every conversation turn with negligible added latency.
02
Score
A Process Reward Model scores each turn via a judge LLM. High-quality turns contribute more to training.
03
Inject Skills
Relevant skills are injected into the system prompt before each response. Immediate improvement, no retraining wait.
04
Cloud Training
Batch submits to Tinker cloud for LoRA fine-tuning. Runs remotely — zero GPU on your end.
05
Hot-Swap & Repeat
Updated weights swap into the live server. Zero downtime. The cycle continues automatically.
EVOCLAW RUNTIME LIVE
[PROXY] Turn intercepted from OpenClaw
[PRM] Scoring... reward = 0.84
[SKILLS] Injected: [coding, security]
[BUFFER] Batch 32/32 → sending to Tinker
[TRAIN] LoRA step 48 — loss: 0.0308
[SWAP] ✓ Weights updated. Agent upgraded.
▶ Open Full Interactive Demo →
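
Conceptually, the five steps compose into one non-blocking loop. The sketch below is a minimal illustration only — every function in it is a stub standing in for EvoClaw's internals, not its actual API.

SKETCH — EVOLVE LOOP (ILLUSTRATIVE)
import asyncio
import random

BATCH_SIZE = 4   # EvoClaw's default is 32; kept small for this demo
buffer = []      # scored turns waiting for the next training step

async def score_turn(turn):
  # 02 Score — stand-in for the PRM judge LLM call
  return round(random.random(), 2)

def retrieve_skills(turn):
  # 03 Inject — stand-in for retrieval from the skill bank
  return ["coding", "security"]

async def train_and_swap(batch):
  # 04 + 05 — stand-in for the Tinker LoRA job and the weight hot-swap
  await asyncio.sleep(0.1)  # pretend the cloud job takes time
  print(f"[TRAIN] LoRA step on {len(batch)} turns [SWAP] weights updated")

async def handle_turn(turn):
  # 01 Intercept — the proxy hands us each conversation turn
  turn["reward"] = await score_turn(turn)
  turn["skills"] = retrieve_skills(turn)
  buffer.append(turn)
  if len(buffer) >= BATCH_SIZE:
    batch, buffer[:] = list(buffer), []         # drain the buffer atomically
    asyncio.create_task(train_and_swap(batch))  # never blocks serving

async def main():
  for i in range(8):
    await handle_turn({"user": f"message {i}"})
  await asyncio.sleep(0.5)  # let background training finish

asyncio.run(main())

Because training runs as a fire-and-forget task, serving never waits on the cloud job — the "fully async" property the features below describe.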

FEATURES

Everything your agent needs to grow.

No data team. No fine-tuning pipeline. EvoClaw handles the entire learning loop in the background.

🎯
Real Usage Training

Learns from live conversations — no synthetic datasets, no offline retraining. Continuously improving from actual deployment.

NO SYNTHETIC DATA
💉
Skill Injection

Retrieves relevant skill instructions and injects them into the system prompt each turn. Instant improvement without waiting for retraining — see the sketch after this feature list.

INSTANT BOOST
🧬
Skill Evolution

When the agent fails, EvoClaw auto-generates a new skill from the failure trajectory using an LLM. Learns from its own mistakes.

SELF-IMPROVEMENT
☁️
No GPU Cluster

Training offloads to Tinker cloud. Any machine with network access runs the complete system — zero infrastructure overhead.

CLOUD-NATIVE
Fully Async

Serving, scoring, and training run as decoupled coroutines. Your agent responds in real time while learning happens in the background.

NON-BLOCKING
🔀
Dual Learning Modes

RL (GRPO) for implicit environment signals. On-Policy Distillation for richer language supervision. One config field to switch.

GRPO + OPD
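
To make skill injection and skill evolution concrete, here is a minimal sketch. The skill-bank format, the naive keyword-overlap retrieval, and the placeholder evolution step are all illustrative assumptions, not EvoClaw's actual implementation.

SKETCH — SKILL INJECTION & EVOLUTION (ILLUSTRATIVE)
SKILL_BANK = {
  "coding":   "Prefer small, tested changes. Cite the file paths you touch.",
  "security": "Never echo secrets. Flag injection risks in user input.",
}

def retrieve_skills(user_message, top_k=2):
  # Rank skills by naive keyword overlap with the incoming message
  words = set(user_message.lower().split())
  return sorted(
    SKILL_BANK,
    key=lambda s: len(words & set(SKILL_BANK[s].lower().split())),
    reverse=True,
  )[:top_k]

def build_system_prompt(base, user_message):
  # Inject retrieved skill instructions ahead of the base system prompt
  lines = [f"[skill:{s}] {SKILL_BANK[s]}" for s in retrieve_skills(user_message)]
  return "\n".join(lines) + "\n\n" + base

def evolve_skill(name, failure_trajectory):
  # Skill evolution — this placeholder string stands in for the LLM call
  # EvoClaw would make to distill a failure into a reusable instruction
  SKILL_BANK[name] = f"Avoid repeating this failure: {failure_trajectory}"

evolve_skill("date-math", "agent mis-parsed ISO week numbers")
print(build_system_prompt("You are a helpful agent.",
                          "check this input for secrets"))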


LEARNING MODES

Two ways your agent gets smarter.

EvoClaw supports both lightweight signal learning and rich natural-language supervision — choose what fits your setup.

MODE 01

Reinforcement Learning (GRPO)

Uses Group Relative Policy Optimization. The agent learns from implicit feedback — every scored conversation turn updates the policy automatically.

# Lightweight — works with any signal
from evoclaw import EvoClawConfig

config = EvoClawConfig(
  loss_fn="importance_sampling",
  use_prm=True,
)
GRPO · PPO · CISPO
MODE 02

On-Policy Distillation (OPD)

Leverages richer natural-language supervision from a teacher model. Best when you have access to a strong judge LLM for high-quality textual feedback.

# High quality — needs judge model
from evoclaw import EvoClawConfig

config = EvoClawConfig(
  use_prm=True,
  prm_model="gpt-5.2",
)
TEACHER MODEL · RICH FEEDBACK

SUPPORTED MODELS

Works with the models you already use.

EvoClaw is model-agnostic. Use Kimi-2.5 for maximum quality, Qwen3-4B for lightweight deployment, or any Groq/OpenAI-compatible endpoint.

🌙
RECOMMENDED
Kimi-2.5
~200B MoE

Best quality, long context, strong reasoning. Recommended for production.

moonshotai/Kimi-2.5
LIGHTWEIGHT
Qwen3-4B
4B params

Fast iteration, lower API costs. Great for development and constrained budgets.

Qwen/Qwen3-4B
🔌
COMPATIBLE
Any API
OpenAI-compatible

Groq, OpenAI, Anthropic, or any Tinker-supported endpoint. Plug and play.

llama · gpt · claude

CONFIGURATION

One config object. Full control.

All settings are passed as a single EvoClawConfig instance — no YAML files, no env sprawl.

FIELD                    DEFAULT                 DESCRIPTION
loss_fn                  "importance_sampling"   RL loss: importance_sampling / ppo / cispo
use_prm                  True                    Enable PRM reward scoring per turn
use_skills               False                   Inject skills into the system prompt
batch_size               32                      Turns before each training step
lora_rank                32                      LoRA rank; higher = more capacity
enable_skill_evolution   False                   Auto-generate skills from failures
proxy_port               8080                    Proxy listen port
VIEW FULL CONFIG REFERENCE →
FULL EXAMPLE
from evoclaw import EvoClawConfig

config = EvoClawConfig(
  model_name="moonshotai/Kimi-2.5",
  loss_fn="importance_sampling",
  use_prm=True,
  use_skills=True,
  enable_skill_evolution=True,
  batch_size=32,
  lora_rank=32,
)

# That's it — now, from your shell:
#   evoclaw start --config config
GET STARTED

Up and running in 5 minutes.

Tell us about your project and model stack — we'll send a personalized setup guide, an EvoClawConfig pre-filled for your scenario, and a starter skill bank curated for your use case.

📦
Personalized Setup Guide
Step-by-step walkthrough based on your model and use case, sent right after you submit.
⚙️
Ready-to-Use Config
EvoClawConfig pre-filled for your scenario — just add your API key.
🧬
Starter Skill Bank (25+ skills)
Curated skills for your domain — coding, security, research, or agentic workflows.
📊
Early Access Updates
First to know about new features and integrations. Unsubscribe anytime.

Get Your Free Setup Guide

Receive a personalized config + skill bank · No credit card · MIT licensed




LORA TRAINING

Cloud LoRA training — fully automatic.

Every conversation trains your model. EvoClaw batches turns, submits to Tinker cloud, and hot-swaps updated weights — all in the background. No GPU, no downtime, no manual steps.

☁️
No GPU Required

Training runs entirely on Tinker cloud. Any machine with network access can run the full pipeline.

CLOUD-NATIVE
🔄
Hot-Swap Weights

New LoRA weights replace old ones automatically after each step. Zero downtime, zero restarts.

ZERO DOWNTIME
🌙
Kimi-2.5

Same model as MetaClaw. ~200B MoE, best reasoning and long context. $4.40/M tokens on Tinker.

SAME AS METACLAW
Qwen3-4B Free

Lightweight alternative on Tinker free tier. Great for development and constrained budgets.

FREE TIER
📊
GRPO + OPD

Two learning modes — Reinforcement Learning (GRPO) or On-Policy Distillation. One config field.

DUAL MODE
💰
Pay Per Use

No subscription. Top up Tinker balance and pay only for actual training compute. Start from $5.

FROM $5
HOW TO USE LORA

Set up in 3 steps.

01
Get your Tinker API Key
Sign up at tinker-console.thinkingmachines.ai → API Keys → Create API Key. Copy the key starting with tm1-.... Top up at least $5 under Billing → Add to balance.
02
Run evoclaw init
Run evoclaw init in your terminal. Paste your Groq key (free at console.groq.com), then your Tinker key. Choose model 3 — Kimi-2.5 for best results, or model 1 — Qwen3-4B for the free tier.
03
Start proxy — LoRA trains automatically
Run evoclaw start. You will see Tinker: ✅ connected. Point your OpenAI client to http://localhost:8080/v1 and chat normally. Every 32 turns, EvoClaw auto-submits a LoRA job and hot-swaps the new weights.
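
For step 03, the only client-side change is the base URL. A minimal example with the openai Python SDK — the placeholder api_key is an assumption here, since the proxy is configured with your provider keys during evoclaw init:

EXAMPLE — CLIENT VIA THE PROXY
from openai import OpenAI

client = OpenAI(
  base_url="http://localhost:8080/v1",  # the EvoClaw proxy, not the provider
  api_key="not-needed",                 # assumption: the proxy holds your keys
)

response = client.chat.completions.create(
  model="moonshotai/Kimi-2.5",  # or Qwen/Qwen3-4B on the free tier
  messages=[{"role": "user", "content": "Summarize my last deploy log."}],
)
print(response.choices[0].message.content)
# This turn is intercepted and scored; every 32 turns EvoClaw auto-submits
# a LoRA job and hot-swaps the new weights.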
QUICK START

Running in 3 commands.

No GPU, no cluster, no data team. Install, configure, and start — EvoClaw handles the rest.

TERMINAL
# 1. Install EvoClaw
pip install evoclaw
 
# 2. Setup API keys (Groq = free)
evoclaw init
 
# 3. Start the proxy
evoclaw start
 
EvoClaw Proxy v0.2.1 — localhost:8080 — evolving!

BUILT ON OPEN SOURCE
🦀
OpenClaw
Core agent framework
⚙️
Tinker
Cloud LoRA training
🧠
MetaClaw
Original inspiration
📚
Awesome Skills
Skill bank foundation

READY TO START?

Your agent starts evolving today.

No GPU. No data team. No setup headache. Just plug EvoClaw in and watch your agent improve with every conversation.