INTERACTIVE SIMULATION

Watch EvoClaw learn in real time.

This is a browser simulation of exactly what happens inside EvoClaw when your agent receives a message. Pick a scenario, press Play, and watch all 6 stages run live — intercept, scoring, skill injection, cloud training, skill evolution, and model weight hot-swap.

💬 Intercept conversation 📊 PRM reward scoring 💉 Skill injection ☁️ Cloud LoRA training 🧬 Auto skill evolution 🔄 Hot-swap weights
HOW TO USE:
1 Select a scenario below
2 Click ▶ Play Demo
3 All panels light up in sequence — simulation loops automatically
1 Intercept · 2 Score · 3 Skills · 4 Train · 5 Evolve · 6 Hot-Swap
OPENCLAW CONVERSATION
A real conversation between user and agent. EvoClaw intercepts every message transparently with zero added latency.
INTERCEPTED
PROCESS REWARD MODEL
A judge LLM rates every response 0.0–1.0. Score < 0.3 = failure → triggers automatic Skill Evolution.
WAITING
REWARD SCORE (0.0 → 1.0)
Judge model has not scored this turn yet.
SKILL BANK — INJECTION
The most relevant skills are injected into the system prompt before the agent replies. Instant improvement — no retraining wait.
READY
Waiting for conversation turn to retrieve relevant skills...
TINKER CLOUD TRAINING
Once 32 turns accumulate, the batch is sent to Tinker cloud for LoRA fine-tuning. Training runs remotely — zero GPU on your end.
IDLE
STEP: 0
LOSS: —
BATCH: 0/32
00:00 Trainer initialized — waiting for batch...
🧬 SKILL EVOLUTION ENGINE
When reward < 0.3, EvoClaw analyzes the failure trajectory and uses an LLM to auto-generate a new skill.
WATCHING FOR FAILURES
When the agent fails (reward < 0.3), EvoClaw will analyze the trajectory and auto-generate a new skill here...
Cycle: 0

WHAT YOU'RE SEEING

Each panel explained.

💬
OpenClaw Conversation

This simulates a real conversation between a user and the OpenClaw agent. EvoClaw intercepts every message and response through its transparent proxy — zero extra latency added to the user.
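The proxy pattern described above can be sketched in a few lines. This is a minimal illustration, not EvoClaw's actual implementation: the `Interceptor` class and its `log` attribute are hypothetical names, standing in for whatever the real proxy records per turn.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

@dataclass
class Interceptor:
    """Wraps an agent callable and records every (message, response)
    pair without changing the reply the caller sees (hypothetical sketch)."""
    agent: Callable[[str], str]
    log: List[Tuple[str, str]] = field(default_factory=list)

    def __call__(self, message: str) -> str:
        response = self.agent(message)        # forward to the real agent unchanged
        self.log.append((message, response))  # record the turn transparently
        return response

# Usage: wrap a toy agent; callers interact with it exactly as before,
# while every turn lands in the log for downstream scoring and training.
echo_agent = lambda m: f"echo: {m}"
proxy = Interceptor(echo_agent)
reply = proxy("hello")  # identical to calling echo_agent("hello") directly
```

Because the wrapper only forwards and records, the agent's behavior is untouched; the interception cost is one list append per turn.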

📊
Process Reward Model

After each response, a judge LLM rates it on a 0.0–1.0 scale. Scores above 0.7 are good. Below 0.3 triggers Skill Evolution. These scores weight how much each turn influences the gradient update.
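The thresholds and weighting described above can be sketched as follows. The 0.3 and 0.7 cutoffs come from the page; the `classify` and `turn_weights` helpers are hypothetical names, and proportional normalization is one plausible way to let scores weight a gradient update, not necessarily the exact scheme EvoClaw uses.

```python
FAIL_THRESHOLD = 0.3  # below this, the turn is a failure -> Skill Evolution
GOOD_THRESHOLD = 0.7  # at or above this, the turn counts as good

def classify(score: float) -> str:
    """Map a judge score in [0.0, 1.0] to an action (hypothetical helper)."""
    if score < FAIL_THRESHOLD:
        return "evolve"   # trigger Skill Evolution on this trajectory
    if score >= GOOD_THRESHOLD:
        return "good"
    return "neutral"

def turn_weights(scores):
    """Normalize scores so each turn's influence on the update is
    proportional to its reward (one plausible weighting scheme)."""
    total = sum(scores) or 1.0  # avoid division by zero on an empty/zero batch
    return [s / total for s in scores]
```

A batch scored [0.9, 0.1, 0.5] would give the strong turn nine times the influence of the failing one, while the 0.1 turn also fires the evolution path.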

💉
Skill Bank Injection

EvoClaw retrieves the most relevant skills from its bank based on the conversation content and injects them into the system prompt. The agent immediately becomes more capable — before any retraining.
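Retrieve-then-inject can be sketched like this. The crude token-overlap `relevance` function stands in for whatever similarity search EvoClaw actually uses (likely embedding-based), and `inject_skills` is a hypothetical name for the prompt-assembly step.

```python
def relevance(skill: str, conversation: str) -> int:
    """Crude token-overlap relevance; a stand-in for a real similarity search."""
    return len(set(skill.lower().split()) & set(conversation.lower().split()))

def inject_skills(system_prompt: str, skill_bank: list, conversation: str, k: int = 2) -> str:
    """Rank skills against the conversation and prepend the top-k
    to the system prompt before the agent replies (hypothetical sketch)."""
    ranked = sorted(skill_bank, key=lambda s: relevance(s, conversation), reverse=True)
    chosen = ranked[:k]
    return system_prompt + "\n\nRelevant skills:\n" + "\n".join(f"- {s}" for s in chosen)
```

Because the skills ride in with the prompt, the agent benefits on the very next turn, long before any fine-tuning job completes.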

☁️
Tinker Cloud Training

When the buffer fills (32 turns by default), EvoClaw submits a cloud LoRA training job to Tinker. The training runs remotely — your machine just sends the data and receives updated weights.
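The fill-then-submit buffer can be sketched as below. `TrainingBuffer` is a hypothetical name, and the `submit` callback stands in for the remote call that ships a batch to Tinker; the real submission API is not shown here.

```python
from typing import Callable, List

class TrainingBuffer:
    """Accumulates conversation turns and fires a training submission
    each time the batch fills (hypothetical sketch of the buffering logic)."""

    def __init__(self, batch_size: int = 32,
                 submit: Callable[[List], None] = lambda batch: None):
        self.batch_size = batch_size
        self.submit = submit   # stand-in for the remote LoRA training call
        self.turns: List = []
        self.jobs = 0          # how many training jobs have been submitted

    def add(self, turn) -> None:
        self.turns.append(turn)
        if len(self.turns) >= self.batch_size:
            self.submit(self.turns)  # ship the full batch for remote fine-tuning
            self.turns = []          # start accumulating the next batch
            self.jobs += 1
```

With the default of 32, the 33rd turn starts batch two; nothing about this loop needs a local GPU, since `submit` only sends data.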

🧬
Skill Evolution

When the agent fails (reward < 0.3), EvoClaw sends the full trajectory to a skill-generation LLM. It analyzes what went wrong and creates a new, targeted skill that gets added to the bank permanently.
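The failure-to-skill step amounts to assembling a prompt from the trajectory for the skill-generation LLM. The format below is an illustrative guess, not EvoClaw's actual prompt; `evolution_prompt` is a hypothetical helper name.

```python
def evolution_prompt(trajectory, reward: float) -> str:
    """Build a prompt asking a skill-generation LLM to turn a failed
    trajectory into a reusable skill (hypothetical prompt format)."""
    lines = [
        f"The agent failed this task (reward {reward:.2f} < 0.30).",
        "Trajectory:",
    ]
    for role, text in trajectory:
        lines.append(f"{role}: {text}")
    lines.append("Write one concise, reusable skill that would have "
                 "prevented this failure.")
    return "\n".join(lines)
```

The LLM's answer becomes a new entry in the skill bank, so the very next similar conversation gets it injected up front.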

🔄
Hot-Swap

Once training completes, updated LoRA weights are pushed directly to the Tinker sampling endpoint and swapped in with zero service interruption. The cycle repeats automatically.
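A zero-interruption swap usually comes down to replacing a reference atomically, so in-flight requests finish on the old weights while new requests see the new ones. The `AdapterSlot` class below is a hypothetical local sketch of that idea; the real swap happens server-side at the Tinker sampling endpoint.

```python
import threading

class AdapterSlot:
    """Holds the live adapter reference behind a lock so readers always
    see a complete, consistent object (hypothetical sketch of hot-swap)."""

    def __init__(self, adapter):
        self._adapter = adapter
        self._lock = threading.Lock()

    def get(self):
        """Return whichever adapter is currently live."""
        with self._lock:
            return self._adapter

    def hot_swap(self, new_adapter) -> None:
        """Replace the live adapter in one step; no request ever
        observes a half-swapped state."""
        with self._lock:
            self._adapter = new_adapter
```

Requests already holding the old adapter run to completion with it; every request that starts after `hot_swap` returns uses the updated weights, which is what "zero service interruption" means in practice.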

Read Full Docs → 🚀 Get Started →
READY TO TRY IT FOR REAL?

Talk to the live EvoClaw agent

The simulation above shows how EvoClaw works. Now talk to the real agent — it learns from every message and remembers across sessions.

ASK EVOCLAW →