Claude Skill
suyoumo/ClawProBench
ClawProBench is a live-first benchmark harness for evaluating LLM agents in the OpenClaw runtime with deterministic grading and repeated-trial reliability.
Overview
🚀 Install this Skill
openclaw install suyoumo/ClawProBench
Key features
- Live-first benchmark harness for LLM agents
- Runs inside the OpenClaw runtime environment
- Deterministic grading for consistent evaluation
- Repeated-trial reliability measurement
- Supports leaderboard-style comparisons
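The repeated-trial reliability idea above can be sketched generically in Python. This is a minimal illustration, not ClawProBench's actual API: the function names (`run_repeated_trials`, `reliability`) and the stand-in agent are hypothetical, assuming each task is re-run several times and scored by a deterministic grader, with reliability reported as the pass rate across repeats.

```python
import itertools
from dataclasses import dataclass
from typing import Callable

@dataclass
class TrialResult:
    task_id: str
    passed: bool

def run_repeated_trials(task_id: str,
                        run_agent: Callable[[str], str],
                        grade: Callable[[str, str], bool],
                        trials: int = 5) -> list[TrialResult]:
    """Run one task `trials` times; grade() must be deterministic,
    so any variance comes from the agent, not the grader."""
    return [TrialResult(task_id, grade(task_id, run_agent(task_id)))
            for _ in range(trials)]

def reliability(results: list[TrialResult]) -> float:
    """Fraction of trials that passed (pass rate across repeats)."""
    return sum(r.passed for r in results) / len(results)

# Toy demonstration with a deliberately flaky stand-in agent.
flaky_outputs = itertools.cycle(["ok", "ok", "fail", "ok"])
results = run_repeated_trials(
    "demo-task",
    run_agent=lambda task: next(flaky_outputs),
    grade=lambda task, out: out == "ok",
    trials=4,
)
print(reliability(results))  # 0.75
```

Separating the (possibly nondeterministic) agent run from the deterministic grader is what makes repeated-trial pass rates meaningful as a reliability signal.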
Use cases
- Evaluating LLM agent performance in live environments
- Comparing agent reliability across repeated trials
- Benchmarking agents for OpenClaw-based applications
- Building reproducible agent evaluation pipelines
- Tracking agent improvements via leaderboard metrics