Claude Skill

suyoumo/ClawProBench

ClawProBench is a live-first benchmark harness for evaluating LLM agents in the OpenClaw runtime with deterministic grading and repeated-trial reliability.

Overview

Stars: 555
Forks: 48
Language: Python
Last pushed: 2026-04-28
Last synced: 2026-04-28

Repository

Owner: suyoumo
Repository: ClawProBench
Full name: suyoumo/ClawProBench
Repo ID: 941,429,098

🚀 Install this Skill

openclaw install suyoumo/ClawProBench

Summary

ClawProBench is a live-first benchmark harness for evaluating LLM agents in the OpenClaw runtime, featuring deterministic grading and repeated-trial reliability.

Chinese description (translated)

ClawProBench is a benchmark framework that prioritizes live execution, used to evaluate LLM agents in the OpenClaw runtime environment, with deterministic grading and repeated-trial reliability.

Key features

  • Live-first benchmark harness for LLM agents
  • Runs inside the OpenClaw runtime environment
  • Deterministic grading for consistent evaluation
  • Repeated-trial reliability measurement
  • Supports leaderboard-style comparisons
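The two core ideas above, deterministic grading and repeated-trial reliability, can be sketched in a few lines. This is a minimal illustration only, not ClawProBench's actual API: the function names `grade` and `trial_reliability` and the exact-match grading rule are assumptions for the sake of the example.

```python
def grade(output: str, expected: str) -> bool:
    """Deterministic grader: exact match after whitespace normalization.

    (Hypothetical grader; ClawProBench's real grading logic may differ.)
    """
    return output.strip() == expected.strip()


def trial_reliability(trial_outputs: list[str], expected: str) -> float:
    """Fraction of repeated trials whose output passes the grader."""
    results = [grade(out, expected) for out in trial_outputs]
    return sum(results) / len(results)


# Three repeated trials of a hypothetical agent on one task:
outputs = ["42", "42 ", "41"]
print(trial_reliability(outputs, "42"))  # 2 of 3 trials pass
```

Because the grader is a pure function of the output, repeated runs of the same transcript always score identically, which is what makes leaderboard-style comparisons across trials meaningful.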

Use cases

  • Evaluating LLM agent performance in live environments
  • Comparing agent reliability across repeated trials
  • Benchmarking agents for OpenClaw-based applications
  • Building reproducible agent evaluation pipelines
  • Tracking agent improvements via leaderboard metrics


Data from GitHub. Synced on 2026-04-28.