What are the key features of suyoumo/ClawProBench?

Live-first benchmark harness for LLM agents; runs inside the OpenClaw runtime environment; deterministic grading for consistent evaluation; repeated-trial reliability measurement; supports leaderboard-style comparisons.

What are the use cases of suyoumo/ClawProBench?

Evaluating LLM agent performance in live environments; comparing agent reliability across repeated trials; benchmarking agents for OpenClaw-based applications; building reproducible agent evaluation pipelines; tracking agent improvements via leaderboard metrics.

What programming language does suyoumo/ClawProBench use?

suyoumo/ClawProBench is primarily written in Python.

How do I install suyoumo/ClawProBench?

Run: openclaw install suyoumo/ClawProBench

Claude Skill

suyoumo/ClawProBench

ClawProBench is a live-first benchmark harness for evaluating LLM agents in the OpenClaw runtime with deterministic grading and repeated-trial reliability.


Overview

Stars: 555

Forks: 48

Language: Python

Last pushed: 2026-04-28

Last synced: 2026-04-28


Repository

Owner: suyoumo

Repository: ClawProBench

Full name: suyoumo/ClawProBench

Repo ID: 941,429,098

GitHub URL: https://github.com/suyoumo/ClawProBench

🚀 Install this Skill

openclaw install suyoumo/ClawProBench


Summary

ClawProBench is a live-first benchmark harness for evaluating LLM agents in the OpenClaw runtime, featuring deterministic grading and repeated-trial reliability.

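Description (translated from Chinese)

ClawProBench is a benchmark framework that prioritizes live execution. It is used to evaluate LLM agents in the OpenClaw runtime environment and provides deterministic scoring and repeated-trial reliability.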

Key features

Live-first benchmark harness for LLM agents
Runs inside the OpenClaw runtime environment
Deterministic grading for consistent evaluation
Repeated-trial reliability measurement
Supports leaderboard-style comparisons
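
The combination of deterministic grading and repeated-trial reliability listed above can be illustrated with a minimal Python sketch. This is not ClawProBench's actual API: the names grade, run_trials, summarize, and make_flaky_agent are hypothetical, the exact-match grading rule is an assumption, and the "agent" is a stand-in function that cycles through canned answers.

```python
from itertools import cycle
from typing import Callable, Dict, List


def grade(output: str, expected: str) -> bool:
    """Hypothetical deterministic grader: same inputs always give the same verdict."""
    return output.strip() == expected.strip()


def run_trials(agent: Callable[[str], str], task: str, expected: str, trials: int) -> List[bool]:
    """Run the same task several times and grade each attempt independently."""
    return [grade(agent(task), expected) for _ in range(trials)]


def summarize(results: List[bool]) -> Dict[str, object]:
    """Reduce per-trial verdicts to simple reliability figures."""
    passes = sum(results)
    return {
        "trials": len(results),
        "pass_rate": passes / len(results),
        "all_passed": passes == len(results),  # strict: every trial succeeded
        "any_passed": passes > 0,              # lenient: at least one success
    }


def make_flaky_agent() -> Callable[[str], str]:
    """Stand-in agent (an assumption for illustration) that is wrong on every third call."""
    answers = cycle(["4", "4", "5", "4"])
    return lambda task: next(answers)


results = run_trials(make_flaky_agent(), "What is 2 + 2?", "4", trials=4)
summary = summarize(results)
print(summary)
```

Because the grader is deterministic, any variation across trials comes from the agent itself. Reporting both a strict figure (all trials passed) and a lenient one (any trial passed) separates an agent that can solve a task from one that solves it reliably.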

Use cases

Evaluating LLM agent performance in live environments
Comparing agent reliability across repeated trials
Benchmarking agents for OpenClaw-based applications
Building reproducible agent evaluation pipelines
Tracking agent improvements via leaderboard metrics

Topics

agent, benchmark, evaluation, harness, leaderboard, llm, openclaw
