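What are the main features of suyoumo/ClawProBench?

A live-first benchmark harness for LLM agents; runs inside the OpenClaw runtime; deterministic grading for consistent evaluation; repeated-trial reliability measurement; supports leaderboard-style comparison

What are the use cases for suyoumo/ClawProBench?

Evaluating LLM agent performance in a live environment; comparing agent reliability across repeated trials; benchmarking agents for OpenClaw-based applications; building reproducible agent evaluation pipelines; tracking agent improvement through leaderboard metrics

What programming language does suyoumo/ClawProBench use?

suyoumo/ClawProBench is written mainly in Python.

How do I install suyoumo/ClawProBench?

Run: openclaw install suyoumo/ClawProBench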

Claude Skill

suyoumo/ClawProBench

ClawProBench is a live-first benchmark harness for evaluating LLM agents in the OpenClaw runtime, with deterministic grading and repeated-trial reliability.

Overview

Stars: 719

Forks: 51

Language: Python

Last updated: 2026-06-08

Last synced: 2026-06-17

Repository info

Owner: suyoumo

Repository: ClawProBench

Full name: suyoumo/ClawProBench

Repo ID: 941,429,098

GitHub URL: https://github.com/suyoumo/ClawProBench

Install this Skill

pip install uv

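Registry info

Type: openclaw_skill

Quality score: 80/100

Verification status: readme_parsed

Last verified: 2026-06-07

Platforms

Claude, OpenClaw, Codex

Capabilities

pdf, memory, search, image, terminal, agent, benchmark, evaluation, harness, leaderboard

Recognized files

README.md, README.zh-CN.md, docs, requirements.txt, tests

Project description

ClawProBench is a live-first benchmark harness for evaluating LLM agents in the OpenClaw runtime, with deterministic grading and repeated-trial reliability.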

English description

ClawProBench is a live-first benchmark harness for evaluating LLM agents in the OpenClaw runtime with deterministic grading and repeated-trial reliability.

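Key points

A live-first benchmark harness for LLM agents
Runs inside the OpenClaw runtime
Deterministic grading for consistent evaluation
Repeated-trial reliability measurement
Supports leaderboard-style comparison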
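
To make "deterministic grading" concrete, here is a minimal sketch of a grader whose score depends only on its inputs. It is an illustration, not ClawProBench's actual grading code: the `Check` class, the `grade` function, and the substring rules are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Check:
    """Hypothetical rule: a required substring and the points it is worth."""
    name: str
    required: str
    points: float


def grade(transcript: str, checks: list[Check]) -> float:
    """Return a score in [0, 1] computed only from the arguments.

    No randomness, clock reads, or model calls are involved, so grading
    the same transcript twice always yields the same score.
    """
    total = sum(c.points for c in checks)
    if total == 0:
        return 0.0
    earned = sum(c.points for c in checks if c.required in transcript)
    return earned / total


# Made-up checks and transcript, for illustration only.
checks = [
    Check("created_file", "wrote report.md", 2.0),
    Check("ran_tests", "tests passed", 1.0),
    Check("clean_exit", "exit code 0", 1.0),
]
transcript = "wrote report.md; tests passed; exit code 1"
print(grade(transcript, checks))  # 0.75
```

Because the grader is a pure function, any variation between repeated runs comes from the agent rather than from the scoring.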

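Use cases

Evaluating LLM agent performance in a live environment
Comparing agent reliability across repeated trials
Benchmarking agents for OpenClaw-based applications
Building reproducible agent evaluation pipelines
Tracking agent improvement through leaderboard metrics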
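
Comparing reliability across repeated trials can be sketched as follows. The metrics here (mean pass rate, and the share of scenarios passed on every trial) are assumptions chosen for illustration, and the agent and scenario names are made up; the listing does not say which metrics the ClawProBench leaderboard actually uses.

```python
def summarize(results: dict[str, list[bool]]) -> dict[str, float]:
    """Summarize one agent; `results` maps scenario name to per-trial pass/fail."""
    if not results:
        return {"mean_pass_rate": 0.0, "all_trials_pass": 0.0}
    n = len(results)
    # Per-scenario pass rate; a scenario with no trials counts as 0.
    rates = [sum(t) / len(t) if t else 0.0 for t in results.values()]
    # A scenario is "stable" only if every one of its trials passed.
    stable = [bool(t) and all(t) for t in results.values()]
    return {
        "mean_pass_rate": sum(rates) / n,
        "all_trials_pass": sum(stable) / n,
    }


def rank(runs: dict[str, dict[str, list[bool]]]) -> list[str]:
    """Order agents by stability first, then by mean pass rate."""
    def key(agent: str) -> tuple[float, float]:
        s = summarize(runs[agent])
        return (s["all_trials_pass"], s["mean_pass_rate"])
    return sorted(runs, key=key, reverse=True)


# Hypothetical data: three agents, two scenarios, three trials each.
runs = {
    "agent_a": {"s1": [True, True, True], "s2": [True, False, True]},
    "agent_b": {"s1": [True, True, True], "s2": [True, True, True]},
    "agent_c": {"s1": [True, False, False], "s2": [False, False, False]},
}
print(rank(runs))  # ['agent_b', 'agent_a', 'agent_c']
```

Sorting on the all-trials-pass share first rewards agents that succeed consistently over agents that succeed only some of the time.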

README summary

<div align="center">
<img src="docs/assets/openclawprobench-logo.svg" width="160" alt="ClawProBench Logo">

# ClawProBench

[![Active Scenarios](https://img.shields.io/badge/active-102-blue)](#benchmark-profiles)
[![Catalog](https://img.shields.io/badge/catalog-162-green)](#benchmark-profiles)
[![Core Profile](https://img.shields.io/badge/core-26-orange)](#benchmark-profiles)
[![Execution](https://img.shields.io/badge/execution-live--first-black)](#quick-start)
[![License](https://img.shields.io/badge/license-Apache%202.0-red)](LICENSE)

> Transparent live-first benchmark harness for evaluating model capability inside the OpenClaw runtime. <br>
> 102 active scenarios, 162 catalog scenarios, deterministic grading, and OpenClaw-native coverage.

</div>

<p> <a href="README.zh-CN.md"><strong>简体中文 README</strong></a> </p>

ClawProBench focuses on real OpenClaw execution with deterministic grading, structured reports, and benchmark-profile selection. The default ranking path is the `core` profile; broader active coverage remains available through `intelligence`, `coverage`, `native`, and `full`.

The current worktree inventory reports `102` active scenarios and `162` total catalog scenarios (`60` incubating) via `python3 run.py inventory --json` and `python3 run.py inventory --benchmark-status all --json`.

## Leaderboard

Browse the public leaderboard and benchmark cases at **[suyoumo.github.io/bench](https://suyoumo.github.io/bench/)**.

[![ClawProBench leaderboard preview](docs/assets/leaderboard-preview-20260426.png)](https://suyoumo.github.io/bench/)
[![ClawProBench ModelPK preview](docs/assets/modelpk.png)](https://suyoumo.github.io/bench/modelpk/)

## Get Involved

We sincerely thank friends from Kimi and Qwen for their helpful feedback and improvement suggestions
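
The two inventory commands quoted in the README excerpt above could be driven from a script along these lines. This is a sketch under stated assumptions: `inventory_command` and `run_inventory` are hypothetical helpers, and the excerpt does not describe the shape of the JSON output, so the parsed object is returned untouched. Only the command-line arguments and the scenario counts come from the README.

```python
import json
import subprocess


def inventory_command(all_statuses: bool = False) -> list[str]:
    """Build the argv for the inventory command quoted in the README."""
    cmd = ["python3", "run.py", "inventory"]
    if all_statuses:
        # Per the README, this variant reports the full catalog.
        cmd += ["--benchmark-status", "all"]
    cmd.append("--json")
    return cmd


def run_inventory(repo_dir: str, all_statuses: bool = False):
    """Run the command inside a checkout and parse stdout as JSON.

    Assumption: stdout is a single JSON document. Its schema is not
    given in the excerpt, so no fields are read here.
    """
    proc = subprocess.run(
        inventory_command(all_statuses),
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(proc.stdout)


# Counts quoted in the README excerpt: catalog minus active is incubating.
active, catalog = 102, 162
incubating = catalog - active
print(incubating)  # 60
```

Calling `run_inventory("path/to/ClawProBench")` requires a local checkout of the repository; the path shown is a placeholder.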

Topics

agent benchmark evaluation harness leaderboard llm openclaw

Explore more

Related skills

infiniflow/ragflow

the-open-agent/openagent

zhayujie/CowAgent

volcengine/OpenViking

icip-cas/PPTAgent

opensquilla/opensquilla