Topics: benchmark

Browse Claude Skill projects under the "benchmark" topic.

suyoumo/ClawProBench

ClawProBench is a live-first benchmark harness for evaluating LLM agents in the OpenClaw runtime with deterministic grading and repeated-trial reliability.

⭐ 809 · 🍴 52 · Python

agent benchmark evaluation

LeoYeAI/myclaw-bench

The definitive benchmark for AI agents on OpenClaw. 45 tasks across 4 tiers. Powered by MyClaw.ai

⭐ 232 · 🍴 39 · Python

agent-testing ai-agent ai-benchmark
