Browse Claude Skill projects under the "evaluation" topic.
🦞 Official plugin for OpenClaw that exports agent traces to Opik. View and monitor agent behaviour, cost, token usage, errors, and more.
ClawProBench is a live-first benchmark harness for evaluating LLM agents in the OpenClaw runtime with deterministic grading and repeated-trial reliability.
YAO (Yielding AI Outcomes): a rigorous system for the engineering, evaluation, governance, and portability of reusable agent skills.