Claude Skill
darkrishabh/agent-skills-eval
A TypeScript CLI tool for evaluating AI agent skills in agentskills.io format. Supports JSONL/YAML tests and OpenAI-compatible LLM evals.
🚀 Install this Skill
openclaw install darkrishabh/agent-skills-eval
Summary
A TypeScript-based test runner for evaluating AI agent skills in the agentskills.io format. It runs from the command line, reads test definitions written in JSONL or YAML, and supports evaluations against OpenAI-compatible LLM endpoints.
A test runner for agentskills.io-style AI agent skills
Key features
- Runs agent skill evaluations from agentskills.io-style definitions
- Supports JSONL and YAML test file formats
- OpenAI-compatible LLM evaluation integration
- Command-line interface (CLI) for easy automation
- Built with TypeScript for type safety and reliability
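To illustrate what a JSONL test definition involves, here is a minimal sketch of reading one test case per line in TypeScript. The `SkillTestCase` shape (`id`, `prompt`, `expected`) and the `parseJsonl` helper are hypothetical names chosen for illustration; they are not taken from agent-skills-eval, whose actual schema and loader may differ.

```typescript
// Hypothetical test-case shape, for illustration only.
// The real agent-skills-eval schema may use different fields.
interface SkillTestCase {
  id: string;
  prompt: string;
  expected: string;
}

// Type guard: checks that a parsed value has the three string fields.
function isSkillTestCase(value: unknown): value is SkillTestCase {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.id === "string" &&
    typeof v.prompt === "string" &&
    typeof v.expected === "string"
  );
}

// Parse JSONL text: one JSON object per non-empty line.
// Throws with a 1-based line number on malformed input.
function parseJsonl(text: string): SkillTestCase[] {
  const cases: SkillTestCase[] = [];
  text.split(/\r?\n/).forEach((line, index) => {
    const trimmed = line.trim();
    if (trimmed === "") return; // skip blank lines
    let value: unknown;
    try {
      value = JSON.parse(trimmed);
    } catch {
      throw new Error(`Line ${index + 1}: invalid JSON`);
    }
    if (!isSkillTestCase(value)) {
      throw new Error(`Line ${index + 1}: missing id, prompt, or expected`);
    }
    cases.push(value);
  });
  return cases;
}
```

Reporting the line number on failure is a deliberate choice in this sketch: JSONL files are usually edited by hand, and a one-line-per-case format makes per-line errors easy to locate.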
Use cases
- Evaluating AI agent performance on standardized skill tests
- Automating LLM evaluation pipelines in CI/CD workflows
- Benchmarking different AI agents against agentskills.io tasks
- Developing and testing new agent skills with reproducible evals