Claude Skill
mgechev/skillgrade
Skillgrade is a TypeScript evaluation framework for writing unit tests for AI agent skills. It supports Claude Code, Codex, and Gemini CLI, and helps catch regressions and keep skills reliable.
Install this Skill
npx skills add mgechev/skills-best-practices
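Project Overview
Skillgrade is a TypeScript-based evaluation framework that lets you write "unit tests" for AI agent skills, with support for Claude Code, Codex, and Gemini CLI. It helps you verify skill behavior, catch regressions, and confirm reliability before you deploy agent capabilities.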
"Unit tests" for your agent skills
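Key Points
- Write unit-test-style evaluations for agent skills
- Supports Claude Code, Codex, and Gemini CLI
- Catches regressions and verifies skill behavior
- Lightweight, TypeScript-based framework
- Designed specifically for agent skill reliability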
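The README's Quick Start notes that the agent is auto-detected from whichever API key is set (`GEMINI_API_KEY` → Gemini, `ANTHROPIC_API_KEY` → Claude, `OPENAI_API_KEY` → Codex) and can be overridden with `--agent=claude`. A minimal TypeScript sketch of that selection logic might look like the following. It is not Skillgrade's actual source: `detectAgent`, the `Agent` type, and `KEY_TO_AGENT` are hypothetical names, and the precedence when several keys are set at once is an assumption, since the README does not specify it.

```typescript
// Hypothetical sketch: not Skillgrade's actual source code.
type Agent = "gemini" | "claude" | "codex";

// Mapping taken from the README: which API key implies which agent.
// The array order doubles as a precedence order, which is an assumption;
// the README does not say what happens when several keys are set.
const KEY_TO_AGENT: ReadonlyArray<readonly [string, Agent]> = [
  ["GEMINI_API_KEY", "gemini"],
  ["ANTHROPIC_API_KEY", "claude"],
  ["OPENAI_API_KEY", "codex"],
];

// `env` stands in for process.env; `override` stands in for the value
// passed via --agent=NAME (only --agent=claude appears in the README).
function detectAgent(
  env: Record<string, string | undefined>,
  override?: Agent,
): Agent | undefined {
  // An explicit override always wins over auto-detection.
  if (override !== undefined) return override;
  for (const [key, agent] of KEY_TO_AGENT) {
    const value = env[key];
    if (value !== undefined && value !== "") return agent;
  }
  // No known key is set: the caller has to decide how to proceed.
  return undefined;
}

console.log(detectAgent({ OPENAI_API_KEY: "your-key" })); // codex
console.log(detectAgent({ GEMINI_API_KEY: "your-key" }, "claude")); // claude
```

Passing the environment in as a parameter, instead of reading `process.env` directly, keeps the function easy to test in isolation.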
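Use Cases
- Validate custom Claude Code skills before deployment
- Run regression tests when agent skills are updated
- Ensure consistent behavior across different CLI agents
- Integrate skill tests into CI/CD pipelines
- Benchmark skill performance across multiple agent platforms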
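For regression runs, the README's Presets table lists three trial counts: `--smoke` (5), `--reliable` (15), and `--regression` (30). The sketch below shows one way a pass rate could be derived from a set of trial outcomes. The trial counts come from the README; the `Preset` type, the `TRIALS` map, and the `passRate` function are hypothetical, and how Skillgrade actually aggregates results is not stated in the source.

```typescript
// Hypothetical sketch: not Skillgrade's actual source code.
type Preset = "smoke" | "reliable" | "regression";

// Trial counts as listed in the README's Presets table.
const TRIALS: Record<Preset, number> = {
  smoke: 5,
  reliable: 15,
  regression: 30,
};

// Fraction of trials that passed. Treating each trial as a plain
// pass/fail boolean is an assumption made for illustration.
function passRate(outcomes: readonly boolean[]): number {
  if (outcomes.length === 0) throw new Error("no trials recorded");
  const passed = outcomes.filter((ok) => ok).length;
  return passed / outcomes.length;
}

// Example: a smoke-sized run in which 4 of 5 trials passed.
const smokeOutcomes = [true, true, false, true, true];
console.log(smokeOutcomes.length === TRIALS.smoke); // true
console.log(passRate(smokeOutcomes)); // 0.8
```

More trials narrow the uncertainty around the observed pass rate, which is consistent with the README pairing the 30-trial preset with high-confidence regression detection.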
README Summary
# Skillgrade

The easiest way to evaluate your [Agent Skills](https://agentskills.io/home). Tests that AI agents correctly discover and use your skills.

See [examples/](examples/): [superlint](examples/superlint/) (simple) and [angular-modern](examples/angular-modern/) (TypeScript grader).

## Quick Start

**Prerequisites**: Node.js 20+, Docker

```bash
npm i -g skillgrade
```

**1. Initialize**: go to your skill directory (must have `SKILL.md`) and scaffold:

```bash
cd my-skill/
GEMINI_API_KEY=your-key skillgrade init    # or ANTHROPIC_API_KEY / OPENAI_API_KEY
# Use --force to overwrite an existing eval.yaml
```

Generates `eval.yaml` with AI-powered tasks and graders. Without an API key, creates a well-commented template.

**2. Edit**: customize `eval.yaml` for your skill (see [eval.yaml Reference](#evalyaml-reference)).

**3. Run**:

```bash
GEMINI_API_KEY=your-key skillgrade --smoke
```

The agent is auto-detected from your API key: `GEMINI_API_KEY` → Gemini, `ANTHROPIC_API_KEY` → Claude, `OPENAI_API_KEY` → Codex. Override with `--agent=claude`.

**4. Review**:

```bash
skillgrade preview          # CLI report
skillgrade preview browser  # web UI → http://localhost:3847
```

Reports are saved to `$TMPDIR/skillgrade/<skill-name>/results/`. Override with `--output=DIR`.

## Presets

| Flag | Trials | Use Case |
|------|--------|----------|
| `--smoke` | 5 | Quick capability check |
| `--reliable` | 15 | Reliable pass rate estimate |
| `--regression` | 30 | High-confidence regression detection |

## Options

| Flag | Description |
|------|-------------|
| `--eval=NAME[,NAME]` | Run specific evals by name (comma-separated) |
| `--grader=TYPE` | Run only graders