Claude Skill
mgechev/skillgrade
Skillgrade is a TypeScript evaluation framework for writing unit tests for AI agent skills. Supports Claude Code, Codex, and Gemini CLI to catch regressions and ensure skill reliability.
Overview
Repository
Install this Skill
npx skills add mgechev/skills-best-practicesRegistry
Summary
Skillgrade is a TypeScript-based evaluation framework that lets you write "unit tests" for your AI agent skills, supporting Claude Code, Codex, and Gemini CLI. It helps you validate skill behavior, catch regressions, and ensure reliability before deploying agent capabilities.
为你的智能体技能编写"单元测试"
Key features
- Write unit-test-like evaluations for agent skills
- Support for Claude Code, Codex, and Gemini CLI
- Catch regressions and validate skill behavior
- TypeScript-based, lightweight framework
- Designed for agent skill reliability
Use cases
- Validate custom Claude Code skills before deployment
- Run regression tests on agent skill updates
- Ensure consistent behavior across different CLI agents
- Integrate skill testing into CI/CD pipelines
- Benchmark skill performance across multiple agent platforms
README excerpt
# Skillgrade The easiest way to evaluate your [Agent Skills](https://agentskills.io/home). Tests that AI agents correctly discover and use your skills. See [examples/](examples/) — [superlint](examples/superlint/) (simple) and [angular-modern](examples/angular-modern/) (TypeScript grader).  ## Quick Start **Prerequisites**: Node.js 20+, Docker ```bash npm i -g skillgrade ``` **1. Initialize** — go to your skill directory (must have `SKILL.md`) and scaffold: ```bash cd my-skill/ GEMINI_API_KEY=your-key skillgrade init # or ANTHROPIC_API_KEY / OPENAI_API_KEY # Use --force to overwrite an existing eval.yaml ``` Generates `eval.yaml` with AI-powered tasks and graders. Without an API key, creates a well-commented template. **2. Edit** — customize `eval.yaml` for your skill (see [eval.yaml Reference](#evalyaml-reference)). **3. Run**: ```bash GEMINI_API_KEY=your-key skillgrade --smoke ``` The agent is auto-detected from your API key: `GEMINI_API_KEY` → Gemini, `ANTHROPIC_API_KEY` → Claude, `OPENAI_API_KEY` → Codex. Override with `--agent=claude`. **4. Review**: ```bash skillgrade preview # CLI report skillgrade preview browser # web UI → http://localhost:3847 ``` Reports are saved to `$TMPDIR/skillgrade/<skill-name>/results/`. Override with `--output=DIR`. ## Presets | Flag | Trials | Use Case | |------|--------|----------| | `--smoke` | 5 | Quick capability check | | `--reliable` | 15 | Reliable pass rate estimate | | `--regression` | 30 | High-confidence regression detection | ## Options | Flag | Description | |------|-------------| | `--eval=NAME[,NAME]` | Run specific evals by name (comma-separated) | | `--grader=TYPE` | Run only graders