Claude Skill

mgechev/skillgrade

Skillgrade is a TypeScript evaluation framework for writing unit tests for AI agent skills. It supports Claude Code, Codex, and Gemini CLI, and helps catch regressions and keep skills reliable.

Overview

Stars: 540
Forks: 42
Language: TypeScript
Last pushed: 2026-06-29
Last synced: 2026-07-03

Repository

Owner: mgechev
Repository: skillgrade
Full name: mgechev/skillgrade
Repo ID: 1,167,891,649

Install this Skill

npx skills add mgechev/skills-best-practices

Registry

Type: codex_skill
Quality score: 85/100
Verification: readme_parsed
Last verified: 2026-06-10
Platforms: Claude, Codex
Capabilities: browser, pdf, memory, image, video, terminal, workflow, agent, claude-code, codex
Detected files: README.md, examples, package.json, tests
Config keys: GEMINI_API_KEY, ANTHROPIC_API_KEY, OPENAI_API_KEY, URL, ANTHROPIC_BASE_URL, OPENAI_BASE_URL, PACKAGE_JSON

Summary

Skillgrade is a TypeScript-based evaluation framework that lets you write "unit tests" for your AI agent skills, supporting Claude Code, Codex, and Gemini CLI. It helps you validate skill behavior, catch regressions, and ensure reliability before deploying agent capabilities.

Description (translated from Chinese)

Write "unit tests" for your agent skills

Key features

  • Write unit-test-like evaluations for agent skills
  • Support for Claude Code, Codex, and Gemini CLI
  • Catch regressions and validate skill behavior
  • TypeScript-based, lightweight framework
  • Designed for agent skill reliability

Use cases

  • Validate custom Claude Code skills before deployment
  • Run regression tests on agent skill updates
  • Ensure consistent behavior across different CLI agents
  • Integrate skill testing into CI/CD pipelines
  • Benchmark skill performance across multiple agent platforms

README excerpt

# Skillgrade

The easiest way to evaluate your [Agent Skills](https://agentskills.io/home). Tests that AI agents correctly discover and use your skills.

See [examples/](examples/): [superlint](examples/superlint/) (simple) and [angular-modern](examples/angular-modern/) (TypeScript grader).

![Browser Preview](https://raw.githubusercontent.com/mgechev/skillgrade/main/assets/browser-preview.png)

## Quick Start

**Prerequisites**: Node.js 20+, Docker

```bash
npm i -g skillgrade
```

**1. Initialize**: go to your skill directory (must have `SKILL.md`) and scaffold:

```bash
cd my-skill/
GEMINI_API_KEY=your-key skillgrade init    # or ANTHROPIC_API_KEY / OPENAI_API_KEY
# Use --force to overwrite an existing eval.yaml
```

Generates `eval.yaml` with AI-powered tasks and graders. Without an API key, creates a well-commented template.

**2. Edit**: customize `eval.yaml` for your skill (see [eval.yaml Reference](#evalyaml-reference)).

**3. Run**:

```bash
GEMINI_API_KEY=your-key skillgrade --smoke
```

The agent is auto-detected from your API key: `GEMINI_API_KEY` → Gemini, `ANTHROPIC_API_KEY` → Claude, `OPENAI_API_KEY` → Codex. Override with `--agent=claude`.

**4. Review**:

```bash
skillgrade preview          # CLI report
skillgrade preview browser  # web UI → http://localhost:3847
```

Reports are saved to `$TMPDIR/skillgrade/<skill-name>/results/`. Override with `--output=DIR`.

## Presets

| Flag | Trials | Use Case |
|------|--------|----------|
| `--smoke` | 5 | Quick capability check |
| `--reliable` | 15 | Reliable pass rate estimate |
| `--regression` | 30 | High-confidence regression detection |

## Options

| Flag | Description |
|------|-------------|
| `--eval=NAME[,NAME]` | Run specific evals by name (comma-separated) |
| `--grader=TYPE` | Run only graders
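
The agent auto-detection and preset trial counts described in the README excerpt can be illustrated with a small TypeScript sketch. This is not Skillgrade's actual source: the names `detectAgent`, `KEY_TO_AGENT`, and `PRESET_TRIALS` are hypothetical, and the order in which keys are checked when several are set is an assumption, since the excerpt does not specify it. Only the key-to-agent mapping, the override behavior, and the trial counts come from the README.

```typescript
// Illustrative sketch only; names and structure are hypothetical,
// not taken from the skillgrade codebase.
type Agent = "gemini" | "claude" | "codex";
type Preset = "smoke" | "reliable" | "regression";

// Trial counts per preset, copied from the Presets table in the README.
const PRESET_TRIALS: Record<Preset, number> = {
  smoke: 5,
  reliable: 15,
  regression: 30,
};

// Environment variable -> agent mapping from the Quick Start section.
// ASSUMPTION: the check order below (Gemini, Claude, Codex) is a guess
// for the case where more than one key is set.
const KEY_TO_AGENT: ReadonlyArray<[string, Agent]> = [
  ["GEMINI_API_KEY", "gemini"],
  ["ANTHROPIC_API_KEY", "claude"],
  ["OPENAI_API_KEY", "codex"],
];

// Returns the explicit override if given (mirroring --agent=claude),
// otherwise the agent for the first non-empty API key, otherwise undefined.
function detectAgent(
  env: Record<string, string | undefined>,
  override?: Agent,
): Agent | undefined {
  if (override) return override;
  for (const [key, agent] of KEY_TO_AGENT) {
    if (env[key]) return agent;
  }
  return undefined;
}
```

In a Node.js script, one might call `detectAgent(process.env)` and look up `PRESET_TRIALS.smoke` to decide how many trials to run; returning `undefined` when no key is present leaves it to the caller to decide how to report the missing configuration.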

Data from GitHub. Synced on 2026-07-03