What are the key features of mgechev/skillgrade?

Write unit-test-like evaluations for agent skills; Support for Claude Code, Codex, and Gemini CLI; Catch regressions and validate skill behavior; TypeScript-based, lightweight framework; Designed for agent skill reliability

What are the use cases of mgechev/skillgrade?

Validate custom Claude Code skills before deployment; Run regression tests on agent skill updates; Ensure consistent behavior across different CLI agents; Integrate skill testing into CI/CD pipelines; Benchmark skill performance across multiple agent platforms

What programming language does mgechev/skillgrade use?

mgechev/skillgrade is primarily written in TypeScript.

How to install mgechev/skillgrade?

Run: openclaw install mgechev/skillgrade

Claude Skill

mgechev/skillgrade

Skillgrade is a TypeScript evaluation framework for writing unit tests for AI agent skills. Supports Claude Code, Codex, and Gemini CLI to catch regressions and ensure skill reliability.

Language

Overview

Stars540

Forks42

LanguageTypeScript

Last pushed2026-06-29

Last synced2026-07-03

View on GitHub

Repository

Ownermgechev

Repositoryskillgrade

Full namemgechev/skillgrade

Repo ID1,167,891,649

GitHub URLhttps://github.com/mgechev/skillgrade

Install this Skill

npx skills add mgechev/skills-best-practices

GitHub

Registry

Typecodex_skill

Quality score85/100

Verificationreadme_parsed

Last verified2026-06-10

Platforms

ClaudeCodex

Capabilities

browserpdfmemoryimagevideoterminalworkflowagentclaude-codecodex

Detected files

README.mdexamplespackage.jsontests

Config keys

GEMINI_API_KEYANTHROPIC_API_KEYOPENAI_API_KEYURLANTHROPIC_BASE_URLOPENAI_BASE_URLPACKAGE_JSON

Summary

Skillgrade is a TypeScript-based evaluation framework that lets you write "unit tests" for your AI agent skills, supporting Claude Code, Codex, and Gemini CLI. It helps you validate skill behavior, catch regressions, and ensure reliability before deploying agent capabilities.

Chinese description

为你的智能体技能编写"单元测试"

Key features

Write unit-test-like evaluations for agent skills
Support for Claude Code, Codex, and Gemini CLI
Catch regressions and validate skill behavior
TypeScript-based, lightweight framework
Designed for agent skill reliability

Use cases

Validate custom Claude Code skills before deployment
Run regression tests on agent skill updates
Ensure consistent behavior across different CLI agents
Integrate skill testing into CI/CD pipelines
Benchmark skill performance across multiple agent platforms

README excerpt

# Skillgrade The easiest way to evaluate your [Agent Skills](https://agentskills.io/home). Tests that AI agents correctly discover and use your skills. See [examples/](examples/) — [superlint](examples/superlint/) (simple) and [angular-modern](examples/angular-modern/) (TypeScript grader). ![Browser Preview](https://raw.githubusercontent.com/mgechev/skillgrade/main/assets/browser-preview.png) ## Quick Start **Prerequisites**: Node.js 20+, Docker ```bash npm i -g skillgrade ``` **1. Initialize** — go to your skill directory (must have `SKILL.md`) and scaffold: ```bash cd my-skill/ GEMINI_API_KEY=your-key skillgrade init # or ANTHROPIC_API_KEY / OPENAI_API_KEY # Use --force to overwrite an existing eval.yaml ``` Generates `eval.yaml` with AI-powered tasks and graders. Without an API key, creates a well-commented template. **2. Edit** — customize `eval.yaml` for your skill (see [eval.yaml Reference](#evalyaml-reference)). **3. Run**: ```bash GEMINI_API_KEY=your-key skillgrade --smoke ``` The agent is auto-detected from your API key: `GEMINI_API_KEY` → Gemini, `ANTHROPIC_API_KEY` → Claude, `OPENAI_API_KEY` → Codex. Override with `--agent=claude`. **4. Review**: ```bash skillgrade preview # CLI report skillgrade preview browser # web UI → http://localhost:3847 ``` Reports are saved to `$TMPDIR/skillgrade/<skill-name>/results/`. Override with `--output=DIR`. ## Presets | Flag | Trials | Use Case | |------|--------|----------| | `--smoke` | 5 | Quick capability check | | `--reliable` | 15 | Reliable pass rate estimate | | `--regression` | 30 | High-confidence regression detection | ## Options | Flag | Description | |------|-------------| | `--eval=NAME[,NAME]` | Run specific evals by name (comma-separated) | | `--grader=TYPE` | Run only graders

Topics

agent claude-code codex eval gemini-cli skill

mgechev/skillgrade

Overview

Repository

Install this Skill

Registry

Summary

Key features

Use cases

README excerpt

Topics

Explore more

Related skills

ComposioHQ/awesome-claude-skills

iOfficeAI/AionUi

gotalab/cc-sdd

op7418/guizang-ppt-skill

iOfficeAI/OfficeCLI

junhoyeo/tokscale