What are the key features of darkrishabh/agent-skills-eval?

Runs agent skill evaluations from agentskills.io-style definitions; Supports JSONL and YAML test file formats; OpenAI-compatible LLM evaluation integration; Command-line interface (CLI) for easy automation; Built with TypeScript for type safety and reliability

What are the use cases of darkrishabh/agent-skills-eval?

Evaluating AI agent performance on standardized skill tests; Automating LLM evaluation pipelines in CI/CD workflows; Benchmarking different AI agents against agentskills.io tasks; Developing and testing new agent skills with reproducible evals

What programming language does darkrishabh/agent-skills-eval use?

darkrishabh/agent-skills-eval is primarily written in TypeScript.

How do I install darkrishabh/agent-skills-eval?

Run: openclaw install darkrishabh/agent-skills-eval

Claude Skill

darkrishabh/agent-skills-eval

A TypeScript CLI tool for evaluating AI agent skills in agentskills.io format. Supports JSONL/YAML tests and OpenAI-compatible LLM evals.

Overview

Stars: 608

Forks: 30

Language: TypeScript

Last pushed: 2026-07-01

Last synced: 2026-07-03

Repository

Owner: darkrishabh

Repository: agent-skills-eval

Full name: darkrishabh/agent-skills-eval

Repo ID: 1,230,541,272

GitHub URL: https://github.com/darkrishabh/agent-skills-eval

Install this Skill

npx agent-skills-eval ./skills \

Registry

Type: workflow

Quality score: 85/100

Verification: readme_parsed

Last verified: 2026-06-08

Platforms

Claude

Capabilities

pdf, terminal, workflow, agent-evals, agent-skills, agentskills, ai-agents, cli, jsonl, llm-evals

Detected files

README.md, docs, examples, package.json, test

Config keys

OPENAI_API_KEY, PACKAGE_JSON

Install methods

npx agent-skills-eval ./skills \
npm install agent-skills-eval
npx agent-skills-eval --help
npx agent-skills-eval [root] \
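
For automation, the commands above can be wrapped in a small pre-flight script. The sketch below assumes that `OPENAI_API_KEY` (listed under Config keys) must be set before a run, which this listing implies but does not state. The helper names `missingEnv` and `showHelp` are hypothetical, and only the `--help` invocation shown above is used.

```typescript
import { spawnSync } from "node:child_process";

// Config key taken from the listing; treating it as required is an assumption.
const REQUIRED_ENV = ["OPENAI_API_KEY"] as const;

/** Return the names of required variables that are unset or empty. */
function missingEnv(env: Record<string, string | undefined>): string[] {
  return REQUIRED_ENV.filter((name) => !env[name]);
}

/** Run `npx agent-skills-eval --help` and return its exit code (1 if none was reported). */
function showHelp(): number {
  const result = spawnSync("npx", ["agent-skills-eval", "--help"], {
    stdio: "inherit", // stream the CLI's own output to the terminal
  });
  return result.status ?? 1;
}
```

A CI step might call `missingEnv(process.env)` first and fail fast if the returned array is non-empty, then call `showHelp()` to confirm the package resolves.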

Summary

A TypeScript-based test runner for evaluating AI agent skills in the agentskills.io format. It supports CLI usage, JSONL and YAML test definitions, and OpenAI-compatible LLM evaluations.

Chinese description (translated)

A test runner for agentskills.io-style AI agent skills

Key features

Runs agent skill evaluations from agentskills.io-style definitions
Supports JSONL and YAML test file formats
OpenAI-compatible LLM evaluation integration
Command-line interface (CLI) for easy automation
Built with TypeScript for type safety and reliability
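
To make the JSONL support concrete: JSONL stores one JSON object per line. The sketch below is a generic reader, not the tool's own parser. The `prompt` and `expected` fields in the sample are hypothetical, since this listing does not document the test schema.

```typescript
type JsonObject = Record<string, unknown>;

/** Parse JSONL text into objects, skipping blank lines and reporting bad lines by number. */
function parseJsonl(text: string): JsonObject[] {
  const records: JsonObject[] = [];
  text.split(/\r?\n/).forEach((line, index) => {
    const trimmed = line.trim();
    if (trimmed === "") return; // tolerate blank lines and a trailing newline
    let value: unknown;
    try {
      value = JSON.parse(trimmed);
    } catch (err) {
      throw new Error(`Invalid JSON on line ${index + 1}: ${(err as Error).message}`);
    }
    if (typeof value !== "object" || value === null || Array.isArray(value)) {
      throw new Error(`Line ${index + 1} is not a JSON object`);
    }
    records.push(value as JsonObject);
  });
  return records;
}

// Hypothetical sample data; the field names are illustrative only.
const sample =
  '{"prompt":"2+2","expected":"4"}\n\n{"prompt":"capital of France","expected":"Paris"}\n';
const cases = parseJsonl(sample);
console.log(cases.length); // prints 2
```

Reporting the line number on failure is a deliberate choice here: a single malformed line in a long test file is otherwise hard to locate.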

Use cases

Evaluating AI agent performance on standardized skill tests
Automating LLM evaluation pipelines in CI/CD workflows
Benchmarking different AI agents against agentskills.io tasks
Developing and testing new agent skills with reproducible evals
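
The benchmarking use case comes down to comparing pass rates between two runs over the same test cases. The sketch below shows that arithmetic under stated assumptions: the `EvalResult` shape and the function names are hypothetical and are not taken from the tool's output format.

```typescript
// Hypothetical result shape; the real tool's output format is not documented here.
interface EvalResult {
  caseId: string;
  passed: boolean;
}

/** Fraction of cases that passed; an empty run counts as 0. */
function passRate(results: EvalResult[]): number {
  if (results.length === 0) return 0;
  return results.filter((r) => r.passed).length / results.length;
}

/** Compare two runs and report how much the candidate improves on the baseline. */
function compareRuns(baseline: EvalResult[], candidate: EvalResult[]) {
  const baselineRate = passRate(baseline);
  const candidateRate = passRate(candidate);
  return { baselineRate, candidateRate, delta: candidateRate - baselineRate };
}

const baselineRun: EvalResult[] = [
  { caseId: "a", passed: true },
  { caseId: "b", passed: false },
  { caseId: "c", passed: false },
  { caseId: "d", passed: false },
];
const candidateRun: EvalResult[] = [
  { caseId: "a", passed: true },
  { caseId: "b", passed: true },
  { caseId: "c", passed: true },
  { caseId: "d", passed: false },
];

const comparison = compareRuns(baselineRun, candidateRun);
console.log(comparison.delta); // prints 0.5
```

A positive `delta` means the candidate run passed more cases than the baseline. With only a handful of cases, small differences should be read cautiously.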

README excerpt

<div align="center">
<img src="https://github.com/user-attachments/assets/094b8e11-e19e-4c96-ae82-ba701cfcf7e3" alt="agent-skills-eval — a test runner for Agent Skills" width="100%" />
<br />

# agent-skills-eval

**A test runner for [Agent Skills](https://agentskills.io).**

Write a `SKILL.md`, drop in some evals, and find out, empirically, whether your skill actually makes the model better at the task.

[![npm version](https://img.shields.io/npm/v/agent-skills-eval.svg?style=flat-square&logo=npm&label=npm)](https://www.npmjs.com/package/agent-skills-eval)
[![CI](https://img.shields.io/github/actions/workflow/status/darkrishabh/agent-skills-eval/ci.yml?style=flat-square&logo=github&label=ci)](https://github.com/darkrishabh/agent-skills-eval/actions/workflows/ci.yml)
[![license: MIT](https://img.shields.io/badge/license-MIT-green?style=flat-square)](LICENSE)
[![node](https://img.shields.io/node/v/agent-skills-eval.svg?style=flat-square&logo=nodedotjs&logoColor=white)](package.json)
[![docs](https://img.shields.io/badge/docs-GitHub%20Pages-0f766e?style=flat-square)](https://darkrishabh.github.io/agent-skills-eval/)
[![TypeScript](https://img.shields.io/badge/TypeScript-3178C6?style=flat-square&logo=typescript&logoColor=white)](https://www.typescriptlang.org/)

[Documentation](https://darkrishabh.github.io/agent-skills-eval/) · [Quickstart](#quickstart) · [SDK](#sdk) · [agentskills.io](https://agentskills.io)

</div>

---

## Why this exists

[Agent Skills](https://agentskills.io), the open standard from Anthropic for giving agents domain knowledge, make it easy to ship a `SKILL.md` and assume your agent is now better at the task. The hard part is *proving* it.

`agent-skills-eval` is the missing piece. It runs your skill against the same prompts twice, once `with_

Topics

agent-evals agent-skills agentskills ai-agents cli jsonl llm-evals llm-evaluation openai-compatible typescript yaml

Explore more

Related skills

nexu-io/open-design

NVIDIA/NemoClaw

heilcheng/awesome-agent-skills

builderz-labs/mission-control

onecli/onecli

ComposioHQ/awesome-claude-skills