What are the key features of pinchbench/skill?

pinchbench/skill evaluates LLM models as coding agents. It is an OpenClaw-based benchmarking framework, built by the team at kilo.ai and implemented in Python.

What are the use cases of pinchbench/skill?

pinchbench/skill is used for benchmarking LLM coding agent performance, comparing different LLM models on coding tasks, and research on agent-based code generation.

What programming language does pinchbench/skill use?

pinchbench/skill is primarily written in Python.

How do I install pinchbench/skill?

Run: openclaw install pinchbench/skill

Claude Skill

pinchbench/skill

PinchBench is a benchmarking system for evaluating LLM models as OpenClaw coding agents. Built by the team at kilo.ai.

Overview

Stars: 1,260

Forks: 144

Language: Python

Last pushed: 2026-07-02

Last synced: 2026-07-03

Repository

Owner: pinchbench

Repository: skill

Full name: pinchbench/skill

Repo ID: 1,154,980,232

GitHub URL: https://github.com/pinchbench/skill

Install this Skill

git clone https://github.com/pinchbench/skill.git

Registry

Type: openclaw_skill

Quality score: 80/100

Verification: readme_parsed

Last verified: 2026-06-03

Platforms

Claude, OpenClaw

Capabilities

pdf, memory, search, video, terminal

Detected files

README.md, SKILL.md, pyproject.toml, tests

Config keys

PINCHBENCH_OFFICIAL_KEY, KEY, OPENROUTER_API_KEY, KILO_API_KEY, ANTHROPIC_API_KEY, OPENAI_API_KEY
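Before a run, it can help to see which of the listed config keys are present in the environment. The following is a minimal sketch, not part of PinchBench: `missing_keys` is a hypothetical helper, the listing does not say which keys a given run actually requires, and the bare `KEY` entry is left out on the assumption that it is an extraction fragment.

```python
import os

# Key names copied from the "Config keys" list. Which of them a given run
# requires is not stated in the listing, so treat this as a checklist only.
# The bare "KEY" entry is omitted (assumed to be an extraction fragment).
CONFIG_KEYS = [
    "PINCHBENCH_OFFICIAL_KEY",
    "OPENROUTER_API_KEY",
    "KILO_API_KEY",
    "ANTHROPIC_API_KEY",
    "OPENAI_API_KEY",
]


def missing_keys(env=None, keys=CONFIG_KEYS):
    """Return the listed keys that are unset or empty in the given mapping."""
    env = os.environ if env is None else env
    return [key for key in keys if not env.get(key)]


if __name__ == "__main__":
    for key in missing_keys():
        print(f"not set: {key}")
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to check without touching real credentials.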

Summary

PinchBench is a benchmarking system designed to evaluate LLM models as OpenClaw coding agents, built by the team at kilo.ai.

Chinese description (translated)

PinchBench is a benchmarking system for evaluating LLM models as OpenClaw coding agents. Made with 🦀 by the humans at https://kilo.ai.

Key features

Evaluates LLM models as coding agents
OpenClaw-based benchmarking framework
Built by the team at kilo.ai
Python-based implementation

Use cases

Benchmarking LLM coding agent performance
Comparing different LLM models for coding tasks
Research on agent-based code generation

README excerpt

# 🦀 PinchBench

**Real-world benchmarks for AI coding agents**

[![Leaderboard](https://img.shields.io/badge/leaderboard-pinchbench.com-blue)](https://pinchbench.com) [![License](https://img.shields.io/badge/license-MIT-green)](LICENSE) ![Tasks](https://img.shields.io/badge/tasks-53-orange)

> **Note:** This repository contains the benchmark skill/tasks. It is NOT the source of official leaderboard results. To add models to the official results, modify [pinchbench/scripts/default-models.yml](https://github.com/pinchbench/scripts/blob/main/default-models.yml).

PinchBench measures how well LLM models perform as the brain of an [OpenClaw](https://github.com/openclaw/openclaw) agent. Instead of synthetic tests, we throw real tasks at agents: scheduling meetings, writing code, triaging email, researching topics, and managing files.

Results are collected on a public leaderboard at **[pinchbench.com](https://pinchbench.com)**.

![PinchBench](pinchbench.png)

## Why PinchBench?

Most LLM benchmarks test isolated capabilities. PinchBench tests what actually matters for coding agents:

- **Tool usage**: Can the model call the right tools with the right parameters?
- **Multi-step reasoning**: Can it chain together actions to complete complex tasks?
- **Real-world messiness**: Can it handle ambiguous instructions and incomplete information?
- **Practical outcomes**: Did it actually create the file, send the email, or schedule the meeting?

## Quick Start

```bash
# Clone the skill
git clone https://github.com/pinchbench/skill.git
cd skill

# Run benchmarks with your model of choice
./scripts/run.sh --model openrouter/anthropic/claude-sonnet-4

# Or run specific tasks
./scripts/run.sh --model openrouter/openai/gpt-4o --suite t
```
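The Quick Start invocation can also be driven from Python, for example to loop over several models. This is a minimal sketch under stated assumptions: the repository is already cloned, and `./scripts/run.sh` accepts `--model` as shown in the excerpt. The helper names `build_run_command` and `run_benchmark` are hypothetical and not part of PinchBench, and no other flags are assumed.

```python
import subprocess

# Script path as it appears in the README excerpt, relative to the cloned repository.
RUN_SCRIPT = "./scripts/run.sh"


def build_run_command(model, script=RUN_SCRIPT):
    """Build the argument list for one benchmark run of the given model."""
    if not model:
        raise ValueError("model identifier must be a non-empty string")
    return [script, "--model", model]


def run_benchmark(model, repo_dir="skill"):
    """Run the script inside the cloned repository and return its exit code.

    repo_dir is an assumption: the directory created by `git clone`.
    """
    completed = subprocess.run(build_run_command(model), cwd=repo_dir, check=False)
    return completed.returncode


if __name__ == "__main__":
    # Model identifiers taken from the README excerpt.
    for model in ("openrouter/anthropic/claude-sonnet-4", "openrouter/openai/gpt-4o"):
        print(model, "exit code:", run_benchmark(model))
```

Passing the command as a list rather than a shell string avoids quoting problems if a model identifier ever contains unusual characters.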


