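What are the main features of pinchbench/skill?

It evaluates LLMs as coding agents, is a benchmarking framework built on OpenClaw, was built by the kilo.ai team, and is implemented in Python.

What are the use cases for pinchbench/skill?

Benchmarking the performance of LLM coding agents; comparing how different LLMs perform on coding tasks; research on agent-based code generation.

What programming language does pinchbench/skill use?

pinchbench/skill is written primarily in Python.

How do I install pinchbench/skill?

Run: openclaw install pinchbench/skill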

Claude Skill

pinchbench/skill

PinchBench is a benchmarking system for evaluating LLMs as OpenClaw coding agents, built by the kilo.ai team.

Overview

Stars: 1,260

Forks: 144

Language: Python

Last updated: 2026-07-02

Last synced: 2026-07-03

Repository info

Owner: pinchbench

Repository: skill

Full name: pinchbench/skill

Repo ID: 1154980232

GitHub URL: https://github.com/pinchbench/skill

Install this skill

git clone https://github.com/pinchbench/skill.git

Registry info

Type: openclaw_skill

Quality score: 80/100

Verification status: readme_parsed

Last verified: 2026-06-03

Platforms

Claude, OpenClaw

Capabilities

pdf, memory, search, video, terminal

Detected files

README.md, SKILL.md, pyproject.toml, tests

Config keys

PINCHBENCH_OFFICIAL_KEY, KEY, OPENROUTER_API_KEY, KILO_API_KEY, ANTHROPIC_API_KEY, OPENAI_API_KEY
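
Before a run, it can help to confirm which of the configuration keys listed above are actually set. The sketch below assumes these keys are supplied as environment variables, which the listing does not state. The helper name `missing_keys` is hypothetical and not part of PinchBench, and only the five unambiguous key names are included.

```python
import os
from typing import Iterable, List, Mapping, Optional

# Key names copied from the registry listing. Treating them as
# environment variables is an assumption, not documented behaviour.
CONFIG_KEYS = (
    "PINCHBENCH_OFFICIAL_KEY",
    "OPENROUTER_API_KEY",
    "KILO_API_KEY",
    "ANTHROPIC_API_KEY",
    "OPENAI_API_KEY",
)


def missing_keys(keys: Iterable[str] = CONFIG_KEYS,
                 env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the keys that are unset or empty in `env` (default: os.environ)."""
    env = os.environ if env is None else env
    return [k for k in keys if not env.get(k)]


if __name__ == "__main__":
    # Report every listed key that is not available in this shell.
    for name in missing_keys():
        print(f"not set: {name}")
```

Which keys a given run actually needs is not specified in the listing, so a missing key here is only a hint, not necessarily an error.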

Project description

PinchBench is a benchmarking system for evaluating LLM models as OpenClaw coding agents. Made with 🦀 by the humans at https://kilo.ai

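Key points

Evaluates LLMs as coding agents
Benchmarking framework built on OpenClaw
Built by the kilo.ai team
Implemented in Python

Use cases

Benchmarking the performance of LLM coding agents
Comparing how different LLMs perform on coding tasks
Research on agent-based code generation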

README excerpt

# 🦀 PinchBench

**Real-world benchmarks for AI coding agents**

[![Leaderboard](https://img.shields.io/badge/leaderboard-pinchbench.com-blue)](https://pinchbench.com) [![License](https://img.shields.io/badge/license-MIT-green)](LICENSE) ![Tasks](https://img.shields.io/badge/tasks-53-orange)

> **Note:** This repository contains the benchmark skill/tasks. It is NOT the source of official leaderboard results. To add models to the official results, modify [pinchbench/scripts/default-models.yml](https://github.com/pinchbench/scripts/blob/main/default-models.yml).

PinchBench measures how well LLM models perform as the brain of an [OpenClaw](https://github.com/openclaw/openclaw) agent. Instead of synthetic tests, we throw real tasks at agents: scheduling meetings, writing code, triaging email, researching topics, and managing files.

Results are collected on a public leaderboard at **[pinchbench.com](https://pinchbench.com)**.

![PinchBench](pinchbench.png)

## Why PinchBench?

Most LLM benchmarks test isolated capabilities. PinchBench tests what actually matters for coding agents:

- **Tool usage**: Can the model call the right tools with the right parameters?
- **Multi-step reasoning**: Can it chain together actions to complete complex tasks?
- **Real-world messiness**: Can it handle ambiguous instructions and incomplete information?
- **Practical outcomes**: Did it actually create the file, send the email, or schedule the meeting?

## Quick Start

```bash
# Clone the skill
git clone https://github.com/pinchbench/skill.git
cd skill

# Run benchmarks with your model of choice
./scripts/run.sh --model openrouter/anthropic/claude-sonnet-4

# Or run specific tasks
./scripts/run.sh --model openrouter/openai/gpt-4o --suite t
```
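
The Quick Start invocation could also be driven from Python, for example to benchmark several models in a row. This is a minimal sketch, assuming the repository has been cloned and that `./scripts/run.sh` accepts `--model` and `--suite` as the excerpt shows. The function names `build_command` and `run_benchmark` are hypothetical, and the suite value is left to the caller because the excerpt is cut off before showing one.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def build_command(model: str, suite: Optional[str] = None) -> List[str]:
    """Build the argument list for scripts/run.sh (flags taken from the README excerpt)."""
    cmd = ["./scripts/run.sh", "--model", model]
    if suite is not None:
        cmd += ["--suite", suite]
    return cmd


def run_benchmark(repo_dir: str, model: str, suite: Optional[str] = None) -> int:
    """Run the benchmark script inside a cloned checkout and return its exit code."""
    repo = Path(repo_dir)
    if not (repo / "scripts" / "run.sh").is_file():
        raise FileNotFoundError(f"scripts/run.sh not found under {repo}")
    # cwd makes the relative script path resolve inside the checkout (POSIX).
    completed = subprocess.run(build_command(model, suite), cwd=repo)
    return completed.returncode
```

Calling `run_benchmark("skill", "openrouter/anthropic/claude-sonnet-4")` from the directory that contains the clone would mirror the first Quick Start command. Passing the arguments as a list rather than a shell string avoids quoting issues with model identifiers.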

Topics

No topics yet

Explore more

Related skills

NousResearch/hermes-agent

anthropics/skills

infiniflow/ragflow

safishamsi/graphify

ComposioHQ/awesome-claude-skills

mvanhorn/last30days-skill