Topics: llm-evaluation

Browse Claude Skill projects under the "llm-evaluation" topic.

Tencent/AI-Infra-Guard

A full-stack AI Red Teaming platform securing AI ecosystems via OpenClaw Security Scan, Agent Scan, Skills Scan, MCP scan, AI Infra scan and LLM jailbreak evaluation.

⭐ 3918 · 🍴 379 · Python

agent agent-security ai-infra

darkrishabh/agent-skills-eval

A test runner for agentskills.io-style AI agent skills

⭐ 596 · 🍴 30 · TypeScript

agent-evals agent-skills agentskills

LeoYeAI/myclaw-bench

The definitive benchmark for AI agents on OpenClaw: 45 tasks across 4 tiers. Powered by MyClaw.ai.

⭐ 232 · 🍴 39 · Python

agent-testing ai-agent ai-benchmark