Claude Skill

WeianMao/triattention

TriAttention uses trigonometric KV cache compression to enable efficient long reasoning and local OpenClaw deployment on memory-constrained GPUs.

Overview

Stars: 811
Forks: 77
Language: Python
Last updated: 2026-07-02
Last synced: 2026-07-03

Repository info

Owner: WeianMao
Repository: triattention
Full name: WeianMao/triattention
Repo ID: 1,200,960,609

Install this Skill

git clone https://github.com/WeianMao/triattention.git

Registry info

Type: openclaw_skill
Quality score: 75/100
Verification status: readme_parsed
Last verified: 2026-06-06
Platform: OpenClaw
Capabilities: pdf, memory, image, video, terminal
Detected files: README.md, docs, requirements.txt
Install steps:
  • git clone https://github.com/WeianMao/triattention.git
  • pip install -e .
  • pip install flash-attn --no-build-isolation # recommended (takes 105m in DGX Spark / GB10)
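
After the install steps above, a quick check can confirm whether the optional flash-attn dependency is importable. This is a minimal sketch, not part of TriAttention: `flash_attn` is the import name of the flash-attn package, and the helper `has_module` is a hypothetical name introduced here for illustration.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module with this name can be located."""
    # find_spec returns None when a top-level module is not installed.
    return importlib.util.find_spec(name) is not None


# flash-attn is only recommended, so a missing module is not an error.
status = "available" if has_module("flash_attn") else "not installed (optional)"
print(f"flash_attn: {status}")
```

The check only locates the module and does not import it, so it stays fast and does not require a GPU.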

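Project summary

TriAttention is an efficient long-reasoning technique that reduces memory usage through trigonometric KV cache compression, enabling local OpenClaw deployment on memory-constrained GPUs.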

English description

TriAttention — Efficient long reasoning with trigonometric KV cache compression. Enables OpenClaw local deployment on memory-constrained GPUs.

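Key points

  • Trigonometric KV cache compression that reduces memory usage
  • Long-context inference on memory-constrained GPUs
  • Support for local OpenClaw deployment
  • Efficient inference optimized for limited hardware resources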
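
To make the memory argument concrete, the sketch below estimates the KV cache footprint of a decoder-only transformer and applies the 10.7x compression ratio quoted in the README. The model shape (32 layers, 8 KV heads, head dimension 128, 32,768 tokens, 16-bit values) is a hypothetical example, not a configuration taken from the project, and the function `kv_cache_bytes` is an illustrative helper rather than a TriAttention API. The achieved ratio in practice depends on the model and settings.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2,
                   batch_size: int = 1) -> int:
    """Size of an uncompressed KV cache in bytes."""
    # Each layer stores one key tensor and one value tensor (factor of 2),
    # each holding num_kv_heads * head_dim values per token.
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * bytes_per_elem * batch_size)


GIB = 2 ** 30
COMPRESSION_RATIO = 10.7  # headline figure quoted in the README

# Hypothetical model shape, chosen only for illustration.
full = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128,
                      seq_len=32_768)
compressed = full / COMPRESSION_RATIO

print(f"uncompressed KV cache: {full / GIB:.2f} GiB")
print(f"at {COMPRESSION_RATIO}x compression: {compressed / GIB:.2f} GiB")
```

For this assumed shape the uncompressed cache is 4.00 GiB per sequence, and a 10.7x reduction brings it to roughly 0.37 GiB, which is the kind of saving that matters on a GPU with limited memory.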

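Use cases

  • Running large language models locally on consumer GPUs
  • Long-document analysis and summarization
  • Memory-efficient AI inference for edge devices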

README excerpt

<div align="center">

# TriAttention: Efficient Long Reasoning with Trigonometric KV Compression

[![Paper](https://img.shields.io/badge/ArXiv-Paper-brown)](https://arxiv.org/abs/2604.04921)
[![Project Page](https://img.shields.io/badge/Project-Page-teal)](https://weianmao.github.io/tri-attention-project-page/)
[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](LICENSE)
[![Python 3.10+](https://img.shields.io/badge/Python-3.10%2B-green.svg)](https://www.python.org/downloads/)

*Compress KV cache by 10.7x and boost throughput by 2.5x on long reasoning tasks -- with no accuracy loss.*

[Weian Mao](https://scholar.google.com/citations?user=Qu-QXTsAAAAJ)<sup>1*</sup>, [Xi Lin](https://profile.erix025.me/)<sup>3*</sup>, [Wei Huang](https://aaron-weihuang.com/)<sup>2*</sup>, Yuxin Xie<sup>1</sup>, Tianfu Fu<sup>1</sup>, [Bohan Zhuang](https://bohanzhuang.github.io)<sup>3</sup>, [Song Han](http://songhan.mit.edu/)<sup>1,2</sup>, [Yukang Chen](https://yukangchen.com/)<sup>2</sup>

<sup>1</sup>MIT, <sup>2</sup>NVIDIA, <sup>3</sup>ZJU &nbsp;&nbsp; <sup>*</sup>Equal contribution

</div>

https://github.com/user-attachments/assets/768e59bb-897e-41bf-81b8-e7376aa72056

## News

- **[2026-04-21]** SGLang backend support added -- TriAttention now runs on SGLang in addition to vLLM. See [SGLang Integration](docs/sglang.md).
- **[2026-04-14]** Community DGX Spark (GB10/sm-121) enablement by [@dscain](https://github.com/dscain) -- vLLM support merged, non-vLLM path in progress.
- **[2026-04-12]** TriAttention now supports AR video generation with KV cache compression. See [LongLive README](longlive/README.md).
- **[2026-04-11]** Community C/ggml port for llama.cpp (HIP/ROCm) by [@domvox](https://github.com/domvox) -- enables TriAttention on AMD GPUs via llama.c

Topics

No topics yet


Data from GitHub, synced 2026-07-03