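What are the main features of WeianMao/triattention?

Trigonometric KV cache compression that reduces memory usage; long-context reasoning on memory-constrained GPUs; local deployment of OpenClaw; efficient inference tuned for limited hardware resources.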

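What are the use cases for WeianMao/triattention?

Running large language models locally on consumer GPUs; long-document analysis and summarization; memory-efficient AI inference on edge devices.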

What programming language does WeianMao/triattention use?

WeianMao/triattention is written primarily in Python.

How do I install WeianMao/triattention?

Run: openclaw install WeianMao/triattention


WeianMao/triattention

TriAttention uses trigonometric KV cache compression to enable efficient long reasoning on memory-constrained GPUs, including local OpenClaw deployment.


Overview

Stars: 811

Forks: 77

Language: Python

Last updated: 2026-07-02

Last synced: 2026-07-03


Repository information

Owner: WeianMao

Repository: triattention

Full name: WeianMao/triattention

Repo ID: 1,200,960,609

GitHub URL: https://github.com/WeianMao/triattention

Install this Skill

git clone https://github.com/WeianMao/triattention.git


Registry information

Type: openclaw_skill

Quality score: 75/100

Verification status: readme_parsed

Last verified: 2026-06-06

Platform

OpenClaw

Capabilities

pdf, memory, image, video, terminal

Detected files

README.md, docs, requirements.txt

Installation

git clone https://github.com/WeianMao/triattention.git
pip install -e .
pip install flash-attn --no-build-isolation # recommended (takes 105m in DGX Spark / GB10)
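The flash-attn build is marked as recommended and can take a long time, so it can help to confirm that the build actually succeeded before you run anything. The sketch below is minimal and assumes only that the package can be imported as `flash_attn`, its usual module name. The helper `flash_attn_available` is hypothetical and not part of TriAttention. These notes also do not say how TriAttention behaves when flash-attn is missing.

```python
import importlib.util


def flash_attn_available() -> bool:
    """Return True if the optional flash-attn package is importable.

    Hypothetical helper: it only checks that a module named `flash_attn`
    is on the current Python path. It does not import it, and it does not
    verify GPU, CUDA, or architecture compatibility.
    """
    return importlib.util.find_spec("flash_attn") is not None


if __name__ == "__main__":
    if flash_attn_available():
        print("flash-attn found")
    else:
        print("flash-attn not installed (listed as recommended above)")
```

Using `find_spec` instead of a bare `import` avoids loading the compiled CUDA extension just to check for it.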

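Project summary

TriAttention is an efficient long-reasoning technique that reduces memory usage through trigonometric KV cache compression, enabling local deployment of OpenClaw on memory-constrained GPUs.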
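The memory claim above can be made concrete with standard KV cache sizing arithmetic. The sketch below gives a generic estimate and is not TriAttention's compression algorithm. The model shape (32 layers, 8 KV heads, head dimension 128, fp16) and the 32K-token context are hypothetical. Dividing by 10.7 simply applies the README's headline ratio as if it held uniformly, which is a simplification.

```python
from dataclasses import dataclass


@dataclass
class ModelShape:
    """Hypothetical attention shape, used only for this estimate."""
    num_layers: int
    num_kv_heads: int
    head_dim: int
    bytes_per_elem: int = 2  # fp16 / bf16


def kv_cache_bytes(shape: ModelShape, seq_len: int, batch: int = 1) -> int:
    """Bytes needed to cache keys and values for every token in every layer."""
    # Factor 2: each token stores one key vector and one value vector per KV head.
    per_token = (2 * shape.num_layers * shape.num_kv_heads
                 * shape.head_dim * shape.bytes_per_elem)
    return per_token * seq_len * batch


GIB = 1024 ** 3

shape = ModelShape(num_layers=32, num_kv_heads=8, head_dim=128)
full = kv_cache_bytes(shape, seq_len=32_768)
# Assumption: apply the README's reported 10.7x ratio uniformly;
# actual savings depend on the model, the task, and the compression settings.
compressed = full / 10.7

print(f"full KV cache:        {full / GIB:.2f} GiB")
print(f"at 10.7x compression: {compressed / GIB:.2f} GiB")
```

With these assumed numbers, the uncompressed cache costs 128 KiB per token, which comes to 4.00 GiB in total. At 10.7x it drops to about 0.37 GiB. Because the cost grows linearly with sequence length, long reasoning traces come to dominate GPU memory.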



README excerpt

TriAttention: Efficient Long Reasoning with Trigonometric KV Compression

Paper (arXiv): https://arxiv.org/abs/2604.04921
Project page: https://weianmao.github.io/tri-attention-project-page/
License: Apache 2.0
Python: 3.10+

Compress KV cache by 10.7x and boost throughput by 2.5x on long reasoning tasks -- with no accuracy loss.

Weian Mao (1*), Xi Lin (3*), Wei Huang (2*), Yuxin Xie (1), Tianfu Fu (1), Bohan Zhuang (3), Song Han (1, 2), Yukang Chen (2)
1 MIT, 2 NVIDIA, 3 ZJU; * equal contribution

https://github.com/user-attachments/assets/768e59bb-897e-41bf-81b8-e7376aa72056

News
- [2026-04-21] SGLang backend support added: TriAttention now runs on SGLang in addition to vLLM. See SGLang Integration (docs/sglang.md).
- [2026-04-14] Community DGX Spark (GB10/sm-121) enablement by @dscain: vLLM support merged, non-vLLM path in progress.
- [2026-04-12] TriAttention now supports AR video generation with KV cache compression. See LongLive README (longlive/README.md).
- [2026-04-11] Community C/ggml port for llama.cpp (HIP/ROCm) by @domvox: enables TriAttention on AMD GPUs via llama.c

Topics

No topics yet.

