Claude Skill
WeianMao/triattention
TriAttention uses trigonometric KV cache compression to enable efficient long reasoning and local deployment of OpenClaw on memory-constrained GPUs.
Install this Skill
openclaw install WeianMao/triattention
Summary
TriAttention is an efficient long-reasoning technique that uses trigonometric KV cache compression to reduce memory usage, making it possible to deploy OpenClaw locally on memory-constrained GPUs.
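This page does not describe how TriAttention scores or compresses cache entries. To show what "KV cache compression" means in general, here is a minimal sketch of a generic fixed-budget eviction policy in the heavy-hitter style. Each cached token accumulates the attention weight it receives, and when the cache exceeds its budget, the lowest-scoring older token is dropped. The class `BudgetedKVCache`, its `budget` parameter, and the accumulated-attention scoring rule are all hypothetical. This is not TriAttention's trigonometric method.

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


class BudgetedKVCache:
    """Hypothetical fixed-budget KV cache for a single attention head.

    Generic accumulated-attention eviction, NOT TriAttention's trigonometric
    scoring, which this page does not describe.
    """

    def __init__(self, budget: int) -> None:
        if budget < 1:
            raise ValueError("budget must be at least 1")
        self.budget = budget
        self.keys: List[Vector] = []
        self.values: List[Vector] = []
        self.scores: List[float] = []  # attention mass each cached token has received

    def append(self, key: Vector, value: Vector) -> None:
        """Cache a new token; if over budget, evict the lowest-scoring older token."""
        self.keys.append(key)
        self.values.append(value)
        self.scores.append(0.0)
        if len(self.keys) > self.budget:
            # Exclude the newest token (last index) so it is never evicted on arrival.
            victim = min(range(len(self.scores) - 1), key=self.scores.__getitem__)
            del self.keys[victim]
            del self.values[victim]
            del self.scores[victim]

    def attend(self, query: Vector) -> Vector:
        """Scaled dot-product attention over the cache; accumulates weights as scores."""
        if not self.keys:
            raise ValueError("cache is empty")
        scale = 1.0 / math.sqrt(len(query))
        logits = [scale * sum(q * k for q, k in zip(query, key)) for key in self.keys]
        weights = softmax(logits)
        for i, w in enumerate(weights):
            self.scores[i] += w
        dim = len(self.values[0])
        return [sum(w * v[j] for w, v in zip(weights, self.values)) for j in range(dim)]


# Usage: decode 10 tokens while keeping at most 4 cached entries.
cache = BudgetedKVCache(budget=4)
for t in range(10):
    cache.append([math.cos(t), math.sin(t)], [float(t), 1.0])
    out = cache.attend([1.0, 0.0])
print(len(cache.keys), out)
```

The newest entry is protected because it has not yet received any attention. Without that protection, it would always be evicted the moment it arrived. A practical implementation would work on batched tensors for every layer and head instead of Python lists.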
Key features
- Trigonometric KV cache compression for reduced memory footprint
- Enables long-context reasoning on memory-constrained GPUs
- Supports local deployment of OpenClaw
- Optimized for efficient inference on limited hardware
Use cases
- Running large language models locally on consumer GPUs
- Long-document analysis and summarization
- Memory-efficient AI reasoning for edge devices
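For long documents on consumer GPUs, the KV cache grows linearly with context length and often becomes the memory bottleneck. The arithmetic below compares a dense cache with one capped at a fixed token budget. It assumes the standard layout of one key and one value vector per token, per KV head, per layer. The model shape (32 layers, 8 KV heads, head dimension 128, fp16) and the 2,048-token budget are hypothetical numbers chosen for illustration. They are not TriAttention's or OpenClaw's actual settings, and this page does not state TriAttention's compression ratio.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   num_tokens: int, bytes_per_elem: int = 2, batch_size: int = 1) -> int:
    """Size of a dense KV cache: a key and a value vector per token, per KV head, per layer."""
    return 2 * num_layers * num_kv_heads * head_dim * num_tokens * bytes_per_elem * batch_size


GIB = 1024 ** 3
config = dict(num_layers=32, num_kv_heads=8, head_dim=128)  # hypothetical model shape

full = kv_cache_bytes(**config, num_tokens=32_768)     # uncompressed long context
budgeted = kv_cache_bytes(**config, num_tokens=2_048)  # hypothetical fixed token budget
print(f"full: {full / GIB:.2f} GiB, budgeted: {budgeted / GIB:.3f} GiB, ratio: {full // budgeted}x")
# full: 4.00 GiB, budgeted: 0.250 GiB, ratio: 16x
```

The estimate ignores model weights, activations, and any bookkeeping a compression method needs. Treat it as a lower bound on the savings side, not as a full memory budget.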