ShadowNPU: System and Algorithm Co-design for NPU-Centric On-Device LLM Inference
📰 ArXiv cs.AI
ShadowNPU enables efficient on-device LLM inference by co-designing the system stack and model algorithms around NPU-centric processing
Action Steps
- Identify the quantization-sensitive operations that force state-of-the-art NPU frameworks to fall back to the CPU/GPU (see the toy probe after this list)
- Develop shadowAttn to minimize that CPU/GPU fallback
- Implement system and algorithm co-design for NPU-centric processing
- Evaluate the performance of ShadowNPU in on-device LLM inference
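Why some operations resist low-bit NPU execution is easiest to see with a quick experiment. The sketch below is a toy quantization-sensitivity probe in NumPy, not the paper's methodology: `int8_roundtrip` and `sensitivity` are hypothetical helpers that compare an op's full-precision output against its output after an INT8 quantize-dequantize round trip, the kind of check that could flag which ops to keep in higher precision off the NPU

```python
import numpy as np

rng = np.random.default_rng(0)

def int8_roundtrip(x: np.ndarray) -> np.ndarray:
    """Symmetric per-tensor INT8 quantize -> dequantize round trip."""
    scale = max(float(np.abs(x).max()), 1e-12) / 127.0
    q = np.clip(np.round(x / scale), -127, 127)
    return q * scale

def sensitivity(op, inputs) -> float:
    """Relative output error when an op's inputs are INT8-quantized.
    A high score marks the op as a CPU/GPU-fallback candidate in an
    NPU-centric pipeline. Toy illustration only; ShadowNPU's actual
    sensitivity analysis is not reproduced here."""
    ref = op(*inputs)
    quant = op(*(int8_roundtrip(x) for x in inputs))
    return float(np.linalg.norm(ref - quant) / (np.linalg.norm(ref) + 1e-12))

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    scores = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return scores @ v

# Synthetic tensors: 128 tokens, 64-dim heads, one linear weight matrix
d = 64
q, k, v = (rng.standard_normal((128, d)) for _ in range(3))
w = rng.standard_normal((d, d))

print("linear    :", sensitivity(lambda x, m: x @ m, (q, w)))
print("attention :", sensitivity(attention, (q, k, v)))
```

Ranking ops by such a score would let a runtime keep quantization-robust matmuls on the NPU while routing only the sensitive remainder elsewhere, which is the tension shadowAttn is built to resolve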
Who Needs to Know This
AI engineers and researchers benefit from the efficiency gains in on-device LLM inference, while product managers can leverage on-device processing to strengthen user privacy, since prompts and outputs can stay on the device
Key Insight
💡 Co-designing systems and algorithms for NPU-centric processing can improve the efficiency of on-device LLM inference
Share This
💡 ShadowNPU boosts on-device LLM inference with system & algorithm co-design
DeepCamp AI