ShadowNPU: System and Algorithm Co-design for NPU-Centric On-Device LLM Inference
📰 ArXiv cs.AI
ShadowNPU enables efficient on-device LLM inference by co-designing the system stack and model algorithms around NPU-centric processing
Action Steps
- Identify the quantization-sensitive operations that force state-of-the-art NPU frameworks to fall back to the CPU/GPU (see the toy probe after this list)
- Develop shadowAttn to minimize that CPU/GPU fallback
- Implement system and algorithm co-design for NPU-centric processing
- Evaluate the performance of ShadowNPU in on-device LLM inference
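Why some operations resist low-bit NPU execution is easiest to see with a quick experiment. The sketch below is a toy quantization-sensitivity probe in NumPy, not the paper's methodology: `int8_roundtrip` and `sensitivity` are hypothetical helpers that compare an op's full-precision output against its output after an INT8 quantize-dequantize round trip, the kind of check that could flag which ops to keep in higher precision off the NPU

```python
import numpy as np

rng = np.random.default_rng(0)

def int8_roundtrip(x: np.ndarray) -> np.ndarray:
    """Symmetric per-tensor INT8 quantize -> dequantize round trip."""
    scale = max(float(np.abs(x).max()), 1e-12) / 127.0
    q = np.clip(np.round(x / scale), -127, 127)
    return q * scale

def sensitivity(op, inputs) -> float:
    """Relative output error when an op's inputs are INT8-quantized.
    A high score marks the op as a CPU/GPU-fallback candidate in an
    NPU-centric pipeline. Toy illustration only; ShadowNPU's actual
    sensitivity analysis is not reproduced here."""
    ref = op(*inputs)
    quant = op(*(int8_roundtrip(x) for x in inputs))
    return float(np.linalg.norm(ref - quant) / (np.linalg.norm(ref) + 1e-12))

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    scores = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return scores @ v

# Synthetic tensors: 128 tokens, 64-dim heads, one linear weight matrix
d = 64
q, k, v = (rng.standard_normal((128, d)) for _ in range(3))
w = rng.standard_normal((d, d))

print("linear    :", sensitivity(lambda x, m: x @ m, (q, w)))
print("attention :", sensitivity(attention, (q, k, v)))
```

Ranking ops by such a score would let a runtime keep quantization-robust matmuls on the NPU while routing only the sensitive remainder elsewhere, which is the tension shadowAttn is built to resolve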
Who Needs to Know This
AI engineers and researchers benefit from the efficiency gains in on-device LLM inference, while product managers can leverage on-device processing to strengthen user privacy, since prompts and outputs can stay on the device
Key Insight
💡 Co-designing systems and algorithms for NPU-centric processing can improve the efficiency of on-device LLM inference
Share This
💡 ShadowNPU boosts on-device LLM inference with system & algorithm co-design
DeepCamp AI