Revitalizing Black-Box Interpretability: Actionable Interpretability for LLMs via Proxy Models
📰 ArXiv cs.AI
arXiv:2505.12509v3 Announce Type: replace-cross
Abstract: Post-hoc explanations provide transparency and are essential for guiding model optimization, such as prompt engineering and data sanitation. However, applying model-agnostic techniques to Large Language Models (LLMs) is hindered by prohibitive computational costs, rendering these tools dormant for real-world applications. To revitalize model-agnostic interpretability, we propose a budget-friendly proxy framework that leverages efficient m
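To illustrate why model-agnostic attribution is costly and how a proxy can help, here is a minimal sketch of perturbation-based token attribution where the expensive LLM call is replaced by a cheap stand-in scorer. The `proxy_score` function below is a hypothetical keyword-count surrogate for illustration only; it is not the framework proposed in the paper.

```python
def proxy_score(text: str) -> float:
    """Hypothetical cheap proxy for an LLM's scalar output (e.g., a task score).

    A real proxy would be a small model; a keyword lookup keeps this self-contained.
    """
    keywords = {"great": 1.0, "terrible": -1.0}
    return sum(keywords.get(tok.lower(), 0.0) for tok in text.split())


def attribution_by_ablation(text: str) -> dict:
    """Attribute importance to each token as the drop in proxy score when it is removed.

    With a full LLM this loop costs one forward pass per token, which is
    what makes model-agnostic attribution prohibitive at scale; swapping in
    a proxy keeps the same algorithm but makes each call cheap.
    """
    tokens = text.split()
    base = proxy_score(text)
    scores = {}
    for i, tok in enumerate(tokens):
        ablated = " ".join(tokens[:i] + tokens[i + 1:])
        scores[tok] = base - proxy_score(ablated)
    return scores


print(attribution_by_ablation("the movie was great"))
```

Under this sketch, only the token whose removal changes the proxy score receives a nonzero attribution; the design choice is that the attribution algorithm stays model-agnostic while the per-call cost drops from an LLM forward pass to a trivial lookup.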