Revitalizing Black-Box Interpretability: Actionable Interpretability for LLMs via Proxy Models
📰 ArXiv cs.AI
arXiv:2505.12509v3 Announce Type: replace-cross
Abstract: Post-hoc explanations provide transparency and are essential for guiding model optimization, such as prompt engineering and data sanitation. However, applying model-agnostic techniques to Large Language Models (LLMs) is hindered by prohibitive computational costs, rendering these tools dormant for real-world applications. To revitalize model-agnostic interpretability, we propose a budget-friendly proxy framework that leverages efficient m
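To illustrate why model-agnostic attribution is costly and how a proxy can help, here is a minimal sketch of perturbation-based token attribution where the expensive LLM call is replaced by a cheap stand-in scorer. The `proxy_score` function below is a hypothetical keyword-count surrogate for illustration only; it is not the framework proposed in the paper.

```python
def proxy_score(text: str) -> float:
    """Hypothetical cheap proxy for an LLM's scalar output (e.g., a task score).

    A real proxy would be a small model; a keyword lookup keeps this self-contained.
    """
    keywords = {"great": 1.0, "terrible": -1.0}
    return sum(keywords.get(tok.lower(), 0.0) for tok in text.split())


def attribution_by_ablation(text: str) -> dict:
    """Attribute importance to each token as the drop in proxy score when it is removed.

    With a full LLM this loop costs one forward pass per token, which is
    what makes model-agnostic attribution prohibitive at scale; swapping in
    a proxy keeps the same algorithm but makes each call cheap.
    """
    tokens = text.split()
    base = proxy_score(text)
    scores = {}
    for i, tok in enumerate(tokens):
        ablated = " ".join(tokens[:i] + tokens[i + 1:])
        scores[tok] = base - proxy_score(ablated)
    return scores


print(attribution_by_ablation("the movie was great"))
```

Under this sketch, only the token whose removal changes the proxy score receives a nonzero attribution; the design choice is that the attribution algorithm stays model-agnostic while the per-call cost drops from an LLM forward pass to a trivial lookup.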