Response-Aware User Memory Selection for LLM Personalization
📰 ArXiv cs.AI
arXiv:2604.14473v1 Announce Type: new Abstract: A common approach to personalization in large language models (LLMs) is to incorporate a subset of the user's memory into the prompt at inference time to guide the model's generation. Existing methods select these subsets primarily by similarity between user memory items and the input query, ignoring how the selected items actually affect the model's response distribution. We propose Response-Utility optimization for Memory Selection (RUMS), a novel method th…
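The similarity-based baseline the abstract critiques can be sketched in a few lines: embed the query and each memory item, rank items by cosine similarity, and keep the top-k for the prompt. This is an illustrative sketch only, not the paper's RUMS method; the function names, the toy embeddings, and the `k` parameter are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def select_memories(query_vec, memory, k=2):
    """Baseline selection: rank memory items by query similarity, keep top-k.

    Note: this scores items against the query only; it never looks at how
    each item would change the model's response, which is the gap the
    abstract points out.
    """
    ranked = sorted(memory, key=lambda m: cosine(query_vec, m["vec"]), reverse=True)
    return [m["text"] for m in ranked[:k]]

# Toy user memory with hypothetical 3-d embeddings.
memory = [
    {"text": "likes hiking",  "vec": [1.0, 0.0, 0.0]},
    {"text": "vegetarian",    "vec": [0.0, 1.0, 0.0]},
    {"text": "lives in Oslo", "vec": [0.0, 0.0, 1.0]},
]
query = [0.9, 0.1, 0.0]  # toy query embedding
print(select_memories(query, memory, k=2))  # → ['likes hiking', 'vegetarian']
```

A response-aware method would instead score each candidate subset by its effect on the model's output distribution rather than by query similarity alone.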