PromptEcho: Annotation-Free Reward from Vision-Language Models for Text-to-Image Reinforcement Learning
📰 ArXiv cs.AI
arXiv:2604.12652v1 Announce Type: cross Abstract: Reinforcement learning (RL) can improve the prompt-following capability of text-to-image (T2I) models, yet obtaining high-quality reward signals remains challenging: CLIP Score is too coarse-grained, while VLM-based reward models (e.g., RewardDance) require costly human-annotated preference data and additional fine-tuning. We propose PromptEcho, a reward construction method that requires no annotation and no reward model training.
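For context, the CLIP Score baseline mentioned above rewards a generated image with a single scalar: the cosine similarity between the image's CLIP embedding and the prompt's text embedding. The sketch below illustrates that computation with placeholder random embeddings standing in for the real CLIP encoders (which are assumed, not implemented here); the one-number-per-pair output is what makes the signal coarse-grained.

```python
import numpy as np

def clip_score_reward(image_emb: np.ndarray, text_emb: np.ndarray) -> float:
    """CLIP-Score-style reward: cosine similarity between L2-normalized
    image and prompt embeddings. One scalar per (image, prompt) pair,
    so it cannot indicate which part of the prompt the image violated.
    """
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_emb / np.linalg.norm(text_emb)
    return float(img @ txt)

# Placeholder embeddings for illustration only; in practice these come
# from CLIP's image and text encoders.
rng = np.random.default_rng(0)
img_emb = rng.normal(size=512)
txt_emb = img_emb + 0.1 * rng.normal(size=512)  # a well-aligned pair
print(clip_score_reward(img_emb, txt_emb))
```

Because this scalar averages alignment over the whole prompt, RL against it tends to miss fine-grained prompt-following errors, which is the gap VLM-based rewards (and PromptEcho) aim to close.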