Co-Evolution of Policy and Internal Reward for Language Agents

📰 arXiv cs.AI

Self-Guide lets language agents learn from internal rewards they generate themselves, so that the policy and the internal reward improve together during training

Advanced | Published 6 Apr 2026

Action Steps

Have the language agent generate its own internal rewards
Use reinforcement learning to co-evolve the policy and the internal reward
Evaluate the agent on tasks where the environment reward is sparse and delayed
Analyze how much the self-generated internal rewards contribute to policy improvement
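
The action steps above can be illustrated with a toy sketch. It is not the paper's Self-Guide method and it involves no language model: it assumes a hypothetical four-state chain environment with a sparse reward at the goal, a tabular softmax policy, and an internal reward table (r_int) trained to predict the episode outcome. All names and hyperparameters (train, rollout, policy_lr, reward_lr) are illustrative assumptions. The policy is updated only from the internal reward, while the internal reward is updated from the sparse outcome of rollouts produced by the current policy, so the two change together.

```python
import math
import random

N_STATES = 4      # toy chain with states 0..3; the goal is the last state
HORIZON = 10      # maximum number of steps per episode
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def softmax(logits):
    """Numerically stable softmax over a short list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rollout(theta, rng):
    """Run one episode; return the visited (state, action) pairs and the outcome."""
    state, steps = 0, []
    for _ in range(HORIZON):
        probs = softmax(theta[state])
        action = 1 if rng.random() < probs[1] else 0
        steps.append((state, action))
        if action == 1:
            state = min(state + 1, N_STATES - 1)
        else:
            state = max(state - 1, 0)
        if state == N_STATES - 1:
            return steps, 1.0  # sparse, delayed reward: paid only at the goal
    return steps, 0.0


def train(episodes=3000, policy_lr=0.5, reward_lr=0.05, seed=0):
    """Co-train a tabular policy and an internal reward table (toy assumption)."""
    rng = random.Random(seed)
    theta = [[0.0, 0.0] for _ in range(N_STATES)]  # policy logits per state
    r_int = [[0.0, 0.0] for _ in range(N_STATES)]  # internal reward per (state, action)
    for _ in range(episodes):
        steps, outcome = rollout(theta, rng)
        # Policy update: policy-gradient step that uses only the internal
        # reward, centered by its expectation under the current policy.
        for s, a in steps:
            probs = softmax(theta[s])
            baseline = sum(p * r for p, r in zip(probs, r_int[s]))
            advantage = r_int[s][a] - baseline
            for b in ACTIONS:
                grad = (1.0 if b == a else 0.0) - probs[b]
                theta[s][b] += policy_lr * advantage * grad
        # Internal reward update: move each visited entry toward the sparse
        # outcome of the episode the current policy just produced.
        for s, a in steps:
            r_int[s][a] += reward_lr * (outcome - r_int[s][a])
    return theta, r_int
```

Calling theta, r_int = train() returns the learned policy logits and the internal reward table, and softmax(theta[0])[1] gives the probability of moving right from the start state. To probe the last action step, one could compare this run against a variant that updates the policy from the sparse outcome alone.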

Who Needs to Know This

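ML researchers and AI engineers who train language agents with reinforcement learning, especially on tasks where environment rewards are sparse or delayed. Self-generated internal rewards give the agent an additional learning signal, which may make training more efficient.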

Key Insight

💡 A language agent can generate its own internal rewards, and training the policy together with that internal reward lets the two improve each other