Controllable and Verifiable Tool-Use Data Synthesis for Agentic Reinforcement Learning
📰 ArXiv cs.AI
arXiv:2604.09813v1 Announce Type: new Abstract: Existing synthetic tool-use corpora are primarily designed for offline supervised fine-tuning, yet reinforcement learning (RL) requires executable environments that support reward-checkable online rollouts. We propose COVERT, a two-stage pipeline that first generates reliable base tool-use trajectories through self-evolving synthesis with multi-level validation, and then applies oracle-preserving augmentations that systematically increase environme
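The two-stage idea in the abstract can be illustrated with a minimal sketch: (1) keep a synthesized tool-use trajectory only if an executable check (the "reward-checkable" rollout) reproduces a known oracle answer, then (2) apply an augmentation that makes the environment harder while leaving that oracle intact. All names and the toy tools below are illustrative assumptions, not the paper's actual pipeline or API.

```python
# Hypothetical sketch of COVERT's two stages, using toy arithmetic
# "tools". Nothing here comes from the paper's implementation.

def run_trajectory(trajectory):
    """Execute each (tool, args) step and return the final result."""
    result = None
    for tool, args in trajectory:
        result = tool(*args)
    return result

def reward_check(trajectory, oracle):
    """Multi-level validation stand-in: the executed trajectory
    must reproduce the known oracle answer."""
    return run_trajectory(trajectory) == oracle

def add_distractor_tool(env_tools):
    """Oracle-preserving augmentation: register an extra, irrelevant
    tool so the environment grows harder, but the correct answer
    (and the validating trajectory) are unchanged."""
    return env_tools + [("noise", lambda *a: None)]

# Stage 1: a base trajectory computing (2 + 3) * 4, oracle = 20.
add = lambda a, b: a + b
mul = lambda a, b: a * b
trajectory = [(add, (2, 3)), (mul, (5, 4))]
oracle = 20
assert reward_check(trajectory, oracle)  # trajectory is kept

# Stage 2: augment the environment; the same check still passes.
tools = add_distractor_tool([("add", add), ("mul", mul)])
assert reward_check(trajectory, oracle)  # oracle preserved
```

Because validation is by execution rather than by string matching, the same check can serve as an online RL reward signal, which is the property the abstract emphasizes.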