Controllable and Verifiable Tool-Use Data Synthesis for Agentic Reinforcement Learning
📰 ArXiv cs.AI
arXiv:2604.09813v1 Announce Type: new Abstract: Existing synthetic tool-use corpora are primarily designed for offline supervised fine-tuning, yet reinforcement learning (RL) requires executable environments that support reward-checkable online rollouts. We propose COVERT, a two-stage pipeline that first generates reliable base tool-use trajectories through self-evolving synthesis with multi-level validation, and then applies oracle-preserving augmentations that systematically increase environme
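The two-stage idea in the abstract can be illustrated with a minimal sketch: (1) keep a synthesized tool-use trajectory only if an executable check (the "reward-checkable" rollout) reproduces a known oracle answer, then (2) apply an augmentation that makes the environment harder while leaving that oracle intact. All names and the toy tools below are illustrative assumptions, not the paper's actual pipeline or API.

```python
# Hypothetical sketch of COVERT's two stages, using toy arithmetic
# "tools". Nothing here comes from the paper's implementation.

def run_trajectory(trajectory):
    """Execute each (tool, args) step and return the final result."""
    result = None
    for tool, args in trajectory:
        result = tool(*args)
    return result

def reward_check(trajectory, oracle):
    """Multi-level validation stand-in: the executed trajectory
    must reproduce the known oracle answer."""
    return run_trajectory(trajectory) == oracle

def add_distractor_tool(env_tools):
    """Oracle-preserving augmentation: register an extra, irrelevant
    tool so the environment grows harder, but the correct answer
    (and the validating trajectory) are unchanged."""
    return env_tools + [("noise", lambda *a: None)]

# Stage 1: a base trajectory computing (2 + 3) * 4, oracle = 20.
add = lambda a, b: a + b
mul = lambda a, b: a * b
trajectory = [(add, (2, 3)), (mul, (5, 4))]
oracle = 20
assert reward_check(trajectory, oracle)  # trajectory is kept

# Stage 2: augment the environment; the same check still passes.
tools = add_distractor_tool([("add", add), ("mul", mul)])
assert reward_check(trajectory, oracle)  # oracle preserved
```

Because validation is by execution rather than by string matching, the same check can serve as an online RL reward signal, which is the property the abstract emphasizes.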