Multi-Sample Prompting and Actor-Critic Prompt Optimization for Diverse Synthetic Data Generation

📰 ArXiv cs.AI

arXiv:2506.21138v2 Announce Type: replace-cross

Abstract: High-quality labeled datasets are fundamental for training and evaluating machine learning models, yet domains such as healthcare and Requirements Engineering (RE) face persistent barriers due to data scarcity, privacy constraints, or proprietary restrictions. While Large Language Models (LLMs) offer a promising avenue for Synthetic Data Generation (SDG), LLM-generated data tends to be repetitive and low in diversity, reducing its effecti

Published 31 Mar 2026
