APPA: Adaptive Preference Pluralistic Alignment for Fair Federated RLHF of LLMs

📰 ArXiv cs.AI

arXiv:2604.04261v1 Announce Type: cross Abstract: Aligning large language models (LLMs) with diverse human preferences requires pluralistic alignment, where a single model must respect the values of multiple distinct groups simultaneously. In federated reinforcement learning from human feedback (FedRLHF), these groups align a shared policy without centralizing preference data, which makes fair reward aggregation essential. Existing aggregation methods exhibit clear trade-offs: average-based aggr
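To make the aggregation trade-off concrete, here is a minimal sketch (not from the paper) contrasting average-based aggregation of per-group rewards with a worst-case (egalitarian) alternative; the group names and reward values are hypothetical:

```python
def average_aggregate(group_rewards):
    """Average-based aggregation: maximizes total welfare,
    but can mask a policy that fails one group entirely."""
    return sum(group_rewards.values()) / len(group_rewards)

def egalitarian_aggregate(group_rewards):
    """Worst-case aggregation: scores a policy by its
    least-satisfied group."""
    return min(group_rewards.values())

# Hypothetical per-group rewards for two candidate shared policies.
policy_a = {"group_1": 0.9, "group_2": 0.9, "group_3": 0.0}  # ignores group_3
policy_b = {"group_1": 0.6, "group_2": 0.6, "group_3": 0.6}  # uniform

# Both policies tie on the average (~0.6), yet policy_a gives
# group_3 nothing -- the worst-case view exposes the difference.
print(average_aggregate(policy_a), average_aggregate(policy_b))
print(egalitarian_aggregate(policy_a), egalitarian_aggregate(policy_b))
```

Average-based aggregation rates both policies the same, while the egalitarian view ranks the uniform policy strictly higher, illustrating why fair aggregation matters when groups cannot pool their preference data.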

Published 7 Apr 2026