APPA: Adaptive Preference Pluralistic Alignment for Fair Federated RLHF of LLMs
📰 ArXiv cs.AI
arXiv:2604.04261v1 Announce Type: cross Abstract: Aligning large language models (LLMs) with diverse human preferences requires pluralistic alignment, where a single model must respect the values of multiple distinct groups simultaneously. In federated reinforcement learning from human feedback (FedRLHF), these groups align a shared policy without centralizing preference data, which makes fair reward aggregation essential. Existing aggregation methods exhibit clear trade-offs: average-based aggr
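The abstract contrasts aggregation strategies for combining per-group rewards. As an illustrative sketch only (not the paper's APPA method; group names and reward values are hypothetical), the trade-off between average-based and worst-case aggregation can be shown in a few lines:

```python
# Hypothetical sketch of two reward-aggregation rules for pluralistic
# alignment; this is NOT the APPA algorithm from the paper.

def average_aggregate(group_rewards):
    """Utilitarian rule: mean reward across groups.

    A high average can mask one group being served poorly.
    """
    return sum(group_rewards.values()) / len(group_rewards)

def egalitarian_aggregate(group_rewards):
    """Egalitarian rule: reward of the worst-off group.

    Optimizing this value pushes the policy toward fairness
    across groups, at the cost of average performance.
    """
    return min(group_rewards.values())

# Hypothetical per-group reward scores for one candidate response.
rewards = {"group_a": 0.9, "group_b": 0.2, "group_c": 0.7}
print(round(average_aggregate(rewards), 6))   # decent average despite group_b
print(egalitarian_aggregate(rewards))         # exposes the worst-served group
```

Here the average looks acceptable even though one group is poorly served, which is the trade-off the abstract attributes to average-based aggregation.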