A Taxonomy of Multi-Objective Alignment Techniques for Large Language Models

Aligning large language models (LLMs) with human preferences has evolved from single-objective reward maximization to sophisticated multi-objective optimization. Real-world deployment requires balancing competing objectives (helpfulness, harmlessness, honesty, instruction-following, and task-specific capabilities) that often conflict. This survey provides a systematic taxonomy of multi-objective alignment techniques, organizing the rapidly growing literature into four categories: (1) Reward Decomposition approaches that …
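To make the reward-decomposition category concrete, here is a minimal sketch of its simplest instance: per-objective reward scores collapsed into a single training signal by linear scalarization. The objective names follow the abstract; the ObjectiveReward type, the scores, and the weights are hypothetical illustrations, not an implementation from the survey.

from dataclasses import dataclass

@dataclass
class ObjectiveReward:
    # One alignment objective: its reward-model score and mixing weight.
    name: str
    score: float   # per-objective reward model output, e.g. in [-1, 1]
    weight: float  # deployment-chosen preference weight

def scalarize(rewards: list[ObjectiveReward]) -> float:
    # Linear scalarization: collapse per-objective rewards into one
    # scalar training signal; the weights encode the desired trade-off.
    total = sum(r.weight for r in rewards)
    if total == 0:
        raise ValueError("at least one objective needs a nonzero weight")
    return sum(r.weight * r.score for r in rewards) / total

# Example: a response that is helpful but mildly unsafe.
rewards = [
    ObjectiveReward("helpfulness", score=0.9, weight=0.5),
    ObjectiveReward("harmlessness", score=-0.2, weight=0.4),
    ObjectiveReward("honesty", score=0.7, weight=0.1),
]
print(scalarize(rewards))  # ~0.44: the weighted trade-off across objectives

Linear scalarization is only the entry point for this category; its known weakness is that a fixed weight vector can only reach convex regions of the Pareto front, which is what motivates the more sophisticated decomposition schemes the survey goes on to categorize.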

Constraint Decomposition for Multi-Objective Instruction-Following in Large Language Models

Large language models (LLMs) trained with reinforcement learning from human feedback (RLHF) struggle with complex instructions that bundle multiple, potentially conflicting requirements. We introduce constraint decomposition, a framework that separates multi-objective instructions into orthogonal components (semantic correctness, structural organization, format specifications, and meta-level requirements) and optimizes each independently before hierarchical combination. Our approach addresses …
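As a rough illustration of the framework described above (not the paper's own implementation), the sketch below decomposes an instruction's requirements into the four named components, scores each independently, and combines them hierarchically so that a violated meta-level requirement suppresses credit from the levels beneath it. The checker functions, the level ordering, and the multiplicative combination rule are all assumptions for illustration.

from typing import Callable

# A checker scores one component's satisfaction in [0, 1].
ConstraintChecker = Callable[[str, str], float]  # (instruction, response) -> score

def check_meta(instruction: str, response: str) -> float:
    # Placeholder: meta-level requirements, e.g. "answer in French".
    return 1.0

def check_semantic(instruction: str, response: str) -> float:
    # Placeholder: is the answer semantically correct for the instruction?
    return 1.0 if response else 0.0

def check_structure(instruction: str, response: str) -> float:
    # Placeholder: required sections, ordering, paragraph counts.
    return 1.0

def check_format(instruction: str, response: str) -> float:
    # Placeholder: JSON/markdown/length format specifications.
    return 1.0

# One plausible hierarchy: meta-level requirements gate everything
# below; within a level, component scores are averaged.
HIERARCHY: list[list[ConstraintChecker]] = [
    [check_meta],                     # level 0: must hold outright
    [check_semantic],                 # level 1: core correctness
    [check_structure, check_format],  # level 2: presentation
]

def hierarchical_score(instruction: str, response: str) -> float:
    score = 1.0
    for level in HIERARCHY:
        level_score = sum(c(instruction, response) for c in level) / len(level)
        score *= level_score  # failing a higher level suppresses lower credit
        if score == 0.0:
            break
    return score

print(hierarchical_score("List three facts as JSON.", '{"facts": [1, 2, 3]}'))

The multiplicative gating is one way to realize "hierarchical combination": because each level's score multiplies the running total, a meta-level failure zeroes out any credit earned by structure or format compliance, while within-level averaging keeps the orthogonal components independent.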