Navigating the Alignment-Calibration Trade-off: A Pareto-Superior Frontier via Model Merging

When AI models are tuned to follow human instructions, they pay an "alignment tax": they lose accuracy and output diversity, and they become miscalibrated, hallucinating confidence. Merging the tuned and base models can recover both, yielding AI that is smarter and better calibrated.
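The merging idea can be sketched as simple linear interpolation between the parameters of the base and the instruction-tuned model. This is a minimal illustration, not the paper's specific method: the dict-of-lists "checkpoints", the `merge_weights` helper, and the `alpha` value are all assumptions for the sketch (real checkpoints would use tensors, e.g. PyTorch state dicts).

```python
# Minimal sketch of model merging via linear weight interpolation.
# The "models" here are stand-ins: plain dicts mapping parameter
# names to lists of floats. All names below are illustrative.

def merge_weights(base, tuned, alpha=0.5):
    """Interpolate parameters: (1 - alpha) * base + alpha * tuned.

    alpha = 0 returns the base model, alpha = 1 the tuned model;
    intermediate values trade off alignment against calibration.
    """
    return {
        name: [(1 - alpha) * b + alpha * t
               for b, t in zip(base[name], tuned[name])]
        for name in base
    }

# Toy two-parameter "checkpoints" (hypothetical values).
base = {"layer.weight": [0.0, 2.0]}
tuned = {"layer.weight": [1.0, 0.0]}

merged = merge_weights(base, tuned, alpha=0.5)
print(merged["layer.weight"])  # [0.5, 1.0]
```

Sweeping `alpha` traces out a family of merged models, from which one can pick points that dominate both endpoints on accuracy and calibration.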

Read the full article on arXiv here.
