Beyond the final layer: Intermediate representations for better multilingual calibration in large language models

This paper tackles a blind spot in the confidence calibration of multilingual large language models: we show that non-English languages are far worse calibrated than English, and we find that intermediate layers, not the final layer, offer much stronger confidence signals. Building on this, we introduce Language-Aware Confidence Ensemble (LACE), a training-free method that adaptively selects the best layers per language.
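To make the idea concrete, here is a minimal sketch of layer-wise confidence ensembling, assuming a logit-lens-style readout that projects intermediate hidden states through the model's LM head. The model name, the `LAYERS_BY_LANG` layer sets, and the simple averaging in `ensemble_confidence` are illustrative assumptions, not LACE's actual selection and weighting procedure, which is described in the paper.

```python
# Sketch: per-language ensembling of confidences read from intermediate
# layers. Layer choices and averaging are illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"  # stand-in model; any causal LM works similarly

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)
model.eval()

# Hypothetical per-language layer sets (GPT-2 has 12 blocks; index 0 is
# the embedding layer). LACE selects layers per language; these indices
# are made up for illustration.
LAYERS_BY_LANG = {"en": [9, 10, 11], "de": [6, 8, 10]}

@torch.no_grad()
def ensemble_confidence(text: str, lang: str) -> float:
    """Average max-softmax confidence of the next-token prediction,
    read out from the selected intermediate layers via the LM head."""
    inputs = tok(text, return_tensors="pt")
    out = model(**inputs, output_hidden_states=True)
    confs = []
    for layer in LAYERS_BY_LANG[lang]:
        h = out.hidden_states[layer][:, -1, :]        # last-token state
        h = model.transformer.ln_f(h)                 # final LayerNorm (GPT-2)
        logits = model.lm_head(h)                     # logit-lens projection
        confs.append(logits.softmax(-1).max().item())  # max prob = confidence
    return sum(confs) / len(confs)

print(ensemble_confidence("The capital of France is", "en"))
```

The appeal of this style of method is that it needs no training or fine-tuning: it only requires a forward pass with hidden states exposed, plus a per-language table of layer indices.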

Read the full paper on arXiv.
