UNCLE: Benchmarking Uncertainty Expressions in Long-Form Generation
Large language models often sound confident, even when wrong. This study benchmarks how they express uncertainty in long-form generation, helping researchers design models that reason and admit doubt more like people do.
Read the full article on arXiv.