UNCLE: Benchmarking Uncertainty Expressions in Long-Form Generation

10 Nov

Large language models often sound confident even when they are wrong. This study benchmarks how they express uncertainty in long-form generation, helping researchers design models that reason and admit doubt more like people do.