SIMBENCH: Benchmarking the Ability of Large Language Models to Simulate Human Behaviors

SimBench sets a new standard for evaluating AI as a mirror of human behaviors, uniting 20 diverse datasets to reveal when model simulations succeed, when they fail, and why it matters.

See the full article on arXiv.
