SIMBENCH: Benchmarking the Ability of Large Language Models to Simulate Human Behaviors
SimBench sets a new standard for evaluating AI as a mirror of human behavior, uniting 20 diverse datasets to reveal when model simulations succeed, when they fail, and why that matters.
See the full article on arXiv.