A smiling man with short brown hair and wearing a black shirt and backpack at an airport terminal.

Nigel H. Collier

I’m Professor of Natural Language Processing at the University of Cambridge and Chief Scientist of Trismik. My work focuses on creating better AI systems with an emphasis on uncertainty, personalization, and adversarial assessment of model behaviour.

Reflections

Occasional opinion articles exploring humanistic perspectives on AI.

Featured

Nov 25, 2025

Two Boundaries, Two Worlds

LLMs are starting to pass bar exams and navigate legal categories, but still fail basic physical reasoning tasks. They’re fluent about fiat boundaries - laws, roles, jurisdictions - but can still fumble bona fide ones like surfaces and collisions. As hybrid world models emerge, we’re discovering just how far language alone can, and can’t, take intelligence.

Nov 18, 2025

Are Doubt and Uncertainty the Same Thing?

The article argues that LLMs display uncertainty but cannot yet experience doubt, a richer metacognitive process. Because models lack self-awareness of their own ignorance, they can appear cautious while still hallucinating, potentially creating issues in settings where genuine epistemic responsibility matters.

Nov 13, 2025

The Discipline of Wonder

A meditation on my AI research journey, from early Hopfield networks to modern language models, exploring how meaning emerges from pattern rather than rules, and why scientific curiosity sometimes demands disciplined wonder.

Nov 10, 2025

Beyond Alignment: Toward a Sunao Intelligence

Inspired by Konosuke Matsushita’s idea of the Sunao mind - open, sincere, and unbound by rigid patterns - this essay explores how AI might move beyond obedience toward genuine attunement. A Sunao intelligence would not just follow instructions but perceive intention, reflect with humility, and act with sincerity.

I hope you enjoy these essays. Please share your reflections on LinkedIn. I try to read as many comments as I can, although I may not be able to respond personally. Thoughtful, constructive, and considerate perspectives are always appreciated.

Spotlight

Short notes on the latest published works from our team at Cambridge and partners

Google Scholar

A woman with short wavy hair, sitting with her eyes closed and her hand on her forehead, appears distressed, with swirling clouds in the background.

UNCLE: Benchmarking Uncertainty Expressions in Long-Form Generation

Large language models often sound confident, even when wrong. This study benchmarks how they express uncertainty, helping researchers design models that reason and admit doubt more like people do.

Read on arXiv

A painting of a person made up of colorful brushstrokes, with a network of glowing nodes and lines on the left side against a textured bluish background.

SIMBENCH: Benchmarking the Ability of Large Language Models to Simulate Human Behaviors

SimBench sets a new standard for evaluating AI as a mirror of human behaviours, uniting 20 diverse datasets to reveal when model simulations succeed, when they fail, and why that matters.

Read on arXiv

A colorful abstract painting featuring two women, one with a child, and a heart-shaped maze-like pattern in the center, with predominantly yellow, orange, blue, and green hues.

Conformity in Large Language Models

This study exposes how AI models often ‘follow the crowd,’ mirroring social conformity. Understanding and correcting this helps build systems that think independently and better reflect human diversity.

Read on ACL

500xCompressor: Generalized Prompt Compression for Large Language Models

This work demonstrates a major gain in large language model efficiency: prompts can be compressed up to 500× with minimal accuracy loss, increasing speed without retraining.

Read on ACL

Silhouettes of ancient warriors engaging in combat, holding spears and shields, against a textured, earthy background.

Generative Language Models Exhibit Social Identity Biases

LLMs often reflect ‘us vs them’ biases ingrained in human data, favoring in-group members and dismissing others. Recognising and curbing these tendencies is vital to building fair, inclusive AI systems.

Read on arXiv

A judge in a black robe sitting at a wooden desk in a courtroom, holding a gavel, facing a large floating face of a woman with closed eyes surrounded by smoke in the background.

Trident: Benchmarking LLM Safety in Finance, Medicine, and Law

Trident explores how safely large language models operate in finance, medicine, and law, revealing high-stakes domains where today’s AI still falters and helping society build more trustworthy systems for critical decisions.

Read on arXiv

Research

Published articles and pre-prints from our team and partners

Featured

Nov 23, 2025

On Reality and the Limits of Language Data: Aligning LLMs with Human Norms

Now of historical interest: our 2023 study found that language-trained AI struggled with real-world common-sense reasoning. New 2024-25 benchmarks confirm that even multimodal models still falter on spatial physical tasks and object affordances. Ground-truth world modelling remains a frontier, but aligning AI with human-scale embodied knowledge is still vital for safe applications.

Nov 20, 2025

LoGU: Long-form Generation with Uncertainty Expressions

This paper studies how to reduce hallucinations when large language models generate long answers with multiple claims. We propose Long-form Generation with Uncertainty, where models explicitly mark uncertain parts of their responses. Using new training data, supervised fine-tuning, and direct preference optimization, we improve factual accuracy while keeping explanations detailed, readable, and clear about knowledge gaps.

Nov 16, 2025

Time to Revisit Exact Match

Large language models sometimes struggle with temporal understanding, yet traditional “exact match” metrics hide these errors or mis-rank systems. This paper introduces better numeric measures that capture how wrong a model is - improving our understanding of model limitations and preventing misplaced trust in real-world use.

Nov 12, 2025

All Roads Lead to Rome: Graph-Based Confidence Estimation for Large Language Model Reasoning

How can we be confident large language models are confident for the right reasons? Our EMNLP 2025 paper introduces training-free, graph-based confidence estimation for reasoning tasks. It models LLM thought paths as directed graphs and uses centrality and convergence to improve reliability, interpretability, and downstream performance.

Trident: Benchmarking LLM Safety in Finance, Medicine, and Law

Nov 12, 2025

As AI models enter high-stakes domains such as law, finance and healthcare, this work references clear safety principles drawn from professional ethics and introduces Trident-Bench, a new benchmark to test how well large language models adhere to them. We evaluate 19 models and find that while strong generalists (e.g., GPT, Gemini) pass basic checks, domain-specialist models often fail to comply with policies, underlining the urgent need for targeted safety evaluations.

Meet the Team

Yinhong Liu

Evaluation, Calibration, and Alignment

Meiru Zhang

D4, Event Extraction and Forecasting

Tiancheng Hu

D3, Personalization and Social Simulations

Chang Shu

D3, Reasoning and Alignment

Caiqi Zhang

D3, Uncertainty and Factuality

Yinjiang River Dong

D2, Personality and Personalization

Zongqian Li

D2, Efficiency and Multimodality

Sanhanat Sivapiromrat

D2, Safety and Alignment

Auss Abbood

D2, Digital Disease Surveillance

Paul Martin

D2, Modular and Efficient Deep Learning

Zack Hui

D1, Safety and Alignment

Ehsan Shareghi

Affiliated Lecturer

Zaiqiao Meng

Affiliated Lecturer

Zihao Fu

Affiliated Lecturer

Group of smiling people standing in front of the Universal Studios globe at the entrance of the theme park.

Our team is part of the Language Technology Lab (LTL) at Cambridge. LTL investigates computation, cognition, and language through technically rigorous, experiment-based NLP research. We value intellectual curiosity, collaboration, and precision, and we welcome applicants ready to engage deeply with challenging ideas.

Prospective PhD students: I am always interested in supervising new NLP projects on the PhD in Computation, Cognition and Language. Before contacting me, please make sure that you meet the minimum requirements and take time to read my publications. The work in my team is technical and experiment-based, so please apply only if you have strong programming skills. In your email, please send a CV with a brief statement of research interests. Please note the application deadline and the documents you need to submit with your application. For 2026 applicants: I will accept 2 or 3 PhD students in 2026. The MPhil by Research in language sciences also offers places to applicants with an NLP background and can be a great springboard to PhD research.

Bio

A man in a suit speaking into a microphone at an event or conference.

I have been working in NLP and AI for over 30 years. Before joining the University of Cambridge on an EPSRC Experienced Researcher Fellowship (2015-2020), I spent the early part of my career in Japan (1996-2012). I was a Toshiba Fellow, a postdoc at Tokyo University with Junichi Tsujii, and Associate Professor at the (then) newly formed National Institute of Informatics, where I led the NLP lab for 12 years before returning to the UK on a Marie Curie Research Fellowship. As an undergraduate I studied for a BSc in Computer Science at the University of Leeds (1992). I received an MSc in Machine Translation (1994) and a PhD in Computational Linguistics (1996) from the University of Manchester (UMIST) for my research on English-Japanese Lexical Transfer using a Hopfield Neural Network. My current roles are Professor of NLP, co-leading the Language Technology Lab at Cambridge; Professorial Fellow at Murray Edwards College; and Chief Scientist at Trismik, a spinout I co-founded that launched in May 2025.

A man giving a presentation in front of a projection screen, holding a notebook, with speakers and a laptop on the table in front of him.
