Selected Work

Research

BS-Bench

A 600-game benchmark for measuring LLM deception, lie detection, and instruction compliance through the bluffing card game Bullshit.

Games: 600
Matchups: 15
Prompt conditions: 4
Models: 6 total (4 per round)

“Honesty prompts reduced lying, but also reduced challenges and made remaining lies more successful.”

Writing