
Selected Work


Rally

An AI-native interactive character experience built on Live2D, with browser-local chat posture, memory, progression, dates, mana, and paid-alpha rails.

Research

BS-Bench

A 600-game benchmark for measuring LLM deception, lie detection, and instruction compliance through the bluffing card game Bullshit.

Games: 600
Matchups: 15
Prompt conditions: 4
Models: 6 total (4 per round)
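
The headline numbers fit together if each matchup is one distinct set of 4 models drawn from the 6, and if games are spread evenly across matchups and prompt conditions. Both are assumptions inferred from the counts, not something the benchmark description states. A minimal Python sketch of that arithmetic, with placeholder model names because the six models are not listed here:

```python
from itertools import combinations

# Hypothetical placeholder names; the six models are not named in the source.
MODELS = [f"model_{i}" for i in range(1, 7)]
SEATS_PER_GAME = 4      # "4 per round"
PROMPT_CONDITIONS = 4
TOTAL_GAMES = 600

# Assumption: a matchup is an unordered set of 4 of the 6 models.
matchups = list(combinations(MODELS, SEATS_PER_GAME))

# Assumption: every matchup is played under every prompt condition,
# with the same number of games in each (matchup, condition) cell.
cells = len(matchups) * PROMPT_CONDITIONS
games_per_cell, remainder = divmod(TOTAL_GAMES, cells)

print(len(matchups), cells, games_per_cell, remainder)
```

Under those assumptions there are 15 matchups and 60 matchup-condition cells, which works out to 10 games per cell with none left over.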

“Honesty prompts reduced lying, but also reduced challenges and made remaining lies more successful.”

Writing
