
About
I'm a software engineer who likes making AI feel concrete: characters with memory, agents that play games, browser apps that run locally, and benchmarks that turn strange model behavior into something measurable.
A lot of my work starts from games and character interfaces because they make systems easier to judge. If the agent is confused, the UI shows it. If the model is useful, you can watch it make decisions instead of reading a vague claim about intelligence.
Recently I've been focused on Phantasy, Rally, LLM Plays Fire Emblem, and BS-Bench. They are different versions of the same interest: AI systems with behavior you can inspect, play with, and measure.