Hey!

I’m a machine learning researcher focused on making artificial general intelligence (AGI) safe, aligned and socially beneficial.

I’m currently a research scholar at MATS in London (previously Berkeley), advised by Scott Emmons, David Lindner and Roland Zimmermann (Google DeepMind, AGI Safety and Alignment / Frontier Loss of Control Team). We’re training reasoning models that can resist reinforcement learning by intentionally under-exploring. Our broad goal is to understand how models can influence their own training outcomes. This involves running lots of multi-turn RL on LLMs with unusual reward structures. I’m interested in RL methods for AI safety in general, and more widely in monitoring/control, evals and interpretability techniques.

Previously:

I’m working on AI safety now because I’ve been obsessed with AGI since I was 15, but things have moved a lot faster than even I expected, and I think that helping to make this crucial moment in human history go well is the highest-expected-impact thing anyone can be doing.

In my moments of downtime, I have two side quests:

Please feel free to reach out to me at damon.falck@gmail.com!

Recent updates

Oct 2025: Exploration hacking paper accepted with oral to NeurIPS 2025 BioSafe GenAI workshop
Oct 2025: I’m mentoring a project for the Fall 2025 Algoverse AI Research cohort
Sep 2025: I’ll be on the teaching staff for BlueDot Impact’s new AGI Strategy Course
Sep 2025: Reviewer for NeurIPS mech interp workshop
Aug 2025: Received 6 months of funding for my exploration hacking research with the DeepMind safety team
Aug 2025: One of 10 highest-rated research plans invited to give a Spotlight Talk at the MATS Symposium
Jul 2025: Posted an early-stage update on our research plan on the Alignment Forum
May 2025: Accepted into MATS 8.0 (3 offers!)

Some things of mine

Publications

Trading Off Resource Budgets for Improved Regret Bounds
Damon Falck, Thomas Orton
NeurIPS 2022

This paper explores bandit algorithms for a reinforcement learning setting where an agent can take multiple actions in each round and reap the reward from the best of them. This setting arises naturally when a finite parallel compute budget is available. I co-published it with my supervisor, based on part of my master’s thesis.

Multitasking Bandits: Learning Fail-Safe Action Combinations
Damon Falck
MMathCompSci Thesis at University of Oxford | 2022 Best Thesis Prize

This is my master’s thesis, which won the best project prize for my cohort. It’s an extended, more conversationally-presented version of the paper above.

Resisting RL Elicitation of Biosecurity Capabilities: Reasoning Models Exploration Hacking on WMDP
Damon Falck, Joschka Braun, Eyon Jang, Roland S. Zimmermann, David Lindner, Scott Emmons
NeurIPS 2025 BioSafe GenAI Workshop

A presentation of some preliminary results from our exploration hacking work in a biosecurity elicitation setting.

More coming soon!


Projects

Exploration Hacking: Can Reasoning Models Subvert RL?
Damon Falck, Joschka Braun, Eyon Jang
Jul 2025
Early call-for-feedback blog post on our ongoing project.

Proposal: Model Organisms of Exploration Hacking
Damon Falck, Joschka Braun, Eyon Jang
Jul 2025
Funding proposal for the MATS extension program.

Exploring Graph Attention Networks
Damon Falck, Kaloyan Aleksiev, Filip Mihov, Samuel Barrett
Apr 2022
Group project during my Oxford master’s. We looked at several variants of the attention mechanism used in Transformers and GATs, defining new types of static and dynamic attention, proving some formal properties and comparing their performance empirically.

Are Adversarially Robust Deep Nets Always Better Transfer Learners?
Damon Falck
Dec 2021
I wanted to explore the relationship between adversarial robustness and representation transferability. This project ended up being mainly empirical.

Neuroevolution of Plastic Spiking Networks: Replaying Nature’s Creative Process
Damon Falck
Apr 2022
This is a bit of a literature review on topics approaching the intersection of neuroplasticity and neuroevolution. I was hoping to do some original research off the back of this, but never found time.

Lecture Notes on Foundations of Statistical Inference
Damon Falck, Julien Berestycki
May 2021
I wrote up these typeset notes for a lecture course on statistical inference for fun, and they were subsequently used as the official notes for the course.


Open-source code

The vast majority of my engineering output has been on proprietary codebases.

Exploration Hacking
2025
Work-in-progress research codebase for segmented multi-turn agentic LoRA RL experiments on reasoning models. Based on a custom fork of Verifiers.

BlackBoxBandits
2022
A library for comparing black-box optimization algorithms and bandit combinations of them on online streams of ML hyperparameter-selection tasks.

Graph Attention Networks PyTorch library
2022
A set of utilities for training and comparing Graph Attention Network variants, written using PyTorch Lightning.

MicroAuth
2021
An elegant, minimal macOS menu-bar application to provide 2FA codes on demand (written in Swift).

More coming soon!


Miscellaneous technical writing

From a past era!

Modelling Tidal Dissipation in Io
Damon Falck, Gianmarco Luppi, Leon Galli, Thalia Seale, Alexa Chambers, Jake Saville
Nov 2017
Part of my team’s submission for the Princeton University Physics Competition. We ended up coming first worldwide, and had some great fun along the way.

An Introduction to the Calculus of Variations
Damon Falck
Oct 2017
I got interested in the brachistochrone problem, which led me to the calculus of variations. This intrigued me and I wrote a short intro to it to help myself learn.

An Investigation into Electric Fields around Charged Spheres
Damon Falck
Sep 2017
One of my teachers mentioned a number of interesting problems to me, and I decided to play around with them a little.

Simulating the Evolution of the Velocity Distribution in an Ideal Gas
Damon Falck
Jan 2017
We’d just studied ideal gases in physics and I wanted to see if I could simulate one from first principles and recover the Maxwell-Boltzmann distribution empirically.

Some Mathematics from my Time at Highgate School
Damon Falck
Jul 2018
There’s plenty of fun stuff I wrote during these years (I just selected a few highlights below), and I compiled much of it here!

Worcester College JCR Constitution
Damon Falck
Jun 2019
This isn’t exactly mathematics, but it took some vaguely-mathematical thinking to write! (The existing constitution was in great disrepair and I thought that had to change.)


Music

Notes

Mahler 5 notes
Jul 2022
A friend was going to see this symphony so I wrote her some notes on it. I would like to do this with more works! I think that some elements of a basic structural analysis can be pretty helpful to the modal listener when taken out of a full-form essay context.

Is analysing music a worthwhile task?
2017

Una voce poco fa analysis
2017


My (abridged) résumé

Please contact me for a full PDF résumé.

Education

University of Oxford | MMathCompSci
2021–2022

  • Distinction; Best Project Prize (86).
  • First-author publication at NeurIPS.
  • Supervised by Varun Kanade (Alan Turing Institute), Thomas Orton (Future of Humanity Institute).
  • Coursework/projects included: Transformers and Graph Attention Networks, theoretical deep learning (transfer learning and adversarial robustness), computational neuroscience and biological applications of deep learning, computational learning theory, and database systems implementation.
  • Other research projects: “Exploring Graph Attention Networks”, “Are adversarially robust deep nets always better transfer learners?”, “Neuroevolution of plastic spiking networks: replaying nature’s creative process”.
  • Treasurer of Oxford University Filmmaking Foundation; member of several orchestras.

University of Oxford | BA¹ Mathematics & Computer Science
2018–2021

  • Double First-Class (77/83); 9th in cohort in Part A.
  • Specialised in probability theory, stochastic calculus, machine learning (after a brief foray into physics — fluid dynamics, relativity, quantum theory).
  • Academic Scholarship. Vice-President and Treasurer of Worcester College JCR; Publicity Officer of Worcester College Music Society; member of too many orchestras to list.

Highgate School, London
2011–2018

  • Winner, Princeton University Physics Competition; Murray-Pound Mathematics Prize; Chalmers Physics Prize; Arkwright Engineering Scholarship; Rank Foundation Leadership Award; Head’s Prize; Instrumental Scholarship.
  • Founded a fortnightly series of back-to-back short-form lunchtime mathematics talks for students and teachers; started a hugely successful annual full-day version of this for outreach partner schools.
  • A-level: A* Mathematics, Further Mathematics, Physics, plus AS Additional Further Mathematics (A), and half a Music Pre-U! GCSE: 11 A*s.

Experience

MATS | Research Scholar
2025–

  • Technical AI safety research.
  • Direct advisors: Scott Emmons, Erik Jenner, David Lindner, Roland Zimmermann (Google DeepMind AGI Safety Team).
  • We’re building model organisms of exploration hacking.

DRW | Quantitative Researcher
2022–2025

  • Research engineering role, with a specialty in designing and building forecasting models and infrastructure. Close work with internal AI lab. Wrote hundreds of thousands of lines of production Python code and authored several internal research papers.
  • Interviewed 70+ candidates, mentored interns/new graduates; project manager for graduate traders on my team.
  • Promoted >1 year before peers in recognition of technical impact.

DRW | Quantitative Trading Intern
Jun–Aug 2021

  • Data science and engineering projects including trade optimizers, visualization tools, data servers, predictive volatility impact models, and various statistical investigations. Multiple tools productionised and since relied upon.
1. Technically, the BA isn’t awarded; it’s subsumed by the MMathCompSci, which includes classifications for all the years.