Ole Jorgensen

About me:

Hello! My name is Ole, and I am a research engineer at the AI Security Institute (AISI), where I work with the Chem-Bio team. I've previously led research on language model evaluations and interpretability. Before joining AISI, I completed an MSc in Artificial Intelligence at Imperial College London; I also hold an MMath in Mathematics from the University of St Andrews.

I am always keen to talk to folks about AI safety, evals, and interpretability. I'm also considering mentoring junior students on evaluations projects. If you're interested, please reach out!

Recent Work:

"Early Insights from Developing Question-Answer Evaluations for Frontier AI". I wrote this blog post with Friederike Grosse-Holz! We share lots of in-the-weeds details we have learned from developing and running QA evaluations.

"Improving Activation Steering in Language Models with Mean-Centring", winner of the Best Paper Award at the Human-Centric Representation Learning workshop at AAAI 2024. Ole Jorgensen, Dylan Cope, Nandi Schoots, Murray Shanahan. A follow-up to work from my dissertation, in which we develop a new method of activation steering, called mean-centring, that is more general than previous methods. We evaluate it in a variety of settings, including applying it to recent work on function vectors by Eric Todd et al.