About me:

[Photo: This is me!]

Hello! My name is Ole, and I am a research engineer currently working at the AI Safety Institute. I've previously led research on language model evaluations and interpretability. I recently completed an MSc in Artificial Intelligence at Imperial College London, and hold an MMath in Mathematics from the University of St Andrews.

I'm not yet sure what I'll be doing from mid-2024 onwards. I am currently applying for PhDs starting in 2024, and am interested in engineering and research roles working on AI safety. I am also open to collaborating on research projects about language model evaluations and interpretability - reach out if you might be interested!

Recent Work:

"Improving Activation Steering in Language Models with Mean-Centring", winner of the best paper award at Human-Centric Representation Learning at AAAI 2024. Ole Jorgensen, Dylan Cope, Nandi Schoots, Murray Shanahan. A follow up to work from my Dissertation, we develop a new method of activation steering, called mean-centring, which is more general than previous methods. We evaluate it in a variety of settings, including applying it to recent work on function vectors by Eric Todd et. al.

"Self-Consistency of Large Language Models under Ambiguity", BlackboxNLP Workshop at EMNLP 2023. Henning Bartsch, Ole Jorgensen, Domenic Rosati, Jason Hoelscher-Obermaier, Jacob Pfau. Completed as part of AI Safety Camp. We investigated the self-consistency of various OpenAI models under ambiguity, using the novel setting of integer sequences.

I completed my Dissertation on investigating the latent spaces of transformer models. My supervisors were Murray Shanahan, Dylan Cope, and Nandi Schoots, all of whom I enjoyed working with immensely. Specifically, we investigated when transformer models represent features geometrically, and how this can be used to better control models (for example via activation additions and feature detection). You can see my Dissertation here.
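If features are represented as directions, one simple consequence is that you can detect a feature by projecting activations onto a direction. Here is a minimal sketch under that assumption (again with placeholder texts, model, and layer; this is not code from the Dissertation):

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Illustrative only: a toy "feature detector" that scores text by projecting
# residual-stream activations onto a feature direction estimated as a
# difference of means over two tiny placeholder datasets.
model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model.eval()
LAYER = 6  # arbitrary mid-network layer

def last_token_activation(text: str) -> torch.Tensor:
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        hidden = model(ids, output_hidden_states=True).hidden_states[LAYER]
    return hidden[0, -1]  # activation at the final token

# If a feature is a direction, the difference of means between texts with
# and without the feature gives a rough estimate of it.
with_feature = ["I am so happy today.", "This is wonderful news!"]
without_feature = ["The meeting is at noon.", "The box is on the table."]
direction = (
    torch.stack([last_token_activation(t) for t in with_feature]).mean(0)
    - torch.stack([last_token_activation(t) for t in without_feature]).mean(0)
)
direction = direction / direction.norm()

# Detect the feature in new text by projecting onto the direction.
for text in ["What a fantastic result!", "Please file the paperwork."]:
    score = (last_token_activation(text) @ direction).item()
    print(f"{score:+.2f}  {text}")
```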