Michael Hanna


I’m Michael Hanna, a PhD student at the ILLC at the University of Amsterdam, supervised by Sandro Pezzelle and Yonatan Belinkov, through ELLIS. I’m interested in interpreting and evaluating NLP models by combining techniques from diverse fields such as cognitive science and mechanistic interpretability.

I’m looking for internships during Summer 2025—send me a message if you have any opportunities to share!

Selected Publications

For a full list, see Publications.

Curt Tigges, Michael Hanna, Qinan Yu, and Stella Biderman. 2024. LLM Circuit Analyses Are Consistent Across Training and Scale. To appear in the Thirty-eighth Conference on Neural Information Processing Systems (NeurIPS 2024).

Michael Hanna, Sandro Pezzelle, and Yonatan Belinkov. 2024. Have Faith in Faithfulness: Going Beyond Circuit Overlap When Finding Model Mechanisms. To appear at the First Conference on Language Modeling (COLM 2024).

Michael Hanna, Yonatan Belinkov, and Sandro Pezzelle. 2023. When Language Models Fall in Love: Animacy Processing in Transformer Language Models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023).

Michael Hanna, Ollie Liu, and Alexandre Variengien. 2023. How does GPT-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained language model. In the Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS 2023).

Michael Hanna, Roberto Zamparelli, and David Mareček. 2023. The Functional Relevance of Probed Information: A Case Study. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2023), pages 835–848, Dubrovnik, Croatia. Association for Computational Linguistics.

Michael Hanna and David Mareček. 2021. Investigating BERT’s Knowledge of Hypernymy through Prompting. In Proceedings of the Fourth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackboxNLP 2021), Punta Cana, Dominican Republic. Association for Computational Linguistics.

Academic Interests

My main research interest, interpretability in the context of modern language models, is twofold. First, I’m interested in asking “What are the abilities of these models?” from a perspective informed by linguistics and cognitive science. In this so-called behavioral paradigm, I study these models on a pure input-output level, leveraging the wide body of psycholinguistic research conducted on humans. Second, I’m interested in answering “How do these models achieve such impressive performance on linguistic tasks?”, using techniques from (mechanistic) interpretability. I use causal interventions to perform low-level studies of language models, uncovering the mechanisms that drive their behavior.
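To make “causal interventions” concrete, here is a minimal sketch of activation patching on GPT-2: cache one layer’s activations from a clean prompt, splice them into a run on a corrupted prompt, and check whether the clean prediction is restored. The prompts, the layer index, and the use of plain PyTorch forward hooks are illustrative assumptions, not the exact setup of any paper above.

```python
# A minimal sketch of activation patching on GPT-2, using only Hugging Face
# transformers and plain PyTorch forward hooks. The prompts and the layer
# index below are illustrative assumptions.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

clean = tok("The war lasted from the year 1732 to the year 17", return_tensors="pt")
corrupt = tok("The war lasted from the year 1701 to the year 17", return_tensors="pt")
assert clean.input_ids.shape == corrupt.input_ids.shape  # patching needs aligned positions

LAYER = 8  # hypothetical choice of layer to intervene on
stash = {}

def save_hook(module, inputs, output):
    # GPT-2 blocks return a tuple; the hidden states are its first element.
    stash["clean"] = output[0].detach()

def patch_hook(module, inputs, output):
    # Replace the corrupted run's hidden states with the cached clean ones.
    return (stash["clean"],) + output[1:]

with torch.no_grad():
    handle = model.transformer.h[LAYER].register_forward_hook(save_hook)
    clean_logits = model(**clean).logits
    handle.remove()

    corrupt_logits = model(**corrupt).logits

    handle = model.transformer.h[LAYER].register_forward_hook(patch_hook)
    patched_logits = model(**corrupt).logits
    handle.remove()

# If patching restores the clean prediction, the layer's activations are
# causally implicated in the behavior under study.
for name, logits in [("clean", clean_logits), ("corrupt", corrupt_logits), ("patched", patched_logits)]:
    print(name, tok.decode(logits[0, -1].argmax()))
```

A full circuit analysis repeats interventions like this across many model components and prompts, scoring each component by how much patching it moves the model’s output.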

Personal Interests