Michael Hanna

I’m Michael Hanna, a PhD student at the ILLC at the University of Amsterdam, supervised by Sandro Pezzelle and Yonatan Belinkov, through ELLIS. I’m interested in interpreting and evaluating NLP models by combining techniques from diverse fields such as cognitive science and mechanistic interpretability.


Selected Publications

For a full list, see Publications.

Michael Hanna, Yonatan Belinkov, and Sandro Pezzelle. 2023. When Language Models Fall in Love: Animacy Processing in Transformer Language Models. To appear in Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP). (EMNLP 2023)

Michael Hanna, Ollie Liu, and Alexandre Variengien. 2023. How does GPT-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained language model. To appear at the Thirty-seventh Conference on Neural Information Processing Systems. (NeurIPS 2023)

Michael Hanna, Roberto Zamparelli, and David Mareček. 2023. The Functional Relevance of Probed Information: A Case Study. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, pages 835–848, Dubrovnik, Croatia. Association for Computational Linguistics. (EACL 2023)

Michael Hanna*, Federico Pedeni*, Alessandro Suglia, Alberto Testoni, and Raffaella Bernardi. 2022. ACT-Thor: A Controlled Benchmark for Embodied Action Understanding in Simulated Environments. In Proceedings of the 29th International Conference on Computational Linguistics, Gyeongju, Republic of Korea. International Committee on Computational Linguistics. (COLING 2022)

Michael Hanna and David Mareček. 2021. Investigating BERT’s Knowledge of Hypernymy through Prompting. In Proceedings of the Fourth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP. Punta Cana, Dominican Republic. Association for Computational Linguistics. (BlackboxNLP 2021)

Academic Interests

My main research interest, interpretability in the context of modern language models, is twofold. First, I’m interested in asking “What are the abilities of these models?” from a perspective informed by linguistics and cognitive science. In this so-called behavioral paradigm, I study these models on a pure input-output level, leveraging the wide body of psycholinguistic research conducted on humans. Second, I’m interested in answering “How do these models achieve such impressive performance on linguistic tasks?”, using techniques from (mechanistic) interpretability. I use causal interventions to perform low-level studies of language models, uncovering the mechanisms that drive their behavior.

Personal Interests