"A process cannot be understood by stopping it. Understanding must move with the flow of the process, must join it and flow with it."
- First Law of Mentat, Dune
I am a PhD candidate in the Interpretable Neural Networks lab at Northeastern University. I am fortunate to be advised by Prof. David Bau.
How do language models (or AI systems in general) perform complex reasoning tasks? Do they break down a complex problem into smaller steps, like a programmer implementing an algorithm? If so, how modular and composable are these smaller reasoning steps? How are they implemented in the model's internals? These are some of the key questions I am exploring in my research.
Broadly, I apply tools from mechanistic interpretability to form faithful abstractions of the inner workings of LLMs. I am interested in how such insights can help us address specific failure modes of AI systems by designing edits/interventions that target the right abstractions.
Before starting my PhD, I was a Software Engineer at Samsung Research and taught at SUST, where I completed my B.Sc. in Computer Science and Engineering.
Feel free to reach out if you would like to chat about research or collaboration, or if you have any questions about my work.
News
| 10/2025 | New paper! LLMs Process Lists With General Filter Heads. How do LMs filter a list of items? We find that LLMs use surprisingly elegant mechanisms, similar to the classic filter function in programming languages. They can implement the same filter operation using two mechanisms, depending on what information is available: (1) Lazy evaluation: when the question comes after the list, the LM uses special attention heads. (2) Eager evaluation: when the question is known before the list, the LM can eagerly evaluate whether each item satisfies the criterion and store the intermediate result as a flag in the item's representation (a toy code sketch of this lazy/eager analogy appears below the news table). Check out the project page for more details! And here is the Twitter thread for a quick summary. |
| 10/2025 | Serving as a reviewer for ICLR 2026. |
| 08/2025 | Attended The 2nd New England Mechanistic Interpretability (NEMI) workshop in Boston. |
| 07/2025 | Serving as a reviewer for NeurIPS 2025. |
| 05/2025 | Serving as a reviewer for COLM 2025. |
| 04/2025 | Attended NENLP 2025 at Yale. |
| 03/2025 | Serving as a reviewer for ICML 2025. |
| 01/2025 | The NNsight and NDIF paper is out! Super excited about NDIF's mission to enable interpretability research on very large neural networks. |
| 09/2024 | Serving as a reviewer for ICLR 2025. |
| 08/2024 | Interpretability researchers are still trying to figure out the right level of abstraction for conceptualizing neural network computations. Our new survey paper proposes a perspective grounded in causal mediation analysis. |
| 07/2024 | Serving as a reviewer for NeurIPS 2024. Excited to see many interesting works on interpretability, some of which directly build upon work from our lab! |
| 06/2024 | Serving as a reviewer for COLM 2024. |
| 04/2024 | New paper, Locating and Editing Factual Associations in Mamba. Mamba is a new generation of sequence-modeling architecture that achieves per-parameter performance comparable to Transformers in multiple modalities, including language modeling. With the development of such novel architectures, we interpretability researchers must ask: to what extent do our (high-level) insights about a particular mechanism generalize across different architectures? This paper is a case study where we apply the tools developed for understanding and editing factual associations in Transformers to Mamba and check whether the insights generalize. Find more at the project page, [code]. (Update: Accepted at COLM 2024!) |
| 10/2023 | Another paper! Function Vectors in Large Language Models. In this cool paper we show that LLMs encode functions (input-output mappings under a relation, or for performing a certain task, like translation) as vectors in their representations. Check out this Twitter thread for more information. (Update: Accepted at ICLR 2024!) |
| 08/2023 | New paper! Linearity of Relation Decoding in Transformer LMs. In this paper we show that, for a subset of relations, the LLM's (highly non-linear) relation decoding procedure can be well-approximated by a single linear transformation (LRE) applied to the subject representation after some intermediate layer. This LRE can be obtained by constructing a first-order approximation to the LLM computation from a single input (a toy sketch of this idea also appears below the news table). Find more at the project page, [code]. (Update: Accepted at ICLR 2024!) |
| 01/2023 | Our paper Mass-Editing Memory in a Transformer has been accepted at ICLR 2023 (top 25%)! |
| 10/2022 | New paper! Mass-Editing Memory in a Transformer. Here we scale up ROME to edit up to 10K memories in an LLM. Find more at the project page. |
| 09/2022 | Starting my PhD at Northeastern University, Boston. I will be working with Prof. David Bau on interpretability of LLMs. |
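
To make the lazy vs. eager filtering analogy from the 10/2025 entry concrete, here is a minimal Python sketch of the two strategies in ordinary code. It is purely illustrative (not code from the paper), and the item list and predicate are made up.

```python
items = ["apple", "carrot", "banana", "spinach"]
is_fruit = lambda x: x in {"apple", "banana", "cherry"}  # the filtering criterion

# Lazy evaluation: the criterion is applied only when the result is demanded,
# analogous to filter heads that look back over the stored list once the
# question finally arrives.
lazy = filter(is_fruit, items)  # nothing is evaluated yet
print(list(lazy))               # evaluation happens here -> ['apple', 'banana']

# Eager evaluation: the criterion is known up front, so each item is checked
# as it is read and tagged with a boolean flag, analogous to storing an
# intermediate verdict in each item's representation.
flags = [(x, is_fruit(x)) for x in items]
print([x for x, keep in flags if keep])  # -> ['apple', 'banana']
```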

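Similarly, the LRE idea from the 08/2023 entry can be sketched as a first-order Taylor expansion. The snippet below is a hypothetical toy illustration, not the paper's implementation: `F` stands in for the LM's map from a subject representation to an object readout, replaced here by a small random MLP.

```python
import torch

def first_order_approx(F, s0):
    """Approximate F near s0 by F(s) ~= W @ s + b, a single linear map (LRE-style)."""
    W = torch.autograd.functional.jacobian(F, s0)  # Jacobian of F at the single input s0
    b = F(s0) - W @ s0                             # offset so the approximation is exact at s0
    return W, b

# Toy stand-in for the (highly non-linear) relation-decoding computation.
d = 8
torch.manual_seed(0)
mlp = torch.nn.Sequential(torch.nn.Linear(d, d), torch.nn.GELU(), torch.nn.Linear(d, d))
F = lambda s: mlp(s)

s0 = torch.randn(d)                 # "subject representation" at some intermediate layer
W, b = first_order_approx(F, s0)

s = s0 + 0.01 * torch.randn(d)      # a nearby subject representation
print(torch.dist(F(s), W @ s + b))  # small residual: the linear map is locally faithful
```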