I am a computer science PhD student in the Interpretable Neural Networks lab at Northeastern University. I am fortunate to be advised by Prof. David Bau.
I apply tools from mechanistic interpretability to form faithful abstractions of the inner workings of LLMs. I focus on studying LLMs' knowledge, the beliefs about the real world they acquire during training: how this knowledge is stored in the model parameters, how it is encoded in the model representations during retrieval, and whether LLMs can reason with this knowledge in scenarios that require linking together multiple pieces of information. Broadly, I am interested in how understanding the inner mechanisms of LLMs can help us detect bugs (bias, false or outdated associations) in LLMs and develop tools to steer their behavior, with the goal of making them more reliable.
Before starting my PhD, I was a Software Engineer at Samsung Research and taught at Shahjalal University of Science and Technology, where I completed my B.Sc. in Computer Science and Engineering.
Feel free to reach out to me if you would like to chat about research, collaboration, or if you have any questions about my work.
News
05/2025 | Serving as a reviewer for COLM 2025. |
04/2025 | Attended NENLP 2025 at Yale. |
03/2025 | Serving as a reviewer for ICML 2025. |
01/2025 | The NNsight and NDIF paper is out! Super excited about NDIF's mission to enable interpretability research on very large neural networks. |
09/2024 | Serving as a reviewer for ICLR 2025. |
08/2024 | Interpretability researchers are still trying to determine the right level of abstraction for conceptualizing neural network computations. Our new survey paper proposes a perspective grounded in causal mediation analysis. |
07/2024 | Serving as a reviewer for NeurIPS 2024. Excited to see many interesting works on interpretability, some of them directly building on work from our lab! |
06/2024 | Serving as a reviewer for COLM 2024. |
04/2024 | New paper, Locating and Editing Factual Associations in Mamba. Mamba is a new generation of sequence modeling architecture that achieves per-parameter performance comparable to Transformers in multiple modalities, including language modeling. With the development of such novel architectures, we interpretability researchers must ask: to what extent do our (high-level) insights about particular mechanisms generalize across different architectures? This paper is a case study in which we apply the tools developed for understanding and editing factual associations in Transformers to Mamba and check whether the insights generalize. Find more at the project page, [code]. (Update: Accepted at COLM 2024!) |
10/2023 | Another paper! Function Vectors in Large Language Models. In this cool paper we show that LLMs encode functions (input-output mappings under a relation, or for performing a certain task, like translation) as vectors in their representations. Check out this Twitter thread for more information. (Update: Accepted at ICLR 2024!) |
08/2023 | New paper! Linearity of Relation Decoding in Transformer LMs. In this paper we show that, for a subset of relations, the LLM's (highly non-linear) relation decoding procedure can be well-approximated by a single linear transformation (LRE) applied to the subject representation after some intermediate layer. This LRE can be obtained by constructing a first-order approximation to the LLM computation from a single input. Find more at the project page, [code]. (Update: Accepted at ICLR 2024!) |
01/2023 | Our paper Mass-Editing Memory in a Transformer has been accepted at ICLR 2023 (top 25%)! |
10/2022 | New paper! Mass-Editing Memory in a Transformer. Here we scale up ROME to edit up to 10K memories in an LLM. Find more at the project page. |
09/2022 | Starting my PhD at Northeastern University, Boston. I will be working with Prof. David Bau on interpretability of LLMs. |