I am a PhD student in the Interpretable Neural Networks lab at Northeastern University, Boston. I am fortunate to be advised by Prof. David Bau.
My research focuses on understanding what factual information (beliefs about the real world) a language model has learned, how the LM stores this information in its parameters, and how this information is retrieved. Below are some of the key questions I am exploring:
How can we edit a factual association along with its appropriate entailments?
How can we enable LMs to perform System-2 reasoning? Is chain-of-thought (CoT) sufficient, or is it the only way?
Broadly, I am interested in understanding the inner workings of large language models, and in how such findings can help us detect bugs (bias, false or outdated associations) in LMs and develop tools to steer their behavior, with the goal of making them more reliable.
Before starting my PhD, I was a Software Engineer at Samsung Research and taught at Shahjalal University of Science and Technology, where I completed my B.Sc. in Computer Science and Engineering.
Feel free to reach out if you would like to chat about research or collaboration, or if you have any questions about my work.
News
[July-1-2024] Serving as a reviewer for NeurIPS 2024. Excited to see many interesting works on interpretability, some of them building directly on work from our lab!
[June-1-2024] Serving as a reviewer for COLM 2024.
[April-4-2024] New paper, Locating and Editing Factual Associations in Mamba. Mamba is a new generation of sequence modeling architecture that achieves per-parameter performance comparable to Transformers in multiple modalities, including language modeling. With the development of such novel architectures, we interpretability researchers must ask: to what extent do our insights about a certain mechanism (at a high level) generalize across different architectures? This paper is a case study where we apply the tools developed for understanding and editing factual associations in Transformers to Mamba and check whether the insights generalize. Find more at the project page, [code]. (Update: Accepted at COLM 2024!)
[October-23-2023] Another paper! Function Vectors in Large Language Models. In this cool paper we show that LLMs encode functions (input-output mappings under a relation, or for performing a certain task, such as translation) as a vector in their representations. Check out this Twitter thread for more information. (Update: Accepted at ICLR 2024!)
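To make the idea a bit more concrete, here is a rough sketch (my own illustration under simplifying assumptions, not the paper's released code) of injecting such a vector. The paper extracts function vectors by averaging the outputs of a small set of causally-selected attention heads; the snippet below instead averages last-token residual-stream states from a couple of in-context prompts as a crude stand-in, then adds that vector to a zero-shot prompt via a forward hook. The model choice (gpt2), layer index, and prompts are arbitrary assumptions for illustration.

```python
# Minimal sketch of the function-vector idea (not the paper's method verbatim).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

LAYER = 8  # hypothetical choice of intermediate layer

# 1) Average last-token hidden states over a few in-context prompts that
#    demonstrate the task (English -> French) to get a crude "task vector".
icl_prompts = [
    "cat -> chat, dog -> chien, house ->",
    "water -> eau, bread -> pain, book ->",
]
states = []
for p in icl_prompts:
    ids = tok(p, return_tensors="pt").input_ids
    with torch.no_grad():
        hs = model(ids, output_hidden_states=True).hidden_states
    states.append(hs[LAYER + 1][0, -1])  # output of block LAYER, last token
task_vec = torch.stack(states).mean(dim=0)

# 2) Add the vector to the residual stream of a zero-shot prompt via a hook.
def add_vec(module, inputs, output):
    hidden = output[0].clone()
    hidden[:, -1, :] += task_vec  # inject at the final position of each forward pass
    return (hidden,) + output[1:]

handle = model.transformer.h[LAYER].register_forward_hook(add_vec)
ids = tok("tree ->", return_tensors="pt").input_ids
with torch.no_grad():
    out = model.generate(ids, max_new_tokens=3)
handle.remove()
print(tok.decode(out[0]))
```

With a properly extracted function vector, the paper shows that adding it to a new context can trigger execution of the demonstrated task even when no in-context examples are given.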
[August-17-2023] New paper! Linearity of Relation Decoding in Transformer LMs. In this paper we show that, for a subset of relations, the LLM's (highly non-linear) relation decoding procedure can be well-approximated by a single linear transformation (LRE) applied to the subject representation after some intermediate layer. This LRE can be obtained by constructing a first-order approximation to the LLM computation from a single input. Find more at the project page, [code]. (Update: Accepted at ICLR 2024!)
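In symbols (notation mine, glossing over details such as scaling): if $F$ denotes the LM computation that maps the subject representation $\mathbf{s}$ at an intermediate layer to the object readout, the LRE is the first-order Taylor expansion of $F$ around an example subject $\mathbf{s}_0$:

$$
F(\mathbf{s}) \;\approx\; F(\mathbf{s}_0) + W(\mathbf{s} - \mathbf{s}_0) \;=\; W\mathbf{s} + \mathbf{b},
\qquad
W = \left.\frac{\partial F}{\partial \mathbf{s}}\right|_{\mathbf{s}=\mathbf{s}_0},
\quad
\mathbf{b} = F(\mathbf{s}_0) - W\mathbf{s}_0 .
$$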
[January-20-2023] Our paper Mass-Editing Memory in a Transformer has been accepted at ICLR 2023 (top 25%)!
[October-13-2022] New paper! Mass-Editing Memory in a Transformer. Here we scale up ROME to edit up to 10K memories in an LLM. Find more at the project page.
[September-1-2022] Starting my PhD at Northeastern University, Boston. I will be working with Prof. David Bau on interpretability of LLMs.