I am a PhD student in the Interpretable Neural Networks lab at Northeastern University, Boston. I am extremely fortunate to be advised by Prof. David Bau.

I am interested in understanding the inner workings of large language models: how they encode factual and functional (task-specific) information in their representations, and what computation is performed at their intermediate states. I am also interested in how such understanding can help us build tools to fix bugs (bias, false or outdated factual associations) in LLMs and to control their behavior, with the goal of making them safer and more reliable.

News

  • [October-23-2023] Another paper! Function Vectors in Large Language Models. In this cool paper we show that LLMs encode functions (input-output mappings under a relation, or for performing a certain task, like translation) as a vector in their representations. Check out this Twitter thread for more information. (Update: Accepted at ICLR 2024!)

  • [August-17-2023] New paper! Linearity of Relation Decoding in Transformer LMs. In this paper we show that, for a subset of relations, the LLM's (highly non-linear) relation decoding procedure can be well-approximated by a single linear transformation (LRE) applied to the subject representation after some intermediate layer. This LRE can be obtained by constructing a first-order approximation to the LLM computation from a single input; a rough sketch of the idea follows below the news items. Find more at the project page, [code]. (Update: Accepted at ICLR 2024, Spotlight!)

  • [January-20-2023] Our paper Mass-Editing Memory in a Transformer has been accepted at ICLR 2023 (top 25%)!

  • [October-13-2022] New paper! Mass-Editing Memory in a Transformer. Here we scale up ROME to edit up to 10K memories in an LLM. Find more at the project page.

  • [September-1-2022] Starting my PhD at Northeastern University, Boston. I will be working with Prof. David Bau on interpretability of LLMs.
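For the LRE result above, here is a rough sketch of the first-order approximation idea in notation of my own (the symbols F, s_0, W, and b are illustrative, not taken verbatim from the paper): writing F for the model's computation that maps a subject representation s at an intermediate layer to the object readout, the LRE is the first-order Taylor expansion of F around a single sample input s_0.

```latex
% First-order (Taylor) approximation of the relation decoding map F
% around a single subject representation s_0; notation is illustrative.
\[
F(s) \;\approx\; \mathrm{LRE}(s) \;=\; W s + b,
\qquad
W = \left.\frac{\partial F}{\partial s}\right|_{s = s_0},
\qquad
b = F(s_0) - W s_0 .
\]
```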