LLM Explainability & Interpretability
RevLLM is a Python library and Streamlit web app for exploring the internal workings of Large Language Models (LLMs).
RevLLM builds on top of the nanoGPT implementation of the GPT-2 model family developed by Andrej Karpathy and adheres to its spirit of simplicity and transparency. We keep the dependencies to a bare minimum and strive for clean, simple code that can be easily understood and reused.
The RevLLM library implements various methods for analyzing the internal data flow of transformer decoder models. In particular, RevLLM:
shows how transformer models map a prompt sentence to a sequence of tokens (integers).
explores the token embedding space with statistical methods.
implements various input prompt importance methods to identify prompt tokens that the model pays most attention to.
implements logit lens interpretability methods to shed light on the flow of information through the sequence of transformer blocks (see the sketch after this list).
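To illustrate the logit lens idea itself, the following minimal sketch projects GPT-2's intermediate hidden states through the final layer norm and the unembedding matrix using the Hugging Face transformers API. It is a generic example of the technique, not the RevLLM API; the model name and prompt are arbitrary choices for demonstration.

```python
# Minimal logit-lens sketch on GPT-2 (illustrative only; not the RevLLM API).
# Each intermediate hidden state is passed through the final layer norm and
# the unembedding matrix to see which next token each layer favors.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model.eval()

prompt = "The Eiffel Tower is located in"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# hidden_states: tuple of (n_layers + 1) tensors of shape (1, seq_len, d_model)
for layer, hidden in enumerate(outputs.hidden_states):
    # Project the hidden state at the last position into vocabulary space.
    logits = model.lm_head(model.transformer.ln_f(hidden[:, -1, :]))
    top_token = tokenizer.decode(logits.argmax(dim=-1))
    print(f"layer {layer:2d}: top next-token prediction = {top_token!r}")
```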
To facilitate ease of use and provide a hassle-free experimentation experience, we accompany the library with an interactive Streamlit app that offers a web interface to the library's functionality. The app automatically downloads and instantiates a chosen model from the GPT-2 family via the Hugging Face model repository and exposes the RevLLM methods through a convenient interface.
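A Streamlit front end along these lines might look like the sketch below. The widget layout and helper names are assumptions for illustration, not the actual RevLLM app; it only shows how a GPT-2 variant can be selected, downloaded from the Hugging Face hub, and used to tokenize a prompt.

```python
# Hypothetical Streamlit front-end sketch (not the actual RevLLM app).
import streamlit as st
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

MODELS = ["gpt2", "gpt2-medium", "gpt2-large", "gpt2-xl"]

@st.cache_resource
def load_model(name: str):
    # Downloads the checkpoint from the Hugging Face hub on first use.
    return GPT2TokenizerFast.from_pretrained(name), GPT2LMHeadModel.from_pretrained(name)

model_name = st.sidebar.selectbox("GPT-2 variant", MODELS)
tokenizer, model = load_model(model_name)

prompt = st.text_input("Prompt", "Hello, world")
token_ids = tokenizer(prompt)["input_ids"]
st.write("Token ids:", token_ids)
st.write("Tokens:", tokenizer.convert_ids_to_tokens(token_ids))
```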
Project sponsored by the German Federal Ministry of Education and Research (funding code: 01IS23S42).



