Rusheb Shah (rusheb)

rusheb/README.md

Hi there, I'm Rusheb!

I am currently working on LLM evaluations at Apollo Research.

Past OSS contributions:

I contributed to the mechanistic interpretability library TransformerLens. Most notably, I added support for BERT to the library.
I worked on MazeDataset, a library for generation, filtering, solving, visualizing, and processing of mazes for training ML systems.

Research:

I co-authored a NeurIPS workshop paper, Scalable and Transferable Black-Box Jailbreaks for Language Models via Persona Modulation, in which we used language models to automate the generation of narrative-based jailbreaks against GPT-4 and other state-of-the-art models.

Pinned

neelnanda-io/TransformerLens

A library for mechanistic interpretability of GPT-style language models

Python · 902 stars · 197 forks
arena-hackathon-attribution-patching

A novel automated circuit discovery algorithm based on attribution patching. First-prize winner of the ARENA Interpretability Hackathon.

Python · 3 stars · 1 fork
understanding-search/maze-transformer

This repo is built to facilitate the training and analysis of autoregressive transformers on maze-solving tasks.

Jupyter Notebook · 24 stars · 6 forks
chat

A basic async terminal chatroom app that I built to help me learn asynchronous programming with asyncio.

Python
coursera-machine-learning

My solutions to the exercises from Andrew Ng's Machine Learning Course (Coursera).

MATLAB
cs50

My problem set solutions for CS50 2018.

C