Sachit Menon

I am a PhD student in Computer Science at Columbia University working on problems in machine learning with Carl Vondrick. My doctoral work is supported by a Columbia Presidential Fellowship and an NSF Graduate Research Fellowship.

Extremely important: I am also an expert on large dogs, despite a lack of recognition on the topic.

Currently, I am also completing an internship at Meta AI (GenAI), exploring diffusion models and LLMs under the supervision of Rohit Girdhar and Ishan Misra.

Previously, I completed my B.S. in Mathematics and Computer Science at Duke University, where I was fortunate to work with Dr. Cynthia Rudin.

Email  /  CV  /  Google Scholar

Research

Through my research, I hope to develop new ways to learn, utilize, or understand models at scale. This makes me especially interested in representation learning, generative modeling, and self-supervised methods, as well as their intersection. Recently, I have been particularly interested in the potential for language to aid vision tasks.

Selected papers:
ViperGPT: Visual Inference via Python Execution for Reasoning
Sachit Menon*, Dídac Surís*, Carl Vondrick.
ICCV 2023, Oral.
arXiv, Code

We introduce ViperGPT, a framework that leverages code-generation models to compose vision-and-language models into subroutines to produce a result for any query. ViperGPT utilizes a provided API to access the available modules, and composes them by generating Python code that is later executed.
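The loop above can be sketched in miniature. This is a conceptual illustration, not the ViperGPT implementation: the `find`/`count` modules and the `generate_code` stub are hypothetical stand-ins for the real API and the code-generation model, which would produce the program from the query and the API specification.

```python
# Toy sketch of the ViperGPT-style loop: generate Python against a module
# API, then execute it to answer a query. All components are stubs.

def find(image, object_name):
    # Stand-in vision module: return detections of a given class.
    return [obj for obj in image if obj == object_name]

def count(patches):
    # Stand-in counting module.
    return len(patches)

API = {"find": find, "count": count}

def generate_code(query):
    # A real system would prompt a code-generation model with the query
    # plus the API documentation; here we return a fixed program.
    return "result = count(find(image, 'muffin'))"

def run_query(image, query):
    program = generate_code(query)
    scope = {"image": image, **API}
    exec(program, scope)  # execute the generated Python with API access
    return scope["result"]

image = ["muffin", "dog", "muffin"]
print(run_query(image, "How many muffins are there?"))  # -> 2
```

The key design point is that the generated program, not a monolithic model, does the composition: each intermediate value is inspectable Python state.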

Visual Classification via Description from Large Language Models
Sachit Menon, Carl Vondrick.
ICLR 2023, Notable - Top 5% (Oral).
arXiv, Code

We enhance zero-shot recognition with vision-language models by comparing to category descriptors from GPT-3, enabling better performance in an interpretable setting that also allows for incorporation of new concepts and bias mitigation.
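A minimal sketch of the idea, with hypothetical stand-ins throughout: the descriptor lists would come from GPT-3 and `similarity` from a vision-language model such as CLIP; here both are stubbed so the scoring logic is visible.

```python
# Toy sketch of classification by description: score each class by the
# mean similarity between the image and that class's descriptors.

descriptors = {
    "hen": ["a bird", "red comb on head", "feathered body"],
    "tiger": ["orange fur", "black stripes", "a large cat"],
}

def similarity(image, text):
    # Stand-in for vision-language image-text similarity.
    return 1.0 if text in image["visible_attributes"] else 0.0

def classify(image):
    # Per-descriptor scores make the decision interpretable: one can see
    # which descriptors supported the chosen class.
    scores = {
        cls: sum(similarity(image, d) for d in ds) / len(ds)
        for cls, ds in descriptors.items()
    }
    return max(scores, key=scores.get)

image = {"visible_attributes": {"orange fur", "black stripes", "a large cat"}}
print(classify(image))  # -> tiger
```

Because classes are just descriptor lists, new concepts can be added by writing descriptors, and biased descriptors can be edited out.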

Task Bias in Vision-Language Models
Sachit Menon*, Ishan Chandratreya*, Carl Vondrick.
IJCV 2023.
arXiv

We conduct an in-depth exploration of the CLIP model, show that its visual representation is often strongly biased towards solving some tasks more than others, and propose a simple method to overcome this bias.

Forget-me-not! Contrastive Critics for Mitigating Posterior Collapse
Sachit Menon, David Blei, Carl Vondrick.
UAI 2022.
arXiv

We incorporate a "critic" into the standard VAE framework that aims to pair up corresponding samples from the observed and latent distributions, mitigating posterior collapse.

PULSE: Self-Supervised Photo Upsampling via Latent Space Exploration of Generative Models
Sachit Menon*, Alex Damian*, Shijia Hu, Nikhil Ravi, and Cynthia Rudin.
CVPR 2020.
arXiv

Self-supervised search of the outputs of a generative model, leveraging some properties of high-dimensional Gaussians, enables super-resolution with higher perceptual quality than previous methods.

Teaching

TA, Neural Networks and Deep Learning with Prof. Rich Zemel, Columbia University
TA, Machine Learning (Graduate) with Prof. Cynthia Rudin, Duke University

Service

Organizer, Learning from Unlabeled Video Workshop (LUV 2021), CVPR 2021
Reviewer, CVPR / ICCV / ECCV / NeurIPS / ICML / AISTATS


Website template credits to Jon Barron.