Benchmarking Commonsense Visual Reasoning for Vision-Language Models
Co-author · Research with Prof. Ernest Davis (NYU) · 2025 · Manuscript in preparation
A diagnostic study of commonsense visual reasoning in vision-language models, focusing on visibility, occlusion, and viewpoint shifts.
My role
- Built a diagnostic benchmark (100 base + 100 counterfactual images; 100 questions + 100 counterfactual flips) along with automatic graders
- Evaluated six VLMs (including ChatGPT, Claude, and LLaVA) and analyzed their hallucination and abstention behavior