Papers & Research

Work that leans more toward research, evaluation, and writing than product UI, while staying grounded in concrete systems and data.

Benchmarking Commonsense Visual Reasoning for Vision-Language Models

Co-author · Research with Prof. Ernest Davis (NYU) · 2025 · Manuscript in preparation

Diagnostic study of visual commonsense reasoning with a focus on visibility, occlusion, and viewpoint shifts in vision-language models.

My role

  • Built a diagnostic benchmark (100 base + 100 counterfactual images; 100 questions + 100 flips) and automatic graders
  • Evaluated six VLMs (including ChatGPT, Claude, and LLaVA) and analyzed hallucination and abstention behavior

Legal Text Classification with BERT and LegalBERT

Co-author · NYU Natural Language Processing · Course project · 2024

Fine-tuned BERT-family models on legal corpora and compared them against classical baselines, with reproducible training and reporting.

My role

  • Compared BERT-Double, Legal-BERT, and Custom Legal-BERT models
  • Implemented an end-to-end NLP pipeline over legal case corpora
  • Proposed and implemented a composite evaluation metric for legal QA