Tianyi Gao

Hi! I am a first-year CS PhD student in the Multimodal Vision Research Laboratory at Washington University (WashU), advised by Dr. Nathan Jacobs. I work on computer vision and multimodal learning.

Before joining WashU, I obtained my Bachelor's and Master's degrees from Wuhan University (WHU). I also spent a wonderful time at Microsoft Research Asia in 2024, working on MLLMs for scientific diagrams. I am always happy to discuss research and explore collaborations!

News

Research

My past research focused on learning representations for few-shot scenarios and building MLLMs for geospatial tasks. Currently, I am working on multimodal learning and world model-related problems. If you share similar interests, feel free to reach out for collaboration!

PEACE: Empowering Geologic Map Holistic Understanding with MLLMs
Yangyu Huang*, Tianyi Gao*, Haoran Xu, Qihao Zhao, Yang Song, Zhipeng Gui, Tengchao Lv, Lei Cui, Scarlett Li, Furu Wei
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025
Microsoft Foundry Labs Project

We introduce GeoMap-Bench, a vision-language benchmark for geologic map understanding. It consists of 25 task types that measure abilities across five aspects. Our benchmark reveals a significant performance gap between state-of-the-art MLLMs and human experts; we further explore agentic baselines to narrow this gap.

PRUE: A Practical Recipe for Field Boundary Segmentation at Scale
Gedeon Muhawenayo, Caleb Robinson, Subash Khanal, Zhanpei Fang, Isaac Corley, Alexander Wollam, Tianyi Gao, Leonard Strnad, Ryan Avery, Lyndon Estes, Ana M. Tárano, Nathan Jacobs, Hannah Kerner
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026
Project

PRUE delivers a deployment-oriented framework for large-scale field boundary segmentation, developed with industry (Microsoft, Wherobots) and academic (ASU, WashU, OSU, Clark) collaborators. It demonstrates that a strong, well-engineered recipe can outperform a wide range of geospatial foundation models under real-world conditions.

Enrich Distill and Fuse: Generalized Few-Shot Semantic Segmentation in Remote Sensing Leveraging Foundation Model's Assistance
Tianyi Gao, Wei Ao, Xing-Ao Wang, Yuanhao Zhao, Ping Ma, Mengjie Xie, Hang Fu, Jinchang Ren, Zhi Gao
IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW, Oral), 2024

We incorporate general VLMs into existing GFSS pipelines through support-set augmentation and knowledge distillation; this approach secured 3rd place in the CVPR OpenEarthMap Few-shot Challenge.

Services

Miscellaneous