I am a researcher in Computer Vision and Medical AI. My research focuses on empowering AI systems with visual reasoning abilities akin to those of human experts, with the goal of enhancing both the reliability and usability of these models, especially in medicine and healthcare applications. The essential question at the heart of my research is: how do we effectively leverage human insight and expert knowledge to improve the reliability and usability of AI models?
In addressing this essential question, I focus on harnessing visual and language information to enhance the learning of visual reasoning in intelligent systems. My current research applies this approach to several key areas of visual reasoning, including:
Chain-of-Look (CoL) visual reasoning for triplet recognition in surgical videos and open-set video human-object interaction detection: [Verb-centric Surgical Video Chain-of-Look Reasoning]; [Open-set Chain-of-Look Video Reasoning]
Tree-of-Looks (ToLs) visual reasoning for free-form surgical workflow analysis and surgical/sports action quality assessment: [Tree-of-Looks Spatio-Temporal Context Reasoning]
I obtained my Ph.D. in Computer Science and Engineering from the University at Buffalo in June 2025, where I was very fortunate to be supervised by Prof. Junsong Yuan.
News
June 2025: One paper is accepted to ACM MM 2025.
February 2025: One paper is accepted to ICCV 2025.
February 2025: One paper is accepted to CVPR 2025.
January 2025: One paper is accepted to WWW 2025.
August 2024: Our research on visual reasoning is supported by the Sony Research Award Program.
July 2024: One paper is accepted to ECCV 2024.
February 2024: I’ll start a research internship at Amazon this summer.
July 2023: One paper is accepted to ACM MM 2023.
July 2023: One paper is accepted to ICCV 2023.
June 2023: One paper is accepted to MICCAI 2023.
Contact
Email: nanxi [at] buffalo [dot] edu