🤗 About me
Hi, I am Erli Zhang. I am a first-year PhD student at the National University of Singapore 🇸🇬, supervised by Asst Prof Jin Yueming. My current research interests include AI in Healthcare, Medical/Surgical Video Generation, and Surgical Foundation Models.
🔥 News
- 2024.10.12: 🎉🎉 SurgSAM-2 was accepted by the AIM-FM Workshop @ NeurIPS'24!
- 2024.05.22: 🎉🎉 Received a PhD offer from the Department of Biomedical Engineering, National University of Singapore!
- 2024.02.27: 🎉🎉 Q-Instruct was accepted by CVPR 2024!
- 2024.01.16: 🎉🎉 Q-Bench was accepted by ICLR 2024 (spotlight)!
- 2023.07.26: 🎉🎉 MaxVQA was accepted by ACM MM 2023 (oral)!
- 2023.07.14: 🎉🎉 DOVER was accepted by ICCV 2023!
📝 Publications

Surgical SAM 2: Real-time Segment Anything in Surgical Video by Efficient Frame Pruning
Haofeng Liu*, Erli Zhang*, Junde Wu*, Mingxuan Hong, Yueming Jin
- We introduce Surgical SAM 2 (SurgSAM-2), a model that leverages the power of the Segment Anything Model 2 (SAM2), integrating it with an efficient frame-pruning mechanism for real-time surgical video segmentation.
- SurgSAM-2 dramatically reduces the memory usage and computational cost of SAM2 for real-time clinical application, achieving superior performance at 3× the FPS (86 FPS) and making real-time surgical segmentation in resource-constrained environments feasible; a minimal sketch of the frame-pruning idea follows.
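
A minimal, illustrative sketch of the frame-pruning idea: keep a bounded memory bank of past-frame embeddings and evict the most redundant frame when a new one arrives. The class name, the cosine-similarity scoring, and `max_frames` are hypothetical stand-ins, not the actual SurgSAM-2 implementation.

```python
import torch
import torch.nn.functional as F


class MemoryBank:
    """Keep at most `max_frames` past-frame embeddings, evicting the
    frame most redundant with the incoming one (hypothetical sketch)."""

    def __init__(self, max_frames: int = 5):
        self.max_frames = max_frames
        self.frames: list[torch.Tensor] = []  # each entry: (D,) embedding

    def add(self, emb: torch.Tensor) -> None:
        if len(self.frames) < self.max_frames:
            self.frames.append(emb)
            return
        # Score stored frames by similarity to the new embedding and
        # drop the most redundant one, keeping the bank diverse.
        bank = torch.stack(self.frames)                     # (N, D)
        sims = F.cosine_similarity(bank, emb.unsqueeze(0))  # (N,)
        self.frames.pop(int(sims.argmax()))
        self.frames.append(emb)


bank = MemoryBank(max_frames=5)
for _ in range(100):               # stream of 100 incoming frames
    bank.add(torch.randn(256))     # stand-in for a frame embedding
assert len(bank.frames) == 5       # memory stays bounded
```

Keeping the bank bounded is what caps memory and per-frame computation, which is where a streaming-speed gain would come from.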

Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models
Haoning Wu*, Zicheng Zhang*, Erli Zhang*, Chaofeng Chen, Liang Liao, Annan Wang, Kaixin Xu, Chunyi Li, Jingwen Hou, Guangtao Zhai, Geng Xue, Wenxiu Sun, Qiong Yan, Weisi Lin
- We construct the Q-Instruct, the first instruction-tuning dataset that focuses on human queries related to low-level vision (a hypothetical sample format is sketched below).
- You can now run the Q-Instruct demos on your own device! See the local demos for instructions. (Currently supports mplug_owl-2 only.)
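
For illustration, a single low-level-vision instruction sample could look like the sketch below; this schema is a hypothetical example, not the official Q-Instruct format.

```python
# A hypothetical instruction-tuning sample for low-level vision;
# the actual Q-Instruct schema may differ.
sample = {
    "image": "images/0001.jpg",
    "conversations": [
        {"from": "human",
         "value": "How is the sharpness of this image?"},
        {"from": "assistant",
         "value": "The image is slightly blurry; fine textures in the "
                  "foreground are not well resolved."},
    ],
}
```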

Q-Bench: A Benchmark for General-Purpose Foundation Models on Low-Level Vision
Haoning Wu*, Zicheng Zhang*, Erli Zhang*, Chaofeng Chen, Liang Liao, Annan Wang, Chunyi Li, Wenxiu Sun, Qiong Yan, Guangtao Zhai, Weisi Lin
- We construct the Q-Bench, a benchmark to examine the progress of MLLMs on low-level visual abilities. Anticipating these large foundation models to be general-purpose intelligence that can ultimately relieve human effort, we propose that MLLMs should achieve three important and distinct abilities: perception of low-level visual attributes, language description of low-level visual information, and image quality assessment (IQA); the multiple-choice perception protocol is sketched below.
- Submit your model at our project page to compete with existing ones!
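
As a sketch of how the perception ability could be scored, the snippet below poses multiple-choice low-level questions to an MLLM and computes accuracy. `query_mllm` and the sample schema are hypothetical placeholders, not the official Q-Bench harness.

```python
def query_mllm(image_path: str, prompt: str) -> str:
    """Hypothetical stand-in: plug in any multi-modality model API."""
    raise NotImplementedError


def perception_accuracy(samples: list[dict]) -> float:
    correct = 0
    for s in samples:
        options = "\n".join(f"{k}. {v}" for k, v in s["choices"].items())
        prompt = (f"{s['question']}\n{options}\n"
                  "Answer with the option letter only.")
        pred = query_mllm(s["image"], prompt).strip()[:1].upper()
        correct += pred == s["answer"]  # bool adds as 0/1
    return correct / len(samples)


# One toy sample in the hypothetical schema above.
samples = [{"image": "demo.jpg",
            "question": "Is this image in focus?",
            "choices": {"A": "Yes", "B": "No"},
            "answer": "A"}]
```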

Towards Explainable Video Quality Assessment: A Database and a Language-Prompted Approach
Haoning Wu*, Erli Zhang*, Liang Liao, Chaofeng Chen, Jingwen Hou, Annan Wang, Wenxiu Sun, Qiong Yan, Weisi Lin
- We collect over two million human opinions on 13 dimensions of quality-related factors to establish the multi-dimensional Maxwell database. Furthermore, we propose the MaxVQA, a language-prompted VQA approach that modifies CLIP to better capture the important quality issues observed in our analyses; the prompt-pair scoring idea is sketched below.
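
The snippet below sketches the general language-prompted scoring paradigm: compare a frame's CLIP features against an antonym prompt pair and read the softmax probability of the positive description as the score. The prompt wording and single-frame setup are illustrative assumptions, not the MaxVQA implementation.

```python
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# One antonym prompt pair for a single quality dimension (illustrative).
texts = clip.tokenize(["a sharp, high quality photo",
                       "a blurry, low quality photo"]).to(device)

image = preprocess(Image.open("frame.jpg")).unsqueeze(0).to(device)
with torch.no_grad():
    logits_per_image, _ = model(image, texts)              # shape (1, 2)
    score = logits_per_image.softmax(dim=-1)[0, 0].item()  # P(positive)
print(f"sharpness score: {score:.3f}")  # closer to 1 = sharper
```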

Exploring Video Quality Assessment on User Generated Contents from Aesthetic and Technical Perspectives
Haoning Wu*, Erli Zhang*, Liang Liao*, Chaofeng Chen, Jingwen Hou, Annan Wang, Wenxiu Sun, Qiong Yan, Weisi Lin
- The proposed Disentangled Objective Video Quality Evaluator (DOVER) reaches state-of-the-art performance (0.91 SRCC on KoNViD-1k, 0.89 SRCC on LSVQ, 0.89 SRCC on YouTube-UGC) on the UGC-VQA problem. More importantly, our subjective studies construct the first aesthetic and technical VQA database, DIVIDE-3k, showing that UGC-VQA is jointly affected by the two perspectives; a two-branch sketch follows.
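
A minimal sketch of the disentangled two-branch idea, assuming features are already extracted from an aesthetic (downscaled, global) view and a technical (cropped, local) view; the linear heads and fusion weight are hypothetical stand-ins for DOVER's actual architecture.

```python
import torch
import torch.nn as nn


class TwoBranchVQA(nn.Module):
    """Hypothetical two-branch evaluator: one head scores aesthetics,
    the other technical quality; the outputs are fused."""

    def __init__(self, feat_dim: int = 512):
        super().__init__()
        self.aesthetic_head = nn.Linear(feat_dim, 1)
        self.technical_head = nn.Linear(feat_dim, 1)

    def forward(self, aes_feat: torch.Tensor, tech_feat: torch.Tensor,
                w: float = 0.5) -> torch.Tensor:
        a = self.aesthetic_head(aes_feat).squeeze(-1)   # aesthetic score
        t = self.technical_head(tech_feat).squeeze(-1)  # technical score
        return w * a + (1 - w) * t                      # fused overall score


model = TwoBranchVQA()
overall = model(torch.randn(4, 512), torch.randn(4, 512))  # batch of 4 videos
```
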
📖 Education
- 2024.08.01 - present, PhD Student, Major in Biomedical Engineering, National University of Singapore
- Supervisor: Asst Prof Jin Yueming
- Research Focus: AI in Healthcare, Medical/Surgical Video Generation and Surgical Foundation Models.
- 2020.08.10 - 2024.05.30, Undergraduate Student, Major in Computer Science, Nanyang Technological University
- Specialization: Artificial Intelligence & Data Science
- Final year project supervised by Prof. Weisi Lin
- Research Topic: Explainable Visual Quality Assessments.
- 2021.08.10 - 2021.12.01, SUSEP Exchange Student, National University of Singapore
🎖 Honors and Awards
- 2022.7 CFAR Internship Award for Research Excellence
- 2019.6 NTU Science and Engineering Undergraduate Scholarship
💻 Internships and Projects
- July 2023-Present, Center for Cognition, Vision, and Learning, Johns Hopkins University, Research Student
- Supervisor: Prof Alan L. Yuille
- Evaluate how the robustness of a sequential learning model changes with every new task relative to jointly trained neural models
- Adapt current robustness methods to continual learning setups and analyse whether they improve model robustness when learning continually
- May 2023-July 2023, Sunstella Foundation, Summer Research Scholar
- Supervisor: Prof Jimeng Sun
- Worked on MedBind, an AI model combining multiple modalities to generate synthetic patient records to enhance clinical research
- Contributed to PyHealth, a comprehensive deep learning toolkit for supporting clinical predictive modelling
- July 2022-May 2023, Institute for Infocomm Research, AI Research Engineer
- Supervisor: Dr Weimin Huang
- Conducted research in medical image processing, specifically mammogram analysis
- Developed a model using weakly semi-supervised learning and transformers to predict breast cancer risk at multiple time points from traditional mammograms, common risk factors, and clinical data
- July 2021-May 2022, Undergraduate Research Experience on Campus, Nanyang Technological University, URECA Research Student
- Supervisor: Prof Weisi Lin
- Identified common factors that lead to bias in facial analysis, e.g., occlusions, pose variation, expressions, etc.
- Evaluated current state-of-the-art face recognition methods on various datasets with bias
- Compared common feature detection and description techniques in occluded datasets
📬 Contact Me
- Email: zhangerlicarl@gmail.com or erli.z@nus.edu.sg
- Twitter: @zhang_erli