👀 About me

Hi, I am Erli Zhang. I am a first-year PhD student at the National University of Singapore 🇸🇬, supervised by Asst Prof Jin Yueming. My current research interests include AI in Healthcare, Medical/Surgical Video Generation, and Surgical Foundation Models.

🔥 News

  • 2024.10.12: 🎉🎉 SurgSAM-2 was accepted by the AIM-FM Workshop @ NeurIPS'24!
  • 2024.05.22: 🎉🎉 Received a PhD offer from the National University of Singapore, Department of Biomedical Engineering!
  • 2024.02.27: 🎉🎉 Q-Instruct was accepted by CVPR 2024!
  • 2024.01.16: 🎉🎉 Q-Bench was accepted by ICLR 2024 (spotlight)!
  • 2023.07.26: 🎉🎉 MaxVQA was accepted by ACMMM 2023 (oral)!
  • 2023.07.14: 🎉🎉 DOVER was accepted by ICCV 2023!

📝 Publications

NeurIPS 2024 Workshop

Surgical SAM 2: Real-time Segment Anything in Surgical Video by Efficient Frame Pruning

Haofeng Liu*, Erli Zhang*, Junde Wu*, Mingxuan Hong, Yueming Jin

GitHub, Paper

  • We introduce Surgical SAM 2 (SurgSAM-2), a model that builds on the Segment Anything Model 2 (SAM2) and integrates it with an efficient frame pruning mechanism for real-time surgical video segmentation.
  • SurgSAM-2 dramatically reduces the memory usage and computational cost of SAM2 for real-time clinical applications, achieving superior performance at 3× the frame rate (86 FPS) and making real-time surgical segmentation feasible in resource-constrained environments (a conceptual sketch of the frame-pruning idea follows below).
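
The pruning idea can be illustrated with a minimal, hypothetical Python sketch (the function name `prune_memory_bank`, the `keep_k` parameter, and the cosine-similarity scoring are illustrative assumptions, not the released SurgSAM-2 code): rank the past frames held in the memory bank by similarity to the current frame and keep only the top few, so the per-frame memory cost stays bounded.

```python
# Hypothetical sketch of memory-bank frame pruning (assumed names and scoring,
# not the SurgSAM-2 implementation): keep only the keep_k past frames whose
# embeddings are most similar to the current frame, and drop the rest.
import numpy as np

def prune_memory_bank(past_embeddings, current_embedding, keep_k=3):
    """Return indices (in temporal order) of the keep_k most relevant past frames."""
    if len(past_embeddings) <= keep_k:
        return list(range(len(past_embeddings)))
    bank = np.stack(past_embeddings).astype(float)        # (N, D) stored frame embeddings
    bank /= np.linalg.norm(bank, axis=1, keepdims=True)   # L2-normalize each row
    query = current_embedding / np.linalg.norm(current_embedding)
    scores = bank @ query                                 # cosine similarity to current frame
    return sorted(np.argsort(scores)[-keep_k:].tolist())  # keep the top-k frames

# Example: keep the 3 most relevant of 8 stored frames
# kept = prune_memory_bank([np.random.rand(256) for _ in range(8)], np.random.rand(256))
```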
CVPR 2024

Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models

Haoning Wu*, Zicheng Zhang*, Erli Zhang*, Chaofeng Chen, Liang Liao, Annan Wang, Kaixin Xu, Chunyi Li, Jingwen Hou, Guangtao Zhai, Geng Xue, Wenxiu Sun, Qiong Yan, Weisi Lin

GitHub, Paper

  • We construct Q-Instruct, the first instruction-tuning dataset focused on human queries related to low-level vision.
  • The Q-Instruct demos can now be run on your own device! See local demos for instructions (currently mplug_owl-2 only).
ICLR 2024

Q-Bench: A Benchmark for General-Purpose Foundation Models on Low-Level Vision

Haoning Wu*, Zicheng Zhang*, Erli Zhang*, Chaofeng Chen, Liang Liao, Annan Wang, Chunyi Li, Wenxiu Sun, Qiong Yan, Guangtao Zhai, Weisi Lin

GitHub, Paper

  • We construct Q-Bench, a benchmark to examine the progress of MLLMs on low-level visual abilities. Anticipating these large foundation models to serve as general-purpose intelligence that can ultimately relieve human effort, we propose that MLLMs should achieve three important and distinct abilities: perception of low-level visual attributes, language description of low-level visual information, and image quality assessment (IQA).
  • Submit your model on our project page to compete with existing ones!
ACMMM 2023

Towards Explainable Video Quality Assessment: A Database and a Language-Prompted Approach

Haoning Wu*, Erli Zhang*, Liang Liao, Chaofeng Chen, Jingwen Hou, Annan Wang, Wenxiu Sun, Qiong Yan, Weisi Lin

GitHub, Paper

  • We collect over two million human opinions on 13 dimensions of quality-related factors to establish the multi-dimensional Maxwell database. Furthermore, we propose MaxVQA, a language-prompted VQA approach that modifies CLIP to better capture the important quality issues observed in our analyses.
ICCV 2023

Exploring Video Quality Assessment on User Generated Contents from Aesthetic and Technical Perspectives

Haoning Wu*, Erli Zhang*, Liang Liao*, Chaofeng Chen, Jingwen Hou, Annan Wang, Wenxiu Sun, Qiong Yan, Weisi Lin

GitHub, Paper

  • The proposed Disentangled Objective Video Quality Evaluator (DOVER) reaches state-of-the-art performance on the UGC-VQA problem (0.91 SRCC on KoNViD-1k, 0.89 SRCC on LSVQ, 0.89 SRCC on YouTube-UGC). More importantly, our subjective studies construct the first aesthetic-and-technical VQA database, DIVIDE-3k, showing that UGC-VQA is jointly affected by the two perspectives.

📖 Education

  • 2024.08.01 - present, PhD Student, Major in Biomedical Engineering, National University of Singapore
    • Supervisor: Asst Prof Jin Yueming
    • Research Focus: AI in Healthcare, Medical/Surgical Video Generation and Surgical Foundation Models.
  • 2020.08.10 - 2024.05.30, Undergraduate Student, Major in Computer Science, Nanyang Technological University
    • Specialization: Artificial Intelligence & Data Science
    • Final year project supervised by Prof. Weisi Lin
    • Research Topic: Explainable Visual Quality Assessments.
  • 2021.08.10 - 2021.12.01, SUSEP Exchange Student, National University of Singapore

🎖 Honors and Awards

  • 2022.07 CFAR Internship Award for Research Excellence
  • 2019.06 NTU Science and Engineering Undergraduate Scholarship

💻 Internships and Projects

  • July 2023-Present, Center for Cognition, Vision, and Learning, Johns Hopkins University, Research Student
    • Supervisor: Prof Alan L. Yuille
    • Evaluate how the robustness of a sequential learning model changes with every new task relative to jointly trained neural models
    • Adapt current robustness methods to continual learning setups and analyse whether they improve model robustness when learning continually
  • May 2023-July 2023, Sunstella Foundation, Summer Research Scholar
    • Supervisor: Prof Jimeng Sun
    • Worked on MedBind, an AI model combining multiple modalities to generate synthetic patient records to enhance clinical research
    • Contributed to PyHealth, a comprehensive deep learning toolkit for supporting clinical predictive modelling
  • July 2022-May 2023, Institute for Infocomm Research, AI Research Engineer
    • Supervisor: Dr Weimin Huang
    • Conducted research in medical image processing, specifically mammogram analysis
    • Developed a model using weakly semi-supervised learning and transformers to predict breast cancer risk at multiple time points from traditional mammograms, common risk factors, and clinical data
  • July 2021-May 2022, Undergraduate Research Experience on Campus, Nanyang Technological University, URECA Research Student
    • Supervisor: Prof Weisi Lin
    • Identified common factors that lead to bias in facial analysis, e.g., occlusions, pose variation, expressions, etc.
    • Evaluated current state-of-the-art face recognition methods on various datasets with bias
    • Compared common feature detection and description techniques in occluded datasets

📬 Contact Me

  • Email: zhangerlicarl@gmail.com or erli.z@nus.edu.sg
  • Twitter: @zhang_erli