Towards Surgical Intelligence with Deep Learning-Based Surgical Video Analysis
Access status:
Open Access
Type
Thesis
Thesis type
Doctor of Philosophy
Author/s
Yue, Wenxi
Abstract
Advancements in medicine and information technology have revolutionised surgery, with computer-assisted procedures integrating advanced computer technology to aid interventions. This thesis presents interdisciplinary studies developing state-of-the-art deep learning methods for surgical video analysis, focusing on two key aspects of surgery, temporal dynamics and spatial comprehension, through two pivotal tasks: surgical workflow analysis and instrument segmentation. First, we focus on surgical workflow analysis and observe that existing methods extract temporal context solely at the frame level and aggregate homogeneous contextual information for all frames. To address this problem, we propose a Cascaded Multi-Level Transformer Network that extracts both frame-level and phase-level temporal context and fuses them with spatial features in a frame-adaptive manner, thereby improving performance. Next, with the emergence of foundation models, we explore the adaptation of the Segment Anything Model (SAM) to the surgical domain for instrument segmentation. We introduce SurgicalSAM, a novel end-to-end efficient-tuning approach for SAM that effectively integrates surgical-specific information with SAM's pre-trained knowledge for improved generalisation. Moreover, we explore the domain gap between natural objects and surgical instruments, recognising that the critical distinction lies in the complex structures and fine-grained details of surgical instruments. To address this challenge, we propose SurgicalPart-SAM, which explicitly integrates surgical instrument structure knowledge to improve the understanding and differentiation of instrument categories. Through these contributions, we advance state-of-the-art deep learning methods for surgical video analysis, enhancing performance while reducing development costs. These advancements lead to improved accuracy in computer-assisted surgical systems and greater accessibility of surgical technology for healthcare institutions.
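To make the notion of "efficient tuning" of a frozen foundation model concrete, the sketch below shows a generic parameter-efficient setup in PyTorch: a large pre-trained encoder is kept frozen and only a small class-prompt head is trained. All names (FrozenEncoder, ClassPromptHead), dimensions, and the class-prompt design are illustrative assumptions for this general technique; they are not the thesis's actual SurgicalSAM implementation.

```python
# Minimal sketch of parameter-efficient tuning (illustrative only; not SurgicalSAM's code).
import torch
import torch.nn as nn

class FrozenEncoder(nn.Module):
    """Stand-in for a large pre-trained image encoder (e.g. a SAM-style ViT backbone)."""
    def __init__(self, dim: int = 256):
        super().__init__()
        # A single patch-embedding layer serves purely as a placeholder backbone.
        self.backbone = nn.Sequential(nn.Conv2d(3, dim, kernel_size=16, stride=16), nn.GELU())

    def forward(self, x):
        return self.backbone(x)  # (B, dim, H/16, W/16)

class ClassPromptHead(nn.Module):
    """Small trainable module: one learned prompt embedding per instrument class,
    used to modulate frozen image features and predict a coarse mask (hypothetical design)."""
    def __init__(self, num_classes: int = 7, dim: int = 256):
        super().__init__()
        self.prompts = nn.Embedding(num_classes, dim)
        self.decoder = nn.Conv2d(dim, 1, kernel_size=1)

    def forward(self, feats, class_ids):
        prompt = self.prompts(class_ids)[:, :, None, None]  # (B, dim, 1, 1)
        return self.decoder(feats * prompt)                  # (B, 1, H/16, W/16)

encoder, head = FrozenEncoder(), ClassPromptHead()
for p in encoder.parameters():  # freeze all pre-trained weights
    p.requires_grad = False

optimiser = torch.optim.AdamW(head.parameters(), lr=1e-4)  # only the small head is updated

images = torch.randn(2, 3, 224, 224)   # dummy surgical frames
masks = torch.rand(2, 1, 14, 14)       # dummy low-resolution target masks
class_ids = torch.tensor([3, 5])       # dummy instrument-category labels

with torch.no_grad():
    feats = encoder(images)            # frozen forward pass
logits = head(feats, class_ids)
loss = nn.functional.binary_cross_entropy_with_logits(logits, masks)
loss.backward()
optimiser.step()
print(f"toy loss: {loss.item():.4f}")
```

The point of the sketch is the division of labour: the expensive pre-trained knowledge stays fixed, while a lightweight, domain-specific component carries the surgical information, which is what keeps development cost low.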
Date
2024
Rights statement
The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission.
Faculty/School
Faculty of Engineering, School of Computer Science
Awarding institution
The University of Sydney