A Unified Multi-Modal Approach for 3D Referring Expression Segmentation
| Field | Value | Language |
|---|---|---|
| dc.contributor.author | Zhou, Keshen | |
| dc.date.accessioned | 2026-01-22T00:18:08Z | |
| dc.date.available | 2026-01-22T00:18:08Z | |
| dc.date.issued | 2025 | en |
| dc.identifier.uri | https://hdl.handle.net/2123/34745 | |
| dc.description.abstract | Generalised 3D Referring Expression Segmentation (3D-GRES) segments the exact 3D objects described by free-form language, including cases where a description matches multiple targets, a single target, or no target at all. Most existing methods rely solely on sparse, colour-poor point clouds and neglect the complementary semantic richness of multi-view RGB images. In this thesis, we propose IS-RES, a unified multi-modal framework that integrates RGB images and point clouds for 3D-GRES. Specifically, IS-RES extracts instance masks with the Segment Anything Model (SAM), obtains both dense and instance-aware 2D embeddings through CLIP, and unprojects these 2D embeddings onto the 3D point cloud via confidence-weighted pixel-to-point association. A progressive multi-level fusion strategy then transforms the fragmented multi-modal features into hierarchical representations, enabling adaptive alignment between instance-level semantics and geometric structures. Extensive experiments demonstrate that IS-RES achieves state-of-the-art performance on both the ScanRefer and Multi3DRefer benchmarks, with significant improvements in challenging scenarios involving multiple instances and complex spatial relationships. | en |
| dc.language.iso | en | en |
| dc.rights | The author retains copyright of this thesis | |
| dc.subject | AI | en |
| dc.subject | 3D Segmentation | en |
| dc.subject | Computer Vision | en |
| dc.subject | Multi-modal Learning | en |
| dc.title | A Unified Multi-Modal Approach for 3D Referring Expression Segmentation | en |
| dc.type | Thesis | |
| dc.type.thesis | Masters by Research | en |
| dc.rights.other | The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission. | en |
| usyd.faculty | SeS faculties schools::Faculty of Engineering::School of Computer Science | en |
| usyd.degree | Master of Philosophy M.Phil | en |
| usyd.awardinginst | The University of Sydney | en |
| usyd.advisor | Liu, Tongliang | |
| usyd.include.pub | No | en |
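
The abstract names confidence-weighted pixel-to-point association but does not describe it. The following is a rough illustration only, not the thesis's implementation. It rests on several assumptions: a shared pinhole camera matrix, known world-to-camera extrinsics for each view, and a per-pixel confidence map supplied as input. How IS-RES actually derives its confidences is not stated in this record. Each point is projected into every view, and the features of the pixels it lands on are averaged with the confidences as weights. The function `unproject_features` and all of its parameters are hypothetical, and depth-based occlusion tests are omitted for brevity.

```python
import numpy as np


def unproject_features(points, feats_2d, conf_2d, intrinsics, world_to_cam, eps=1e-8):
    """Hypothetical sketch of confidence-weighted 2D-to-3D feature fusion.

    points:       (N, 3) point coordinates in the world frame
    feats_2d:     (V, H, W, C) per-view dense 2D embeddings
    conf_2d:      (V, H, W) per-pixel confidence weights (assumed given)
    intrinsics:   (3, 3) pinhole camera matrix shared by all views (assumed)
    world_to_cam: (V, 4, 4) world-to-camera extrinsics per view
    Returns (N, C) fused features and (N,) accumulated weights.
    Occlusion (depth consistency) checks are deliberately omitted.
    """
    num_views, height, width, channels = feats_2d.shape
    n = points.shape[0]
    acc = np.zeros((n, channels))
    wsum = np.zeros(n)
    homo = np.hstack([points, np.ones((n, 1))])  # (N, 4) homogeneous coords

    for view in range(num_views):
        cam = (world_to_cam[view] @ homo.T).T[:, :3]  # camera-frame coords
        z = cam[:, 2]
        in_front = z > eps  # points behind the camera cannot be seen
        pix = (intrinsics @ cam.T).T  # unnormalised pixel coords
        safe_z = np.where(in_front, z, 1.0)  # avoid division by zero
        col = np.floor(pix[:, 0] / safe_z).astype(int)
        row = np.floor(pix[:, 1] / safe_z).astype(int)
        valid = in_front & (col >= 0) & (col < width) & (row >= 0) & (row < height)
        idx = np.nonzero(valid)[0]

        # Gather the feature and confidence of the pixel each visible point hits.
        w = conf_2d[view, row[idx], col[idx]]
        acc[idx] += w[:, None] * feats_2d[view, row[idx], col[idx]]
        wsum[idx] += w

    # Confidence-weighted mean; points seen by no view keep a zero feature.
    fused = acc / np.maximum(wsum, eps)[:, None]
    return fused, wsum
```

A weighted mean is only one plausible reading of "confidence-weighted". Other readings fit the phrase equally well, such as keeping the single highest-confidence view or using learned weights.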