Transformer-based 3D visual grounding with point clouds for object detection
Access status:
USyd Access
Type
Thesis
Thesis type
Masters by Research
Author/s
Wang, Yifan
Abstract
With the development of deep learning technologies and large-scale 3D point cloud datasets, 3D visual grounding tasks have become increasingly attractive. Although many recent studies have achieved satisfactory results, most existing 3D visual grounding datasets use human-written descriptions, which can be hard to modify or extend. Additionally, the quality of these descriptions can vary widely. In this thesis, we introduce the 3DSSG-Cap dataset, which contains 383,438 descriptions of 27K objects from 1,465 indoor scenes. The descriptions in this dataset are generated using templates, offering flexibility and ease of extension. We also propose a novel method called 3DETRefer to localize the described objects in the 3DSSG-Cap dataset. Our approach combines a transformer-based detector with a visual grounding fusion module, enabling accurate object localization and identification.
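To make the template-based generation that the abstract mentions concrete, the short sketch below instantiates relational templates with object and anchor labels. It is a hypothetical illustration only: the template strings, relation vocabulary, and function name are assumptions, not the actual templates used to build 3DSSG-Cap.

# Hypothetical sketch of template-based description generation.
# The templates and labels are illustrative placeholders, not the
# actual 3DSSG-Cap templates (which the abstract does not specify).
TEMPLATES = [
    "the {target} that is {relation} the {anchor}",
    "find the {target} {relation} the {anchor}",
]

def generate_descriptions(target, relation, anchor):
    """Fill every template with one (target, relation, anchor) triple."""
    return [t.format(target=target, relation=relation, anchor=anchor)
            for t in TEMPLATES]

# e.g. generate_descriptions("chair", "next to", "table") yields
# "the chair that is next to the table" and
# "find the chair next to the table"
print("\n".join(generate_descriptions("chair", "next to", "table")))

Because descriptions are produced programmatically, adding a new relation or object class only requires extending the template list or the underlying vocabulary, which is the flexibility and ease of extension the abstract highlights.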
Date
2024
Rights statement
The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission.
Faculty/School
Faculty of Engineering, School of Computer Science
Awarding institution
The University of Sydney