Transformer-based 3D visual grounding with point clouds for object detection

Wang, Yifan

Access status: USyd Access

Field	Value	Language
dc.contributor.author	Wang, Yifan
dc.date.accessioned	2024-03-05T21:31:09Z
dc.date.available	2024-03-05T21:31:09Z
dc.date.issued	2024	en
dc.identifier.uri	https://hdl.handle.net/2123/32309
dc.description	Includes publication
dc.description.abstract	With the development of deep learning technologies and large-scale 3D point cloud datasets, 3D visual grounding tasks have become increasingly attractive. Although many recent studies have achieved satisfactory results, most recent 3D visual grounding datasets use human-written descriptions, which can be hard to modify or extend. Additionally, the quality of these descriptions can vary widely. In this paper, we introduce the 3DSSG-Cap dataset, which contains 383,438 descriptions of 27K objects from 1,465 indoor scenes. The descriptions in this dataset are generated using templates, offering flexibility and ease of extension. We also propose a novel method called 3DETRefer to localize the described objects in the 3DSSG-Cap dataset. Our approach incorporates a transformer-based detector and a visual grounding fusion module, enabling accurate object localization and identification.	en
dc.language.iso	en	en
dc.subject	3D visual grounding	en
dc.subject	transformer	en
dc.subject	detection	en
dc.subject	point clouds	en
dc.title	Transformer-based 3D visual grounding with point clouds for object detection	en
dc.type	Thesis
dc.type.thesis	Masters by Research	en
dc.rights.other	The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission.	en
usyd.faculty	SeS faculties schools::Faculty of Engineering::School of Computer Science	en
usyd.degree	Master of Philosophy M.Phil	en
usyd.awardinginst	The University of Sydney	en
usyd.advisor	Cai, Weidong
usyd.include.pub	Yes	en