Field | Value | Language
dc.contributor.author | Wang, Yifan |
dc.date.accessioned | 2024-03-05T21:31:09Z |
dc.date.available | 2024-03-05T21:31:09Z |
dc.date.issued | 2024 | en
dc.identifier.uri | https://hdl.handle.net/2123/32309 |
dc.description | Includes publication |
dc.description.abstract | With the development of deep learning technologies and large-scale 3D point cloud datasets, 3D visual grounding tasks have become increasingly attractive. Although many recent studies have achieved satisfactory results, most recent 3D visual grounding datasets use human-written descriptions, which can be hard to modify or extend. Additionally, the quality of these descriptions can vary widely. In this paper, we introduce the 3DSSG-Cap dataset, which contains 383,438 descriptions of 27K objects from 1,465 indoor scenes. The descriptions in this dataset are generated using templates, offering flexibility and ease of extension. We also propose a novel method called 3DETRefer to localize the described objects in the 3DSSG-Cap dataset. Our approach incorporates a transformer-based detector and a visual grounding fusion module, enabling accurate object localization and identification. | en
dc.language.iso | en | en
dc.subject | 3D visual grounding | en
dc.subject | transformer | en
dc.subject | detection | en
dc.subject | point clouds | en
dc.title | Transformer-based 3D visual grounding with point clouds for object detection | en
dc.type | Thesis |
dc.type.thesis | Masters by Research | en
dc.rights.other | The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission. | en
usyd.faculty | SeS faculties schools::Faculty of Engineering::School of Computer Science | en
usyd.degree | Master of Philosophy M.Phil | en
usyd.awardinginst | The University of Sydney | en
usyd.advisor | Cai, Weidong |
usyd.include.pub | Yes | en