Transformer-based 3D visual grounding with point clouds for object detection
Access status:
USyd Access
Type
Thesis
Thesis type
Masters by Research
Author/s
Wang, Yifan
Abstract
With the development of deep learning technologies and large-scale 3D point cloud datasets, 3D visual grounding tasks have become increasingly attractive. Although many recent studies have achieved satisfactory results, most existing 3D visual grounding datasets use human-written descriptions, which can be hard to modify or extend. Additionally, the quality of these descriptions can vary widely. In this thesis, we introduce the 3DSSG-Cap dataset, which contains 383,438 descriptions of 27K objects from 1,465 indoor scenes. The descriptions in this dataset are generated using templates, offering flexibility and ease of extension. We also propose a novel method called 3DETRefer to localize the described objects in the 3DSSG-Cap dataset. Our approach combines a transformer-based detector with a visual grounding fusion module, enabling accurate object localization and identification.
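To make the template-based generation that the abstract mentions concrete, the short sketch below instantiates relational templates with object and anchor labels. It is a hypothetical illustration only: the template strings, relation vocabulary, and function name are assumptions, not the actual templates used to build 3DSSG-Cap.

# Hypothetical sketch of template-based description generation.
# The templates and labels are illustrative placeholders, not the
# actual 3DSSG-Cap templates (which the abstract does not specify).
TEMPLATES = [
    "the {target} that is {relation} the {anchor}",
    "find the {target} {relation} the {anchor}",
]

def generate_descriptions(target, relation, anchor):
    """Fill every template with one (target, relation, anchor) triple."""
    return [t.format(target=target, relation=relation, anchor=anchor)
            for t in TEMPLATES]

# e.g. generate_descriptions("chair", "next to", "table") yields
# "the chair that is next to the table" and
# "find the chair next to the table"
print("\n".join(generate_descriptions("chair", "next to", "table")))

Because descriptions are produced programmatically, adding a new relation or object class only requires extending the template list or the underlying vocabulary, which is the flexibility and ease of extension the abstract highlights.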
Date
2024
Rights statement
The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission.
Faculty/School
Faculty of Engineering, School of Computer Science
Awarding institution
The University of Sydney