Light Field Based 6DoF Tracking of Previously Unobserved Objects
Access status:
Open Access
Type
Dataset
Abstract
Object tracking is an important step in robotics and autonomous driving pipelines, and it has to generalize to previously unseen and complex objects. Existing high-performing methods often rely on pre-captured object views to build explicit reference models, which restricts them to a fixed set of known objects. Such reference models can also struggle with visually complex appearance, reducing the quality of tracking. In this work, we introduce an object tracking method based on light field images that does not depend on a pre-trained model and is robust to complex visual behavior, such as reflections. We extract semantic and geometric features from light field inputs using vision foundation models and convert them into view-dependent Gaussian splats. These splats serve as a unified object representation supporting differentiable rendering and pose optimization. We further introduce a light field object tracking dataset containing challenging reflective objects with precise ground truth poses. Experiments demonstrate that our method is competitive with state-of-the-art model-based trackers in these difficult cases, paving the way toward universal object tracking in robotic systems.

Each sequence contains the following directory structure:

├── camera_matrix.txt
├── camera_poses
├── depth
├── depth_video.gif
├── gdino_prompt.txt
├── LF_0000
...
├── LF_XXXX
├── metadata.json
├── object_poses
└── video.gif

Each frame in the sequence has its own folder, which contains the light field sub-aperture views for that frame. The depth maps are in the depth folder. The camera and object poses are in the camera_poses and object_poses directories, respectively.
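The layout described above could be indexed with a short script before any loading is attempted. The following is a minimal sketch, not part of the dataset's own tooling: the function name `index_sequence`, the `EXPECTED_ENTRIES` list, and the returned dictionary keys are hypothetical. It assumes the per-frame folders are the ones named `LF_` followed by a zero-padded index, so that sorting by name gives frame order. It only collects paths and reports missing entries, because the record does not specify the file formats inside `camera_poses`, `object_poses`, or `depth`.

```python
from pathlib import Path

# Top-level entries named in the dataset description (frame folders excluded).
EXPECTED_ENTRIES = [
    "camera_matrix.txt",
    "camera_poses",
    "depth",
    "depth_video.gif",
    "gdino_prompt.txt",
    "metadata.json",
    "object_poses",
    "video.gif",
]


def index_sequence(sequence_dir):
    """Collect paths for one sequence; hypothetical helper, parses no files."""
    root = Path(sequence_dir)

    # Assumption: frame folders are named LF_<zero-padded index>, so a
    # lexicographic sort matches frame order.
    frame_dirs = sorted(
        (p for p in root.glob("LF_*") if p.is_dir()),
        key=lambda p: p.name,
    )

    # Report anything from the documented layout that is absent on disk.
    missing = [name for name in EXPECTED_ENTRIES if not (root / name).exists()]

    return {
        "frames": frame_dirs,
        "camera_matrix": root / "camera_matrix.txt",
        "camera_poses": root / "camera_poses",
        "object_poses": root / "object_poses",
        "depth": root / "depth",
        "metadata": root / "metadata.json",
        "missing": missing,
    }
```

Calling `index_sequence` on a downloaded sequence folder and checking that `missing` is empty is a quick way to confirm the download is complete before writing format-specific readers.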
Date
2025-12-16
Licence
Creative Commons Attribution 4.0
Faculty/School
Faculty of Engineering, School of Aerospace Mechanical and Mechatronic Engineering
Department, Discipline or Centre
Australian Centre for Robotics