Enabling Plug-and-Play Cameras: Generalisable Methods for Self-Calibration and Multi-Modal Vision Systems

Griffiths, Ryan Ben

Access status:

Open Access

Type

Thesis

Thesis type

Doctor of Philosophy

Author/s

Griffiths, Ryan Ben

Abstract

Vision systems are foundational to a wide range of real-world applications, including autonomous vehicles navigating complex urban environments, drones performing infrastructure inspection, and robots operating in hazardous or remote settings. These applications increasingly depend on diverse and specialised camera hardware, such as fisheye, thermal, and multi-modal systems, which challenge the assumptions of conventional computer vision pipelines. Existing approaches typically require labour-intensive calibration, handcrafted adaptation for each camera type, and large labelled datasets. This thesis addresses the central question: how can we build plug-and-play vision systems? We present three key contributions in the areas of self-calibration, network adaptation, and multi-modal fusion. First, we introduce NOCaL, a semi-supervised framework that jointly estimates camera intrinsics, distortion, and odometry using a rendering-based self-supervision signal. Second, we propose RectConv, a deformable convolutional layer that enables pretrained convolutional neural networks to operate on previously unseen camera geometries such as fisheye lenses. Third, we develop a transformer-based architecture for multi-camera, multi-modal integration. The model introduces a ray-based rotary positional embedding that enables effective integration of RGB and thermal imagery into a shared, geometrically consistent scene representation. Together, these contributions demonstrate that ray-based, self-supervised representations can support flexible and generalisable vision systems that adapt to new hardware and sensing configurations. The contributions of this thesis have potential impact in domains where robust perception is critical, such as autonomous navigation, environmental monitoring, planetary exploration, and field robotics. This work helps pave the way towards more adaptable, accessible, and intelligent machine perception.

Date

2026

Rights statement

The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission.

Faculty/School

Faculty of Engineering, School of Aerospace, Mechanical and Mechatronic Engineering

Awarding institution

The University of Sydney

Subjects

Cameras
Calibration
Computer Vision
Multi-Modal
Rays