Enabling Plug-and-Play Cameras: Generalisable Methods for Self-Calibration and Multi-Modal Vision Systems
Access status:
Open Access
Type
ThesisThesis type
Doctor of PhilosophyAuthor/s
Griffiths, Ryan BenAbstract
Vision systems are foundational to a wide range of real-world applications, including autonomous vehicles navigating complex urban environments, drones performing infrastructure inspection, and robots operating in hazardous or remote settings. These applications increasingly depend ...
See moreVision systems are foundational to a wide range of real-world applications, including autonomous vehicles navigating complex urban environments, drones performing infrastructure inspection, and robots operating in hazardous or remote settings. These applications increasingly depend on diverse and specialised camera hardware, such as fisheye, thermal, and multimodal systems, which challenge the assumptions of conventional computer vision pipelines. Existing approaches typically require labour-intensive calibration, handcrafted adaptation for each camera type, and large labelled datasets. This thesis addresses the central question: how can we build plug-and-play vision systems? We present three key contributions in the areas of self-calibration, network adaptation, and multimodal fusion. First, we introduce NOCaL, a semi-supervised framework that jointly estimates camera intrinsics, distortion, and odometry using a rendering-based self-supervision signal. Second, we propose RectConv, a deformable convolutional layer that enables pretrained convolutional neural networks to operate on previously unseen camera geometries such as fisheye lenses. Third, we develop a transformer-based architecture for multi-camera, multi-modal integration. The model introduces a ray-based rotary positional embedding that enables effective integration of RGB and thermal imagery into a shared, geometrically consistent scene representation. Together, they demonstrate that ray-based, self-supervised representations can support flexible and generalisable vision systems that adapt to new hardware and sensing configurations. The contributions of this thesis have potential impact in domains where robust perception is critical, such as autonomous navigation, environmental monitoring, planetary exploration, and field robotics. This work helps pave the way towards more adaptable, accessible, and intelligent machine perception.
See less
See moreVision systems are foundational to a wide range of real-world applications, including autonomous vehicles navigating complex urban environments, drones performing infrastructure inspection, and robots operating in hazardous or remote settings. These applications increasingly depend on diverse and specialised camera hardware, such as fisheye, thermal, and multimodal systems, which challenge the assumptions of conventional computer vision pipelines. Existing approaches typically require labour-intensive calibration, handcrafted adaptation for each camera type, and large labelled datasets. This thesis addresses the central question: how can we build plug-and-play vision systems? We present three key contributions in the areas of self-calibration, network adaptation, and multimodal fusion. First, we introduce NOCaL, a semi-supervised framework that jointly estimates camera intrinsics, distortion, and odometry using a rendering-based self-supervision signal. Second, we propose RectConv, a deformable convolutional layer that enables pretrained convolutional neural networks to operate on previously unseen camera geometries such as fisheye lenses. Third, we develop a transformer-based architecture for multi-camera, multi-modal integration. The model introduces a ray-based rotary positional embedding that enables effective integration of RGB and thermal imagery into a shared, geometrically consistent scene representation. Together, they demonstrate that ray-based, self-supervised representations can support flexible and generalisable vision systems that adapt to new hardware and sensing configurations. The contributions of this thesis have potential impact in domains where robust perception is critical, such as autonomous navigation, environmental monitoring, planetary exploration, and field robotics. This work helps pave the way towards more adaptable, accessible, and intelligent machine perception.
See less
Date
2026Rights statement
The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission.Faculty/School
Faculty of Engineering, School of Aerospace Mechanical and Mechatronic EngineeringAwarding institution
The University of SydneyShare