Deep Learning Based Novel View Synthesis
Access status:
USyd Access
Type
Thesis
Thesis type
Masters by Research
Author/s
Xie, Ke
Abstract
The synthesis of high-fidelity, large-scale urban scenes is central to autonomous driving simulation and validation. Explicit radiance fields, especially 3D Gaussian Splatting (3DGS), now dominate this area thanks to their realism and real-time rendering, yet they struggle in dynamic street environments: far-field backgrounds degrade into structural blur, color noise, and temporal flickering because sparse photometric cues under-constrain distant geometry. We propose SeeDepthGaussian, a 3DGS-based framework that injects monocular depth estimation, a readily available signal in autonomous driving, as a geometric prior to regularize the spatial layout of 3D Gaussian primitives. All depth-related constraints are applied only during training, improving geometry and visual fidelity without adding inference-time cost, thus preserving the real-time nature of 3DGS. SeeDepthGaussian makes three key contributions. First, a dual depth regularization scheme for dynamic scenes: a rigid term enforces sharp, occlusion-aware surfaces, while a flexible term refines semi-transparent and thin structures via opacity-weighted depth. Second, a Global–Local Depth Normalization strategy combines local patch normalization, which is sensitive to fine details, with global normalization that maintains scene-wide metric consistency. Third, we replace spherical harmonics with a neural background renderer, which better captures complex textures and view-dependent lighting while reducing overfitting. Experiments on the challenging Waymo Open Dataset show consistent gains over strong baselines on all major metrics, including a 0.21 dB improvement in PSNR. Qualitative results confirm clearer and more stable distant views with suppressed blur and color noise. Overall, SeeDepthGaussian demonstrates that depth-aware training is an effective route to high-fidelity reconstruction of large-scale, dynamic urban scenes.
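The abstract names two ideas that a small example may make more concrete: an opacity-weighted depth along a ray, and a depth loss that mixes global and local patch normalization. The sketch below is illustrative only and is not the thesis's implementation. The function names, the mean/standard-deviation normalization, the L1 penalty, the non-overlapping patches, and the `local_weight` parameter are all assumptions, since the abstract does not give the exact formulation.

```python
from statistics import fmean, pstdev


def opacity_weighted_depth(depths, alphas):
    """Expected depth along one ray by front-to-back alpha compositing.

    depths and alphas are ordered from nearest to farthest primitive.
    """
    transmittance = 1.0  # fraction of light not yet absorbed
    expected = 0.0
    for depth, alpha in zip(depths, alphas):
        expected += transmittance * alpha * depth
        transmittance *= 1.0 - alpha
    return expected


def normalize(values, eps=1e-6):
    """Shift and scale values to zero mean and (roughly) unit spread."""
    mean = fmean(values)
    std = pstdev(values)
    return [(v - mean) / (std + eps) for v in values]


def patches(depth_map, size):
    """Yield flattened, non-overlapping size x size patches."""
    height, width = len(depth_map), len(depth_map[0])
    for r in range(0, height - size + 1, size):
        for c in range(0, width - size + 1, size):
            yield [depth_map[r + i][c + j]
                   for i in range(size) for j in range(size)]


def l1(a, b):
    """Mean absolute difference between two equal-length sequences."""
    return fmean(abs(x - y) for x, y in zip(a, b))


def global_local_depth_loss(rendered, prior, patch=2, local_weight=0.5):
    """Assumed loss: global normalized L1 plus weighted per-patch L1."""
    flat_r = [v for row in rendered for v in row]
    flat_p = [v for row in prior for v in row]
    global_term = l1(normalize(flat_r), normalize(flat_p))
    local_terms = [l1(normalize(pr), normalize(pp))
                   for pr, pp in zip(patches(rendered, patch),
                                     patches(prior, patch))]
    local_term = fmean(local_terms) if local_terms else 0.0
    return global_term + local_weight * local_term
```

Under these assumptions the loss is insensitive to an affine rescaling of the prior, which suits monocular depth estimates that are typically only defined up to scale and shift. Because such a term would be evaluated only during optimization, it is consistent with the abstract's statement that depth constraints add no inference-time cost.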
Date
2025
Rights statement
The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission.
Faculty/School
Faculty of Engineering, School of Computer Science
Awarding institution
The University of Sydney