Learning Ability of Deep ReLU Networks: Pairwise Tasks and Gradient Descent Methods
| Field | Value | Language |
| --- | --- | --- |
| dc.contributor.author | Zhou, Junyu | |
| dc.date.accessioned | 2025-08-15T06:45:13Z | |
| dc.date.available | 2025-08-15T06:45:13Z | |
| dc.date.issued | 2025 | en |
| dc.identifier.uri | https://hdl.handle.net/2123/34229 | |
| dc.description.abstract | Deep neural networks (DNNs) have become central to modern machine learning due to their strong empirical performance. However, their theoretical understanding—especially regarding generalization—remains limited. This thesis advances the theory of deep ReLU networks through two lenses: pairwise learning tasks and gradient descent methods. For pairwise learning, we study generalization in non-parametric estimation without relying on restrictive convexity or VC-class assumptions. We establish sharp oracle inequalities for empirical minimizers under general hypothesis spaces and Lipschitz pairwise losses. Applied to pairwise least squares regression, our bounds match known minimax rates up to logarithmic terms. A key innovation is constructing a structured deep ReLU network approximating the true predictor, forming a target hypothesis space with controlled complexity. This framework successfully handles problems beyond the reach of existing theories. For metric and similarity learning, we exploit the structure of the true metric. By deriving its form under hinge loss, we approximate it using structured deep ReLU networks and analyze the excess generalization error by bounding the approximation and the estimation errors. An optimal excess risk rate is achieved, marking the first known such analysis for metric/similarity learning. We also explore extensions to general losses. For gradient descent methods, we study GD and SGD for overparameterized deep ReLU networks in the NTK regime. Prior work mainly covers shallow networks; we fill this gap by establishing the first minimax-optimal generalization rates for GD/SGD with deep architectures. Under polynomial width scaling, our results show these methods can match the generalization performance of kernel approaches. | en |
| dc.language.iso | en | en |
| dc.rights | The author retains copyright of this thesis | |
| dc.subject | Deep Learning Theory | en |
| dc.subject | Deep Neural Networks | en |
| dc.subject | Generalization Analysis | en |
| dc.subject | Metric and Similarity Learning | en |
| dc.subject | Stochastic Gradient Descent | en |
| dc.title | Learning Ability of Deep ReLU Networks: Pairwise Tasks and Gradient Descent Methods | en |
| dc.type | Thesis | |
| dc.type.thesis | Doctor of Philosophy | en |
| dc.rights.other | The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission. | en |
| usyd.faculty | SeS faculties schools::Faculty of Science::School of Mathematics and Statistics | en |
| usyd.department | Mathematics and Statistics | en |
| usyd.degree | Doctor of Philosophy Ph.D. | en |
| usyd.awardinginst | The University of Sydney | en |
| usyd.advisor | Zhou, Dingxuan | |