Learning Ability of Deep ReLU Networks: Pairwise Tasks and Gradient Descent Methods

Zhou, Junyu

Access status: Open Access

Field	Value	Language
dc.contributor.author	Zhou, Junyu
dc.date.accessioned	2025-08-15T06:45:13Z
dc.date.available	2025-08-15T06:45:13Z
dc.date.issued	2025	en
dc.identifier.uri	https://hdl.handle.net/2123/34229
dc.description.abstract	Deep neural networks (DNNs) have become central to modern machine learning due to their strong empirical performance. However, their theoretical understanding, especially regarding generalization, remains limited. This thesis advances the theory of deep ReLU networks through two lenses: pairwise learning tasks and gradient descent methods. For pairwise learning, we study generalization in non-parametric estimation without relying on restrictive convexity or VC-class assumptions. We establish sharp oracle inequalities for empirical minimizers under general hypothesis spaces and Lipschitz pairwise losses. Applied to pairwise least squares regression, our bounds match known minimax rates up to logarithmic terms. A key innovation is the construction of a structured deep ReLU network that approximates the true predictor and forms a target hypothesis space with controlled complexity. This framework handles problems beyond the reach of existing theories. For metric and similarity learning, we exploit the structure of the true metric. By deriving its form under the hinge loss, we approximate it using structured deep ReLU networks and analyze the excess generalization error by bounding the approximation and estimation errors. We achieve an optimal excess risk rate, the first known analysis of this kind for metric and similarity learning. We also explore extensions to general losses. For gradient descent methods, we study gradient descent (GD) and stochastic gradient descent (SGD) for overparameterized deep ReLU networks in the neural tangent kernel (NTK) regime. Prior work mainly covers shallow networks; we fill this gap by establishing the first minimax-optimal generalization rates for GD and SGD with deep architectures. Under polynomial width scaling, our results show that these methods can match the generalization performance of kernel approaches.	en
dc.language.iso	en	en
dc.rights	The author retains copyright of this thesis
dc.subject	Deep Learning Theory	en
dc.subject	Deep Neural Networks	en
dc.subject	Generalization Analysis	en
dc.subject	Metric and Similarity Learning	en
dc.subject	Stochastic Gradient Descent	en
dc.title	Learning Ability of Deep ReLU Networks: Pairwise Tasks and Gradient Descent Methods	en
dc.type	Thesis
dc.type.thesis	Doctor of Philosophy	en
dc.rights.other	The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission.	en
usyd.faculty	SeS faculties schools::Faculty of Science::School of Mathematics and Statistics	en
usyd.department	Mathematics and Statistics	en
usyd.degree	Doctor of Philosophy Ph.D.	en
usyd.awardinginst	The University of Sydney	en
usyd.advisor	Zhou, Dingxuan