Online Architecture Optimization for Deep Neural Networks

Jie, Renlong

Access status: USyd Access

Field	Value	Language
dc.contributor.author	Jie, Renlong
dc.date.accessioned	2021-01-21
dc.date.available	2021-01-21
dc.date.issued	2020	en
dc.identifier.uri	https://hdl.handle.net/2123/24342
dc.description.abstract	Online data streams have become one of the most common forms of data in the modern world, and online approaches are also used to improve the efficiency of model training. This creates a strong demand for hyper-parameter optimization and architecture adaptation techniques for online learning. The thesis contains four projects on this topic. The first project concerns online parallel hyper-parameter optimization and model training on data streams. A framework called HyperTube is proposed for online hyper-parameter optimization under limited computing resources. This study also introduces a “micro-mini-batch training mechanism” that reuses online data mini-batches in a relatively efficient way. The second study concerns online adaptation of activation functions, in which I propose a general combined form of flexible activation functions as well as three principles for choosing flexible activation components. Based on this, two novel flexible activation functions with bounded or unbounded outputs are developed. In addition, two new regularisation terms that encode assumptions as prior knowledge are proposed. The third study concerns online learning rate adaptation, in which I investigate different levels of learning rate adaptation within the framework of hyper-gradient descent. Based on this, I propose an optimization method that adaptively learns the combination weights for different levels of adaptive learning rates. In the fourth study, I introduce a growing mechanism for differentiable neural architecture search based on network morphism. It enables cell structures to grow from small to large sizes within one-shot training. Two modes can be applied to integrate the growing process with the original pruning process. In addition, a novel two-input backbone architecture is proposed for recurrent neural networks. 
The proposed methods are well supported by experiments and could contribute to future studies on improving the efficiency of deep learning methods.	en
dc.language.iso	en	en
dc.publisher	University of Sydney	en
dc.subject	Deep Learning	en
dc.subject	Neural Networks	en
dc.subject	Online Data Streams	en
dc.subject	Hyper-parameter	en
dc.subject	Optimization	en
dc.subject	Neural Architecture Search	en
dc.title	Online Architecture Optimization for Deep Neural Networks	en
dc.type	Thesis
dc.type.thesis	Doctor of Philosophy	en
dc.rights.other	The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission.	en
usyd.faculty	SeS faculties schools::The University of Sydney Business School::Discipline of Business Analytics	en
usyd.degree	Doctor of Philosophy Ph.D.	en
usyd.awardinginst	The University of Sydney	en
usyd.advisor	GAO, JUNBIN
usyd.advisor	VASNEV, ANDREY