Online Architecture Optimization for Deep Neural Networks
Access status:
USyd Access
Type
Thesis
Thesis type
Doctor of Philosophy
Author/s
Jie, Renlong
Abstract
Online data streams have become one of the most common forms of data in the modern world, and online approaches are increasingly used to improve the efficiency of model training. Both trends create a strong demand for hyper-parameter optimization and architecture adaptation techniques for online learning. This thesis contains four projects that address this demand. The first project concerns online parallel hyper-parameter optimization and model training on data streams. It proposes a framework called HyperTube for online hyper-parameter optimization under limited computing resources, and introduces a “micro-mini-batch training mechanism” that reuses online data mini-batches in a relatively efficient way. The second study concerns online adaptation of activation functions. I propose a general combined form of flexible activation functions, together with three principles for choosing flexible activation components. Based on this, two novel flexible activation functions with bounded or unbounded outputs are developed, along with two new regularisation terms that encode assumptions as prior knowledge. The third study concerns online learning rate adaptation. I investigate different levels of learning rate adaptation within the framework of hyper-gradient descent and, based on this, propose an optimization method that adaptively learns the combination weights for adaptive learning rates at different levels. In the fourth study, I introduce a growing mechanism for differentiable neural architecture search based on network morphism. It enables cell structures to grow from small to large sizes with one-shot training, and the growing process can be integrated with the original pruning process in two modes. A novel two-input backbone architecture is also proposed for recurrent neural networks. The proposed methods are well supported by experiments and could contribute to future studies on improving the efficiency of deep learning methods.
Date
2020
Publisher
University of Sydney
Rights statement
The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission.
Faculty/School
The University of Sydney Business School, Discipline of Business Analytics
Awarding institution
The University of Sydney