Learning Theory for Transformers: An Operator-Learning Viewpoint
| Field | Value | Language |
| dc.contributor.author | Liu, Peilin | |
| dc.date.accessioned | 2026-06-15T02:59:11Z | |
| dc.date.available | 2026-06-15T02:59:11Z | |
| dc.date.issued | 2026 | en_AU |
| dc.identifier.uri | https://hdl.handle.net/2123/35412 | |
| dc.description.abstract | Large language models (LLMs) have reshaped the foundations of artificial intelligence research and the modes of interaction between human cognition and machine intelligence. Their influence extends further still, transforming the scientific tools through which we interrogate and model the physical world. Underlying most of these achievements and breakthroughs is a dominant architecture: the Transformer. Although the Transformer was proposed nearly a decade ago, established mathematical frameworks remain insufficient to explain the complex phenomena observed in practice with Transformer-based networks, particularly large language models. This thesis offers a principled theoretical foundation for understanding the remarkable capabilities these models exhibit, grounded in a central argument that the Transformer performs operator learning during pretraining over vast text corpora. Our analysis reveals the nature of the pretraining and in-context learning mechanisms of efficient Transformer structures within an operator learning framework. A Transformer maps each context distribution to a response function for queries. Given more samples from the context distribution, it recovers more information about that distribution and thus produces a better response function, while its pretrained weights stay fixed and receive no updates. | en_AU |
| dc.language.iso | en | en_AU |
| dc.subject | operator learning | en_AU |
| dc.subject | transformer | en_AU |
| dc.subject | large language model | en_AU |
| dc.subject | statistical learning theory | en_AU |
| dc.subject | approximation | en_AU |
| dc.title | Learning Theory for Transformers: An Operator-Learning Viewpoint | en_AU |
| dc.type | Thesis | |
| dc.type.thesis | Doctor of Philosophy | en_AU |
| dc.rights.other | The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission. | en |
| usyd.faculty | SeS faculties schools::Faculty of Science::School of Mathematics and Statistics | en_AU |
| usyd.degree | Doctor of Philosophy Ph.D. | en_AU |
| usyd.awardinginst | The University of Sydney | en_AU |
| usyd.advisor | Zhou, Dingxuan | |