Show simple item record

Field | Value | Language
dc.contributor.author | Wang, Kewei |
dc.date.accessioned | 2022-09-27T00:29:27Z |
dc.date.available | 2022-09-27T00:29:27Z |
dc.date.issued | 2022 | en_AU
dc.identifier.uri | https://hdl.handle.net/2123/29579 |
dc.description.abstract | Scene text recognition (STR), derived from optical character recognition (OCR), has been extensively studied and has made remarkable progress in the past decades. While great progress has been made for majority languages such as Chinese and English, most minority languages still suffer from a severe lack of annotated text databases for training. This work therefore aims to improve the overall performance of multilingual STR models for minority languages. We choose Japanese as the target minority language and build a novel STR model. For text detection, we use an instance segmentation framework, PSENet (Progressive Scale Expansion Network), implemented with a PVT backbone and an FPNF neck. The proposed network reaches an F-measure of 73.51% on the standard ICDAR2017MLT benchmark and also achieves a reasonable result on the ICDAR 2015 benchmark, which indicates that the proposed model can handle complex quadrangular text detection. For text recognition, we train the recognition network with several different combinations of backbone, encoder, and decoder to obtain an optimal Japanese STR framework. The proposed recognition network significantly outperforms current methods for text recognition in natural images, achieving a word recognition accuracy of 79.1% and a normalized edit distance of 85.3%. Experiments on the standard benchmark ICDAR2017MLT and a self-collected dataset, JPN4val, substantiate the efficacy of the proposed model. This work includes extensive theoretical backing as well as experimental evidence, and it has the potential to inspire future research. | en_AU
dc.language.iso | en | en_AU
dc.subject | Scene Text Recognition | en_AU
dc.subject | Text Detection | en_AU
dc.subject | Instance Segmentation | en_AU
dc.title | Multilingual text-image recognition based on zero real sample learning | en_AU
dc.type | Thesis |
dc.type.thesis | Doctor of Philosophy | en_AU
dc.rights.other | The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission. | en_AU
usyd.faculty | SeS faculties schools::Faculty of Engineering::School of Computer Science | en_AU
usyd.degree | Master of Philosophy M.Phil | en_AU
usyd.awardinginst | The University of Sydney | en_AU
usyd.advisor | Liu, Tongliang |
usyd.include.pub | No | en_AU


There are no previous versions of the item available.