Multilingual text-image recognition based on zero real sample learning

Wang, Kewei

Access status:

USyd Access

Field	Value	Language
dc.contributor.author	Wang, Kewei
dc.date.accessioned	2022-09-27T00:29:27Z
dc.date.available	2022-09-27T00:29:27Z
dc.date.issued	2022	en_AU
dc.identifier.uri	https://hdl.handle.net/2123/29579
dc.description.abstract	Scene text recognition (STR), which derives from optical character recognition (OCR), has been studied extensively and has advanced considerably over the past decades. While great progress has been made for majority languages such as Chinese and English, most minority languages still suffer from a severe lack of annotated text databases for training. This work therefore aims to improve the overall performance of multilingual STR models for minority languages. We choose Japanese as the target minority language and build a novel STR model. For text detection, we use an instance segmentation framework, PSENet (Progressive Scale Expansion Network), implemented with a PVT backbone and an FPNF neck. The proposed network reaches an F-measure of 73.51% on the standard ICDAR2017MLT benchmark and also achieves a reasonable result on the ICDAR 2015 benchmark, which indicates that the proposed model can handle complex quadrangular text detection. For text recognition, we train the recognition network with several different combinations of backbone, encoder, and decoder to obtain an optimal Japanese STR framework. The proposed recognition network significantly outperforms current methods for text recognition in natural images, achieving a word recognition accuracy of 79.1% and a normalized edit distance of 85.3%. Experiments on the standard ICDAR2017MLT benchmark and a self-collected dataset, JPN4val, substantiate the efficacy of the proposed model. This work includes extensive theoretical grounding as well as experimental evidence, and it has the potential to inform future research.	en_AU
dc.language.iso	en	en_AU
dc.subject	Scene Text Recognition	en_AU
dc.subject	Text Detection	en_AU
dc.subject	Instance Segmentation	en_AU
dc.title	Multilingual text-image recognition based on zero real sample learning	en_AU
dc.type	Thesis
dc.type.thesis	Doctor of Philosophy	en_AU
dc.rights.other	The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission.	en_AU
usyd.faculty	SeS faculties schools::Faculty of Engineering::School of Computer Science	en_AU
usyd.degree	Master of Philosophy M.Phil	en_AU
usyd.awardinginst	The University of Sydney	en_AU
usyd.advisor	Liu, Tongliang
usyd.include.pub	No	en_AU