Multilingual text-image recognition based on zero real sample learning
Access status:
USyd Access
Type
Thesis
Thesis type
Doctor of Philosophy
Author/s
Wang, Kewei
Abstract
Scene text recognition (STR), derived from optical character recognition (OCR), has been extensively studied and has made remarkable achievements in the past decades. While great progress has been made in majority languages such as Chinese and English, most minority languages suffer from a persistent lack of annotated text databases for training. This thesis therefore aims to enhance the overall performance of multilingual STR models for minority languages. We choose Japanese as the target minority language and build a novel STR model. For text detection, we utilize an instance segmentation framework, PSENet (Progressive Scale Expansion Network), implemented with a PVT backbone and an FPNF neck. The proposed network reaches an F-measure of 73.51% on the standard ICDAR2017MLT benchmark and a reasonable result on the ICDAR 2015 benchmark, indicating that the proposed model can successfully tackle complex quadrangular text detection. For text recognition, we train the recognition network with several combinations of backbone, encoder, and decoder modules to obtain an optimal Japanese STR framework. The proposed recognition network significantly outperforms current methods for text recognition in natural images, achieving a word recognition accuracy of 79.1% and a normalized edit distance of 85.3%. Experiments on the standard ICDAR2017MLT benchmark and a self-collected dataset, JPN4val, substantiate the efficacy of the proposed model. This work includes extensive theoretical backing as well as experimental evidence, and it has the potential to serve as an inspiration for future research.
Date
2022
Rights statement
The author retains copyright of this thesis. It may only be used for the purposes of research and study. It must not be used for any other purposes and may not be transmitted or shared with others without prior permission.
Faculty/School
Faculty of Engineering, School of Computer Science
Awarding institution
The University of Sydney