
PRIS presents a large-scale offline handwritten Chinese character database, HCL2000, which will be made publicly available to the research community. The database contains 3,755 frequently used simplified Chinese characters written by 1,000 different subjects.


kaka0910/HCL2020


HCL2020: Handwritten Character and Style Recognition

Code and dataset release for the paper "Handwritten Style Recognition for Chinese Characters on HCL2020 Dataset".

Abstract: Structural features of Chinese characters carry abundant style information for handwritten style recognition, yet prior work on this task makes little use of structural information. Meanwhile, current handwritten Chinese character datasets provide only character category and writer information, which makes it hard to train a model that generalizes well. We therefore annotate the large handwritten dataset HCL2000 with structural information in the form of morphemes, the smallest distinct structural units of Chinese characters, and release the extended dataset as HCL2020. We also present a deep fusion network, the Morpheme-based Handwritten Style Recognition Network (M-HSRNet), which captures both the overall layout and the detailed structural features of characters to recognize handwritten style. Evaluation of the proposed model on HCL2020 demonstrates the effectiveness of morpheme information. With the proposed Morpheme Encoder module, our approach achieves an accuracy of 78.06% in handwritten style recognition, 3 percentage points higher than the result without morpheme information.
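The fusion idea in the abstract (global layout features combined with morpheme-level structure) can be illustrated with a toy late-fusion classifier. This is a minimal sketch, not M-HSRNet: it assumes a precomputed global image feature vector, mean-pools hypothetical morpheme embeddings as a stand-in for the paper's Morpheme Encoder, concatenates the two vectors, and scores style classes with a single linear layer. All dimensions, weights, and function names below are hypothetical.

```python
import random

random.seed(0)

# Hypothetical sizes chosen only for illustration.
EMBED_DIM = 4       # morpheme embedding size
IMG_DIM = 6         # size of the precomputed global image feature
NUM_STYLES = 3      # number of style (writer) classes
NUM_MORPHEMES = 10  # morpheme vocabulary size

# Toy parameters drawn at random; a trained model would learn these.
morpheme_table = [[random.gauss(0, 1) for _ in range(EMBED_DIM)]
                  for _ in range(NUM_MORPHEMES)]
W = [[random.gauss(0, 0.1) for _ in range(IMG_DIM + EMBED_DIM)]
     for _ in range(NUM_STYLES)]
b = [0.0] * NUM_STYLES


def encode_morphemes(morpheme_ids):
    """Mean-pool the embeddings of a character's morphemes (assumes at least one)."""
    vecs = [morpheme_table[i] for i in morpheme_ids]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def fuse_and_classify(image_feature, morpheme_ids):
    """Concatenate global image features with the pooled morpheme vector,
    then apply a linear layer to get one score per style class."""
    fused = list(image_feature) + encode_morphemes(morpheme_ids)
    return [sum(w * x for w, x in zip(row, fused)) + bias
            for row, bias in zip(W, b)]


def predict_style(image_feature, morpheme_ids):
    """Return the index of the highest-scoring style class."""
    scores = fuse_and_classify(image_feature, morpheme_ids)
    return max(range(len(scores)), key=scores.__getitem__)


print(predict_style([0.1] * IMG_DIM, [1, 2]))
```

Mean pooling is simply the most basic order-invariant way to summarize a variable number of morphemes per character; a learned encoder would replace it in practice.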


Reconstruction of HCL2020

Criteria for splitting Chinese characters into morphemes according to character structure, with examples. The first split follows these rules:
(1) Single-component characters of the form 'A' are not split.
(2) Two-component characters of the form 'A,B' are split into 'A+B'.
(3) Three-component characters of the form 'A,B,C' are split into 'A+B+C'.
If a morpheme produced by the first split is still complex, a second split is performed using the same criteria, as with the example character in the figure. Finally, each morpheme is represented by a morpheme category index.
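The two-level splitting procedure described above can be sketched as a depth-limited recursive decomposition followed by assigning each distinct morpheme an integer category index. This is a minimal sketch under stated assumptions: the component table `DECOMPOSITION` is a hypothetical, hand-written example (not the HCL2020 annotations), "still complex" is modeled as "has an entry in the table", and the names `split_into_morphemes`, `build_morpheme_vocab`, and `encode` are illustrative.

```python
# Hypothetical component table mapping a character (or component) to its
# immediate sub-components. Illustrative only; NOT the HCL2020 annotations.
DECOMPOSITION = {
    "好": ["女", "子"],        # two components: 'A,B' -> 'A+B'
    "湖": ["氵", "古", "月"],  # three components: 'A,B,C' -> 'A+B+C'
    "想": ["相", "心"],        # '相' is still complex ...
    "相": ["木", "目"],        # ... so it is split again in the second pass
}

MAX_SPLITS = 2  # the first split plus at most one further split


def split_into_morphemes(char, depth=0):
    """Recursively split a character into morphemes, at most MAX_SPLITS levels deep.

    Anything absent from DECOMPOSITION is treated as a single-component
    unit and is not split.
    """
    parts = DECOMPOSITION.get(char)
    if parts is None or depth >= MAX_SPLITS:
        return [char]
    morphemes = []
    for part in parts:
        morphemes.extend(split_into_morphemes(part, depth + 1))
    return morphemes


def build_morpheme_vocab(chars):
    """Assign each distinct morpheme a category index in order of first appearance."""
    vocab = {}
    for char in chars:
        for morpheme in split_into_morphemes(char):
            vocab.setdefault(morpheme, len(vocab))
    return vocab


def encode(char, vocab):
    """Represent a character as its list of morpheme category indices."""
    return [vocab[m] for m in split_into_morphemes(char)]


vocab = build_morpheme_vocab(["好", "湖", "想", "日"])
print(encode("想", vocab))  # [5, 6, 7]
```

Here "想" is split into "相" and "心" in the first pass, and "相" is split again into "木" and "目", while "日" has no entry and stays whole, matching rule (1).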


Difference between the HCL2000 and HCL2020 datasets. Compared with HCL2000, HCL2020 adds annotations of character structure (morphemes).
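As a data-structure sketch of that difference, a sample can be modeled as an HCL2000 record (character category and writer, the labels the abstract attributes to current datasets) extended with a list of morpheme category indices. The class names, field names, and validation checks below are hypothetical and do not describe the released file format; only the counts of 3,755 character classes and 1,000 writers come from the dataset description.

```python
from dataclasses import dataclass, field
from typing import List

NUM_CHARACTERS = 3755  # character classes in HCL2000
NUM_WRITERS = 1000     # writers (subjects) in HCL2000


@dataclass
class HCL2000Sample:
    """One handwritten character sample as labeled in HCL2000 (hypothetical fields)."""
    image_path: str   # hypothetical location of the character image
    char_label: int   # character category index, 0 .. NUM_CHARACTERS - 1
    writer_id: int    # writer index, 0 .. NUM_WRITERS - 1


@dataclass
class HCL2020Sample(HCL2000Sample):
    """HCL2000 sample plus the morpheme (structure) annotation added in HCL2020."""
    morpheme_ids: List[int] = field(default_factory=list)


def add_morphemes(sample: HCL2000Sample, morpheme_ids: List[int]) -> HCL2020Sample:
    """Extend an HCL2000 sample with morpheme category indices after basic range checks."""
    if not 0 <= sample.char_label < NUM_CHARACTERS:
        raise ValueError(f"char_label out of range: {sample.char_label}")
    if not 0 <= sample.writer_id < NUM_WRITERS:
        raise ValueError(f"writer_id out of range: {sample.writer_id}")
    if not morpheme_ids:
        raise ValueError("every character has at least one morpheme")
    return HCL2020Sample(sample.image_path, sample.char_label,
                         sample.writer_id, list(morpheme_ids))


upgraded = add_morphemes(HCL2000Sample("sample.png", 42, 7), [5, 6, 7])
print(upgraded)
```

Subclassing keeps every HCL2020 sample usable wherever an HCL2000 sample is expected, which mirrors the description of HCL2020 as HCL2000 plus extra annotations.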


Citation

@inproceedings{hu2020handwritten,
title={Handwritten Style Recognition for Chinese Characters on HCL2020 Dataset},
author={Hu, Peiyi and Xu, Mengqiu and Wu, Ming and Chen, Guang and Zhang, Chuang},
booktitle={Pattern Recognition and Computer Vision: Third Chinese Conference, PRCV 2020, Nanjing, China, October 16--18, 2020, Proceedings, Part II 3},
pages={138--150},
year={2020},
organization={Springer}}

Contact

Thanks for your attention! If you have any suggestions or questions, you can leave a message here or contact us directly.
