This is a reading list on deep learning for optical character recognition (OCR).
- Deng, D., Liu, H., Li, X., & Cai, D. (2018). PixelLink: Detecting Scene Text via Instance Segmentation. arXiv preprint arXiv:1801.01315. [pdf] [code]
- Detecting Oriented Text in Natural Images by Linking Segments, Shi, B., Bai, X., & Belongie, S. (2017, July). CVPR (pp. 3482-3490). IEEE. [pdf] [code]
- Gated Recurrent Convolution Neural Network for OCR, Wang, J., & Hu, X. (2017). NIPS (pp. 335-344). [pdf] [code]
- TextBoxes: A Fast Text Detector with a Single Deep Neural Network, Liao, M., Shi, B., Bai, X., Wang, X., & Liu, W. (2017, February). AAAI (pp. 4161-4167). [pdf] [code]
- Deep Direct Regression for Multi-Oriented Scene Text Detection, He, W., Zhang, X. Y., Yin, F., & Liu, C. L. (2017). ICCV (pp. 745-753). [pdf]
- An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition, Shi, B., Bai, X., & Yao, C. (2017). TPAMI, 39(11), 2298-2304. [pdf] [code]
- EAST: an efficient and accurate scene text detector, Zhou, X., Yao, C., Wen, H., Wang, Y., Zhou, S., He, W., & Liang, J. (2017, July). CVPR (pp. 2642-2651). [pdf] [code]
- Scan, Attend and Read: End-to-End Handwritten Paragraph Recognition with MDLSTM Attention, Theodore Bluche, Jerome Louradour, Ronaldo Messina, ICDAR, 2017. [pdf]
- Combining Convolutional Neural Networks and LSTMs for Segmentation-Free OCR, Rawls, S., Cao, H., Kumar, S., & Natarajan, P. (2017, November). ICDAR (Vol. 1, pp. 155-160). IEEE. [pdf]
- Combining deep learning and language modeling for segmentation-free OCR from raw pixels, Rawls, S., Cao, H., Sabir, E., & Natarajan, P. (2017, April). ASAR (pp. 119-123). IEEE. [pdf]
- Implicit Language Model in LSTM for OCR, Sabir, E., Rawls, S., & Natarajan, P. (2017, November). ICDAR (Vol. 7, pp. 27-31). IEEE. [pdf]
- Towards end-to-end text spotting with convolutional recurrent neural networks, Li, H., Wang, P., & Shen, C. (2017, October). ICCV (pp. 5238-5246). [pdf]
- Deep matching prior network: Toward tighter multi-oriented text detection, Liu, Y., & Jin, L. (2017, March). CVPR (pp. 3454-3461). [pdf]
- Multi-oriented text detection with fully convolutional networks, Zhang, Z., Zhang, C., Shen, W., Yao, C., Liu, W., & Bai, X. (2016). CVPR (pp. 4159-4167). [pdf]
- Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C. Y., & Berg, A. C. (2016, October). SSD: Single shot multibox detector. In European Conference on Computer Vision (pp. 21-37). Springer, Cham. [pdf] [code]
- Deeptext: A unified framework for text proposal generation and text detection in natural images, Zhong, Z., Jin, L., Zhang, S., & Feng, Z. (2016). arXiv preprint arXiv:1605.07314. [pdf]
- Synthetic data for text localisation in natural images, Gupta, A., Vedaldi, A., & Zisserman, A. (2016). CVPR (pp. 2315-2324). [pdf]
- Reading text in the wild with convolutional neural networks (2016), M. Jaderberg et al. IJCV, (DeepMind) [pdf] [code]
- Detecting text in natural image with connectionist text proposal network, Tian, Z., Huang, W., He, T., He, P., & Qiao, Y. (2016, October). ECCV (pp. 56-72). Springer, Cham. [pdf]
- Recursive Recurrent Nets with Attention Modeling for OCR in the Wild, Chen-Yu Lee, Simon Osindero, CVPR, 2016. [pdf]
- Reading Scene Text in Deep Convolutional Sequences, Pan He, Weilin Huang, Yu Qiao, Chen Change Loy, and Xiaoou Tang, AAAI, 2016. [pdf]
- You only look once: Unified, real-time object detection, Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). CVPR (pp. 779-788). [pdf] [code]
- Scene text detection via holistic, multi-channel prediction, Yao, C., Bai, X., Sang, N., Zhou, X., Zhou, S., & Cao, Z. (2016). arXiv preprint arXiv:1606.09002. [pdf]
- Unsupervised feature learning for optical character recognition, Sahu, D. K., & Jawahar, C. V. (2015, August). ICDAR (pp. 1041-1045). IEEE. [pdf]
- Sequence to sequence learning for optical character recognition, Devendra Kumar Sahu & Mohak Sukhwani, 2015. [pdf]
- ReNet: A Recurrent Neural Network Based Alternative to Convolutional Networks, Francesco Visin, Kyle Kastner, Kyunghyun Cho, Matteo Matteucci, Aaron Courville, Yoshua Bengio. [pdf]
- Faster R-CNN: Towards real-time object detection with region proposal networks, Ren, S., He, K., Girshick, R., & Sun, J. (2015). NIPS (pp. 91-99). [pdf] [code]
- Symmetry-based text line detection in natural scenes, Zhang, Z., Shen, W., Yao, C., & Bai, X. (2015). CVPR (pp. 2558-2567). [pdf]
- Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). CVPR (pp. 580-587). [pdf] [code]
- A Comparison of Sequence-Trained Deep Neural Networks and Recurrent Neural Networks Optical Modeling for Handwriting Recognition, Theodore Bluche, Hermann Ney, and Christopher Kermorvant, SLSP, 2014. [pdf]
- Orientation robust text line detection in natural images, Kang, L., Li, Y., & Doermann, D. (2014). CVPR (pp. 4034-4041). [pdf]
- Towards End-to-End Speech Recognition with Recurrent Neural Networks, Alex Graves, Navdeep Jaitly, ICML, 2014. [pdf]
- Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks, Ian J. Goodfellow, Yaroslav Bulatov, Julian Ibarz, Sacha Arnoud, Vinay Shet, ICLR, 2014. [pdf]
- Deep Features for Text Spotting, M. Jaderberg, A. Vedaldi, A. Zisserman, ECCV, 2014. [pdf] [code]
- PhotoOCR: Reading Text in Uncontrolled Conditions, Alessandro Bissacco, Mark Cummins, Yuval Netzer, Hartmut Neven, ICCV, 2013. [pdf]
- High Performance OCR for Printed English and Fraktur using LSTM Networks, ICDAR, 2013. [pdf]
- Image binarization for end-to-end text understanding in natural images, Sergey Milyaev, Olga Barinova, Tatiana Novikova, Pushmeet Kohli, Victor Lempitsky, ICDAR, 2013. [pdf]
- Multi-digit number recognition from street view imagery using deep convolutional neural networks, Goodfellow, I. J., Bulatov, Y., Ibarz, J., Arnoud, S., & Shet, V. (2013). arXiv preprint arXiv:1312.6082. [pdf]
- Text Recognition in Videos using a Recurrent Connectionist Approach, Khaoula Elagouni, Christophe Garcia, Franck Mamalet, and Pascale Sebillot, ICANN, 2012. [pdf]
- A Novel Word Spotting Method Based on Recurrent Neural Networks, Volkmar Frinken, Andreas Fischer, R. Manmatha, and Horst Bunke, TPAMI, 2012. [pdf]
- End-to-End Text Recognition with Convolutional Neural Networks, Tao Wang, David J. Wu, Adam Coates, Andrew Y. Ng, ICPR, 2012. [pdf]
- Detecting texts of arbitrary orientations in natural images, Yao, C., Bai, X., Liu, W., Ma, Y., & Tu, Z. (2012, June). CVPR (pp. 1083-1090). IEEE. [pdf] [dataset]
- Robust text detection in natural images with edge-enhanced maximally stable extremal regions, Chen, H., Tsai, S. S., Schroth, G., Chen, D. M., Grzeszczuk, R., & Girod, B. (2011, September). ICIP (pp. 2609-2612). IEEE. [pdf] [code]
- Object detection with discriminatively trained part-based models, Felzenszwalb, P. F., Girshick, R. B., McAllester, D., & Ramanan, D. (2010). TPAMI, 32(9), 1627-1645. [pdf] [code]
- Optical Character Recognition (OCR), Marina Samuel, [blog]
- The Unreasonable Effectiveness of Recurrent Neural Networks, Andrej Karpathy, 2015, [blog]
- Deep Neural Networks for Scene Text Reading, Xiang Bai, Huazhong University of Science and Technology
- Scene Text Detection and Recognition, Dr. Cong Yao, Megvii (Face++) Researcher [pdf]
- Deep Learning for Text Spotting, Robotics Research Group, Department of Engineering Science, University of Oxford [pdf]