Scene Text detection


An introduction to scene text detection
KeyWords Plus:      PyTorch      PixelLink      CVPR      ICDAR

Relevant blog:
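        ICPR-2018-OCR notes
        Tianchi competition: ICPR Text Detection summary

        A survey of scene text detection and recognition techniques
        Text detection: RRPN
        Text detection: TextBoxes
        Text detection: MaskTextSpotter
        Text detection: CTPN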

Introduction

Resource

Scene Text Detection and Recognition: The Deep Learning Era
       arXiv: https://arxiv.org/abs/1811.04256 (draft version)
       Github: https://github.com/Jyouhou/SceneTextPapers (compiled papers, datasets & codes)

Scene Text Detection and Recognition: Recent Advances and Future Trends, Frontiers of Computer Science, 2015 (mainly conventional methods)
       Link: http://www.vlrlab.net/admin/uploads/avatars/FCS_TextSurvey_2015.pdf

Github:
       Laboratories and Papers: https://github.com/chongyangtao/Awesome-Scene-Text-Recognition
       SceneTextPapers: https://github.com/Jyouhou/SceneTextPapers

ICDAR

2018-2019 International Conferences in Artificial Intelligence, Computer Vision, Data Mining and Natural Language Processing

The International Conference on Document Analysis and Recognition (**ICDAR**) is one of the professional conferences organized by the International Association for Pattern Recognition (IAPR).

International Conference on Document Analysis and Recognition
Sydney, Australia
September 22 – 25, 2019

Main Conference

  • Submission Deadline: Feb. 15, 2019
  • Acceptance Notification: May 15, 2019
  • Camera ready due: June 15, 2019
  • Main Conference: Sept. 22 - 25, 2019

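       Precision, recall, and F-score

       The two most common metrics are precision (how many of the results you returned are correct) and recall (how many of the correct results you returned).
       The two usually trade off against each other and are hard to maximize together. A parameter often controls the balance; sweeping it traces out a precision-recall (PR) curve, and the area under that curve summarizes precision and recall in a single number. (The closely related ROC curve plots true-positive rate against false-positive rate, and the area under it is the AUC.)
       Because computing the area under a curve is cumbersome, the simpler F-score is often used instead. It is easy to compute: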

$$F\text{-score} = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$$
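
As a quick check of the formula, a minimal Python sketch follows. The function names `f_score` and `precision_recall` are illustrative, and the counts in the usage example are made up.

```python
def precision_recall(true_pos: int, false_pos: int, false_neg: int):
    """Precision and recall from detection counts."""
    predicted = true_pos + false_pos      # everything the detector returned
    actual = true_pos + false_neg         # everything in the ground truth
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid dividing by zero when both metrics are zero
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 80 correct detections, 20 false alarms, 30 misses.
p, r = precision_recall(true_pos=80, false_pos=20, false_neg=30)
print(p, r, f_score(p, r))
```

Plugging in the precision and recall measured for EAST later in these notes (0.846437 and 0.772267) gives an F-score that agrees with the reported 0.807653 to within rounding.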

python script.py -g='gt.zip' -s='/home/weijia.wu/workspace/Paper/ICDAR_test/result.zip' -o='/home/weijia.wu/workspace/Paper/ICDAR_test/' -p='0.8'

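Text detection and recognition

Progress in text detection and recognition falls into five categories:

  • 1) drawing inspiration from semantic segmentation and object detection methods
  • 2) simpler pipelines
  • 3) handling text of arbitrary shapes
  • 4) using attention
  • 5) using synthetic data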

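          Category 1: drawing inspiration from semantic segmentation and object detection
           Representative scene text detection and recognition work inspired by semantic segmentation and object detection includes:

  • 1) Holistic Multi-Channel Prediction
  • 2) TextBoxes
  • 3) Rotation Proposals
  • 4) Corner Localization and Region Segmentation
    [Figure]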

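          Holistic Multi-Channel Prediction recasts text detection as a semantic segmentation problem. It outputs three global pixel-level predictions: text regions, character positions, and the linking orientation between adjacent characters. Combining the three produces the result image on the far right, where the detections are marked with red rectangles. The advantage of this approach is that it handles horizontal, multi-oriented, and curved text alike.

          Another approach is TextBoxes, inspired by the single-shot general object detector SSD. In essence it treats text as a special kind of object and models it with SSD. The base model is VGG-16, and the convolutional layers output whether text is present along with its width, height, size, orientation, and related information, achieving both high accuracy and high efficiency.

          The Megvii paper "Multi-Oriented Scene Text Detection via Corner Localization and Region Segmentation" (CVPR 2018) proposes a hybrid text detection method, Corner Localization and Region Segmentation. Its main highlight is that it combines two approaches: object detection and semantic segmentation.

          Corner Localization and Region Segmentation uses VGG-16 as its base model, with many additional convolutional layers on top for feature extraction, followed by two branches: 1) the corner detection branch localizes corner points with an SSD-style detector and outputs the corner positions; 2) the text region segmentation branch uses R-FCN-based position-sensitive segmentation to generate segmentation maps for different relative positions, which yields more accurate text detection results.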

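[Figure]

          Category 2: a simplified pipeline

          The Megvii paper "EAST: An Efficient and Accurate Scene Text Detector" (CVPR 2017) proposes a highly simplified pipeline. In the figure above, the input image is on the far left, the algorithm output is on the far right, and the processing steps are in between. EAST (bottom row) reduces the pipeline to two intermediate steps: a multi-channel FCN that predicts geometry together with foreground/background, and NMS, which produces the final multi-oriented text detections.

          This approach has two main benefits:
          1) accuracy: it allows end-to-end training and optimization;
          2) efficiency: it removes redundant intermediate processing steps.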

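[Figure]

          Category 3: handling text of arbitrary shapes
Reading real-world text poses a further challenge: text varies widely in shape.
          How should detection and recognition algorithms cope? Two representative solutions are:
          1) TextSnake
          2) Mask TextSpotter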

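[Figure]

          The Megvii paper "TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes" (ECCV 2018) proposes a new and flexible representation called TextSnake.

          To handle such text more precisely, panel (d) of the figure uses the TextSnake representation: a series of disks covers the text region, adapting better to variations in the text, including scale, orientation, and deformation.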

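[Figure]

          As shown in the figure above, the Megvii paper "Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes" (ECCV 2018) takes a different route. Inspired by Mask R-CNN, it proposes a new model, Mask TextSpotter, that performs text detection and recognition together in an end-to-end fashion. The overall framework is a modified Mask R-CNN, and it likewise treats text as a special kind of object.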

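          Category 4: borrowing attention

          Attention models, which rose to prominence in NLP, have had a major influence and have also entered text detection and recognition, inspiring new ideas. Representative results include: 1) CRNN, 2) ASTER, 3) FAN.

          CRNN is a Megvii work published in TPAMI 2017. Its lower layers use a CNN to extract features, its middle layers use an RNN for sequence modeling, and its top layer is optimized with the CTC loss. It is an end-to-end trainable text recognition architecture, but it does not use attention. CRNN has since become a standard method in the field and is open-sourced on GitHub.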

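[Figure]

          In TPAMI 2018, Megvii proposed a solution called ASTER. Because text may be slanted or curved, and the detection result may not be ideal by the time it reaches the recognition stage, recognition is done in two steps: first, given an input image, the text in it is rectified into a state that is favorable for recognition; second, the rectified text is recognized.

          ASTER has two main modules, rectification and recognition. The rectification module improves on STN so that control points are predicted more accurately; the recognition module is a classic CNN+RNN architecture with attention that predicts the character sequence.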

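[Figure]

[Figure]

          Category 5: using synthetic data

          In the deep learning era the demand for data has grown sharply, and large amounts of data help train strong models. Nearly all deep-learning text detection and recognition methods therefore use synthetic data; a representative dataset is SynthText.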

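[Figure]

          Future Trends and Potential Directions

          Based on the current state of scene text detection and recognition, an analysis of future trends and potential research directions, and the deep learning context, Megvii groups the future challenges for this technology into four areas: 1) multilingual text detection and recognition, 2) reading text of arbitrary shapes, 3) text image synthesis, 4) model robustness.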

[Figure: multilingual text detection and recognition]

[Figure: reading text of arbitrary shapes]

[Figure: text image synthesis]

[Figure: model robustness]

Detection algorithms

1. TextBoxes++: A Single-Shot Oriented Scene Text Detector

Journal: IEEE Transactions on Image Processing
KeyWords Plus: NATURAL IMAGES; NEURAL-NETWORK; LINE DETECTION; RECOGNITION; CLASSIFICATION; DICTIONARIES; WILD
Summary: built on SSD
SSD can only generate bounding boxes in terms of horizontal rectangles, while TextBoxes++ can generate arbitrarily oriented bounding boxes in terms of oriented rectangles or general quadrilaterals to deal with oriented text.
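
To make the difference between a horizontal box and an oriented one concrete, the sketch below converts a rotated rectangle into the four corner points of a quadrilateral. The (cx, cy, w, h, angle) parameterization and the function name `rotated_rect_to_quad` are assumptions chosen for illustration; they are not the regression encoding used by TextBoxes++.

```python
import math


def rotated_rect_to_quad(cx, cy, w, h, angle_rad):
    """Return the four corners of a rectangle rotated about its center.

    Corners are listed in the order top-left, top-right, bottom-right,
    bottom-left as they would appear before the rotation is applied.
    """
    cos_a, sin_a = math.cos(angle_rad), math.sin(angle_rad)
    half_w, half_h = w / 2.0, h / 2.0
    offsets = [(-half_w, -half_h), (half_w, -half_h),
               (half_w, half_h), (-half_w, half_h)]
    # Standard 2D rotation of each corner offset, then shift to the center.
    return [(cx + dx * cos_a - dy * sin_a, cy + dx * sin_a + dy * cos_a)
            for dx, dy in offsets]


# With angle 0 the quadrilateral is the ordinary horizontal box.
print(rotated_rect_to_quad(10, 5, 4, 2, 0.0))
# A non-zero angle gives corners that no axis-aligned box can describe.
print(rotated_rect_to_quad(10, 5, 4, 2, math.radians(30)))
```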

2. EAST: An Efficient and Accurate Scene Text Detector

Venue: CVPR 2017
Summary
A scene text detection method that consists of two stages: a Fully Convolutional Network and an NMS merging stage. The FCN directly produces text regions, excluding redundant and time-consuming intermediate steps.
The key component of the proposed algorithm is a neural network model.
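
The NMS merging stage can be illustrated with plain greedy non-maximum suppression on axis-aligned boxes. This is a simplified stand-in, not EAST's own procedure: EAST merges rotated boxes or quadrangles with a locality-aware NMS variant, and the function names and the 0.5 IoU threshold here are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Return the indices of the boxes kept by greedy NMS."""
    # Visit boxes from the highest score to the lowest.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a kept one.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # the second box overlaps the first and is dropped
```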

Reported in the paper:

[Figure]

Measured:
       Recall: 0.772267, Precision: 0.846437, F-score: 0.807653

3. Geometry-Aware Scene Text Detection with Instance Transformation Network

Venue: CVPR 2018
KeyWords Plus: ITN, robustness of framework, geometry-aware representation

Summary: The paper presents a novel end-to-end ITN that effectively detects scene text with complicated geometric layouts. An adaptive geometry-aware representation learning scheme incorporated in the ITN encodes the unique geometric configurations of scene text instances. Experimental results on standard benchmarks demonstrate that ITN is able to effectively detect multi-scale, multi-oriented and multi-lingual words or text lines in one pass.

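4. Densely Connected Convolutional Networks

GitHub: https://github.com/liuzhuang13/DenseNet
Venue: CVPR 2017 (oral)
KeyWords Plus: alleviate the vanishing-gradient problem         strengthen feature propagation
encourage feature reuse        substantially reduce the number of parameters
Summary

       The paper proposes DenseNet (Dense Convolutional Network) and compares it mainly against ResNet and Inception networks. It starts from the features themselves: by exploiting features to the fullest, it achieves better results with fewer parameters.

DenseNet's main advantages are:
       1. It alleviates the vanishing-gradient problem.
       2. It strengthens feature propagation.
       3. It makes more effective use of features.
       4. It reduces the number of parameters to some extent.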
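
One way to see the feature reuse is to count channels inside a dense block, where each layer receives the concatenation of all earlier feature maps. A minimal sketch, assuming every layer adds a fixed number of channels (the growth rate in the DenseNet paper); the function name and the numbers are illustrative only.

```python
def dense_block_channels(initial_channels, growth_rate, num_layers):
    """Return (input channels seen by each layer, channels leaving the block)."""
    inputs_per_layer = []
    channels = initial_channels
    for _ in range(num_layers):
        inputs_per_layer.append(channels)  # the layer sees all earlier feature maps
        channels += growth_rate            # and contributes growth_rate new ones
    return inputs_per_layer, channels


inputs, out_channels = dense_block_channels(initial_channels=64,
                                            growth_rate=32,
                                            num_layers=4)
print(inputs, out_channels)  # [64, 96, 128, 160] 192
```

Because every layer only has to produce `growth_rate` new channels while reusing everything before it, individual layers can stay narrow, which is where the parameter savings come from.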

5. Detecting Text in Natural Image with Connectionist Text Proposal Network


6. PixelLink: Detecting Scene Text via Instance Segmentation

Code: GitHub
KeyWords Plus: AAAI-2018
instance segmentation         pixel         link        text/non-text prediction
Summary

  • A novel scene text detection algorithm based on instance segmentation (without location regression)
  • A CNN model is trained to perform two kinds of pixel-wise predictions: text/non-text prediction and link prediction.
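
Given the two predictions, pixels can be grouped into text instances by joining positive pixels that share a positive link. The sketch below does this with union-find in plain Python. The input format (a boolean mask plus a list of linked pixel pairs that have already been thresholded) and the name `group_pixels` are assumptions for illustration, not the PixelLink code.

```python
def group_pixels(text_mask, positive_links):
    """Group text pixels into instances with union-find.

    text_mask: 2D list of booleans, True where a pixel is predicted as text.
    positive_links: iterable of ((r1, c1), (r2, c2)) pairs predicted as linked.
    Returns a dict mapping each text pixel to an instance id starting at 0.
    """
    parent = {}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving keeps trees shallow
            p = parent[p]
        return p

    # Every text pixel starts as its own instance.
    for r, row in enumerate(text_mask):
        for c, is_text in enumerate(row):
            if is_text:
                parent[(r, c)] = (r, c)

    # A positive link merges the two instances, but only between text pixels.
    for a, b in positive_links:
        if a in parent and b in parent:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    # Relabel roots as consecutive instance ids.
    ids, labels = {}, {}
    for p in sorted(parent):
        labels[p] = ids.setdefault(find(p), len(ids))
    return labels


mask = [[True, True, False, True],
        [False, False, False, True]]
links = [((0, 0), (0, 1)), ((0, 3), (1, 3))]
print(group_pixels(mask, links))
```

Note that adjacent text pixels without a positive link stay in separate instances, which is what lets nearby text lines be split apart.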

       Reported in the paper (ICDAR2015):

[Figure]

       Measured:

       Recall: 0.81608088, Precision: 0.8543346, F-score: 0.8347697

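7. An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition

Code: GitHub
KeyWords Plus: PAMI2017         CRNN
Summary

  • First, a CNN extracts convolutional features from the image.
  • Then an LSTM extracts sequence features from those convolutional features.
  • Finally, CTC handles the fact that characters cannot be aligned to frames during training.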
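
At inference time, the per-frame predictions of a CTC-trained model are commonly turned into text by best-path decoding: take the most likely label per frame, collapse repeats, then drop blanks. A minimal sketch, assuming the per-frame labels have already been picked; `ctc_greedy_decode` and the `-` blank symbol are illustrative choices, not taken from the CRNN code.

```python
def ctc_greedy_decode(frame_labels, blank="-"):
    """Collapse consecutive repeats, then remove blank labels."""
    decoded = []
    previous = None
    for label in frame_labels:
        # Emit a label only when it differs from the previous frame
        # and is not the blank symbol.
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


# Nine frames, five characters: the blank between the two "l" frames
# is what allows a doubled letter to survive the collapse step.
frames = list("hh-e-l-lo")
print("".join(ctc_greedy_decode(frames)))  # hello
```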

Network architecture

[Figure]

8. TextMountain: Accurate Scene Text Detection via Instance Segmentation

Code: GitHub
KeyWords Plus:         University of Science and Technology of China        instance segmentation
Summary

  • A novel scene text detection algorithm based on instance segmentation that makes use of border-center information
  • A CNN model predicts the text center-border probability (TCBP) and the text center-direction (TCD).

Network architecture

[Figure]

9. Synthetic Data for Text Localisation in Natural Images

Code: GitHub
KeyWords Plus: CVPR2016         FCRN
Summary
              Two key contributions

  • A synthetic dataset of text in cluttered conditions
  • A deep text detection architecture called the fully-convolutional regression network (FCRN)

Network architecture

[Figure]

10. TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes

Code: GitHub
KeyWords Plus:         ECCV        instance segmentation         FCN         irregular shapes
Summary

  • Propose a flexible and general representation for scene text of arbitrary shapes
  • The proposed text detection algorithm achieves state-of-the-art performance on several benchmarks, including text instances of different forms(horizontal, oriented and curved).

Methodology:

[Figure]

        Text region (in yellow) is represented as a series of ordered disks (in blue), each of which is located at the center line (in green, a.k.a symmetric axis or skeleton) and associated with a radius r and an orientation θ. In contrast to conventional representations (e.g., axis-aligned rectangles, rotated rectangles and quadrangles), TextSnake is more flexible and general, since it can precisely describe text of different forms, regardless of shapes and lengths.
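
The disk representation maps naturally onto a small data structure. The following is a sketch only, with hypothetical names (`Disk`, `covers`, `center_line_length`); it shows how an ordered list of disks describes a curved region, not how TextSnake builds or predicts them.

```python
import math
from dataclasses import dataclass


@dataclass
class Disk:
    cx: float      # center x, a point on the text center line
    cy: float      # center y
    radius: float  # half the local text height
    theta: float   # orientation of the center line at this disk, in radians


def covers(disks, x, y):
    """True if the point (x, y) lies inside at least one disk."""
    return any(math.hypot(x - d.cx, y - d.cy) <= d.radius for d in disks)


def center_line_length(disks):
    """Length of the polyline through consecutive disk centers."""
    return sum(math.hypot(b.cx - a.cx, b.cy - a.cy)
               for a, b in zip(disks, disks[1:]))


# Three disks following a line that bends upward at the end.
snake = [Disk(0, 0, 1, 0.0), Disk(1, 0, 1, 0.0), Disk(2, 1, 1, math.pi / 4)]
print(covers(snake, 2.5, 1.5), covers(snake, 5, 5))
print(center_line_length(snake))
```

Because each disk carries its own radius and orientation, the same list can describe straight, rotated, or curved text without changing the data structure.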
Network architecture

[Figure]
        Employ an FCN model to predict the geometry attributes of text instances. The FCN based network predicts score maps of text center line (TCL) and text regions (TR), together with geometry attributes, including r, cosθ and sinθ.
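
Since cosθ and sinθ are predicted as two separate maps, the raw values at a pixel need not lie on the unit circle. One common way to handle this, shown here as an assumption rather than as the paper's exact post-processing, is to normalize the pair and recover θ with `atan2`. The function names are hypothetical.

```python
import math


def normalize_orientation(cos_pred, sin_pred):
    """Rescale a predicted (cos, sin) pair so that cos^2 + sin^2 = 1."""
    norm = math.hypot(cos_pred, sin_pred)
    if norm == 0.0:
        return 1.0, 0.0  # no usable direction: fall back to horizontal
    return cos_pred / norm, sin_pred / norm


def recover_theta(cos_pred, sin_pred):
    """Orientation angle in radians from a predicted (cos, sin) pair."""
    cos_t, sin_t = normalize_orientation(cos_pred, sin_pred)
    return math.atan2(sin_t, cos_t)


# Raw predictions that are off the unit circle.
print(normalize_orientation(3.0, 4.0))
print(math.degrees(recover_theta(0.5, 0.5)))
```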

[Figure]

       VGG-16 is chosen as the stem network for the sake of direct and fair comparison with other methods.

Feedback and suggestions
