Text Detection: TextSnake


TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes
KeyWords Plus:      ECCV 2018     Curved Text

Introduction

1、Contributions

        1、Propose a flexible and general representation for scene text of arbitrary shapes
        2、Predict text regions (TR), the Text Center Line (TCL), radius r and orientation θ

Alt text

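        As the figure above shows, the paper's main contribution is a snake-like representation for detecting irregular text. Much of the text in natural scenes looks like the example in the figure below, so ordinary rectangular-box detection clearly cannot handle it well. Scene text can take all kinds of shapes, but one thing stays constant: a text instance is always a continuous, unbroken strip (which is probably why, I believe, the authors chose to fit the text with many circles).
        The algorithm has five prediction tasks: (1) the text region (TR); (2) the text center line (TCL); (3) the radii of the 15 circles in a text instance; (4) sin θ, the orientation of the center line at each circle's center; (5) cos θ.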
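
        To make the representation concrete, here is a minimal Python sketch of how a text region can be rebuilt as a union of circles placed along the center line, and how an angle can be recovered from predicted sin and cos values. The helper names `disk_mask` and `orientation`, the list-of-lists mask and the renormalisation step are assumptions for illustration, not the authors' code.

```python
import math


def disk_mask(height, width, disks):
    """Rasterise the union of circles; each circle is (cx, cy, r) in pixels."""
    mask = [[0] * width for _ in range(height)]
    for cx, cy, r in disks:
        # Only visit the bounding box of the circle, clipped to the image.
        y_lo = max(0, math.floor(cy - r))
        y_hi = min(height - 1, math.ceil(cy + r))
        x_lo = max(0, math.floor(cx - r))
        x_hi = min(width - 1, math.ceil(cx + r))
        for y in range(y_lo, y_hi + 1):
            for x in range(x_lo, x_hi + 1):
                if (x - cx) ** 2 + (y - cy) ** 2 <= r * r:
                    mask[y][x] = 1
    return mask


def orientation(sin_t, cos_t):
    """Recover the angle (radians) from raw sin/cos predictions.

    Raw network outputs need not satisfy sin^2 + cos^2 = 1, so they are
    renormalised first (an assumption of this sketch).
    """
    norm = math.hypot(sin_t, cos_t)
    if norm == 0:
        raise ValueError("sin and cos cannot both be zero")
    return math.atan2(sin_t / norm, cos_t / norm)


# Two overlapping circles on a horizontal center line.
mask = disk_mask(5, 7, [(2, 2, 1), (4, 2, 1)])
covered = sum(map(sum, mask))
```

        Because neighbouring circles overlap, the union stays connected even when the center line bends, which is what lets this representation follow curved text.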
        

Alt text

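        For curved text seen from a non-frontal viewpoint like this, traditional rectangular-box detection is clearly not enough, and irregular text is where the field is heading. Readers who follow the papers from the second half of 2018 will find that pure rectangular-box detectors have all but disappeared; most top-conference papers propose methods for irregular or general-shaped text.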

2、Method

Alt text

        In order to detect text with arbitrary shapes, we employ an FCN model to predict the geometry attributes of text instances. The FCN-based network predicts score maps of text center line (TCL) and text regions (TR), together with geometry attributes, including r, cosθ and sinθ.

        Network architecture

Alt text

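        The architecture is shown in the figure above: a VGG16 backbone, with feature maps from five stages fused for prediction.
        The fusion works by upsampling. Features are taken from five stages; the deeper feature map is upsampled first and merged with the shallower one, and the result is passed through a 1×1 convolution followed by a 3×3 convolution.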
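
        The merge step can be sketched in NumPy as follows. This is only an illustration under stated assumptions: the merge is taken to be channel-wise concatenation, upsampling is nearest-neighbour by a factor of 2, and activations and normalisation are left out. The function names and weight shapes are hypothetical.

```python
import numpy as np


def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) array."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def conv1x1(x, w):
    """1x1 convolution; w has shape (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", w, x)


def conv3x3(x, w):
    """3x3 convolution with zero padding 1; w has shape (C_out, C_in, 3, 3)."""
    _, h, wd = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((w.shape[0], h, wd))
    # Accumulate the contribution of each of the nine kernel taps.
    for dy in range(3):
        for dx in range(3):
            window = padded[:, dy:dy + h, dx:dx + wd]
            out += np.einsum("oc,chw->ohw", w[:, :, dy, dx], window)
    return out


def merge(deep, shallow, w1, w3):
    """Upsample the deeper map, concatenate with the shallower one, then 1x1 and 3x3 conv."""
    fused = np.concatenate([shallow, upsample2x(deep)], axis=0)
    return conv3x3(conv1x1(fused, w1), w3)


# Toy shapes: deep map is half the resolution of the shallow map.
deep = np.ones((4, 2, 2))
shallow = np.ones((3, 4, 4))
out = merge(deep, shallow, np.ones((5, 7)), np.ones((2, 5, 3, 3)))
```

        Applying `merge` repeatedly from the deepest stage to the shallowest gives a top-down fusion of all five stages; the 1×1 convolution mainly serves to reduce the channel count after concatenation.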

        Upsampling

Alt text

        Label generation

        Extracting Text Center Line
        For triangles and quadrangles, it’s easy to directly calculate the TCL with algebraic methods, since in this case, TCL is a straight line.

        A snake-shaped text instance has two edges that are respectively the head and the tail. The two edges adjacent to the head or tail run parallel but in opposite directions.

Alt text

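        The main idea is to first find the two ends (head and tail) of the text and its two sidelines, then take the midpoints between the sidelines as the center line. With the two ends trimmed off, what remains is the TCL label.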
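
        A minimal sketch of this midpoint construction, assuming the two sidelines have already been sampled with the same number of points in the same direction. The function name `center_line` and the `trim` parameter are hypothetical, and dropping whole sample points at each end is a simplification of the trimming step.

```python
import math


def center_line(side_a, side_b, trim=1):
    """Return (cx, cy, r) samples of the center line between two sidelines.

    side_a, side_b: equal-length lists of (x, y) points, ordered head to tail.
    trim: number of samples dropped at each end, so the label does not
          reach the head and tail of the text.
    """
    if len(side_a) != len(side_b):
        raise ValueError("sidelines must have the same number of samples")
    points = []
    for (ax, ay), (bx, by) in zip(side_a, side_b):
        cx, cy = (ax + bx) / 2, (ay + by) / 2   # midpoint of the two sidelines
        r = math.hypot(ax - bx, ay - by) / 2    # half the local text width
        points.append((cx, cy, r))
    return points[trim:len(points) - trim]


top = [(0, 0), (1, 0), (2, 0), (3, 0)]
bottom = [(0, 2), (1, 2), (2, 2), (3, 2)]
tcl = center_line(top, bottom)
```

        Each returned sample carries the radius as well, so the same pass can produce the regression target for r alongside the center line.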

3、Loss

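        The loss has a classification part and a regression part: TR and TCL are trained with a classification loss, while the radius and angle terms are trained with a regression loss. TR and TCL use cross-entropy, with online hard negative mining added to counter the imbalance between positive and negative samples.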
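
        Online hard negative mining can be sketched as below: all positive pixels are kept, and only the negatives with the largest loss are kept, up to a fixed ratio. The function name, the flat-list inputs and the default ratio of 3 are assumptions of this sketch, not values taken from the text above.

```python
import math


def hard_negative_cross_entropy(probs, labels, neg_ratio=3):
    """Binary cross-entropy over all positives plus the hardest negatives.

    probs:  predicted probability of the positive class, one per pixel.
    labels: 0 or 1 per pixel.
    neg_ratio: at most neg_ratio negatives are kept per positive
               (hypothetical default).
    """
    eps = 1e-7
    pos_losses, neg_losses = [], []
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        if y == 1:
            pos_losses.append(-math.log(p))
        else:
            neg_losses.append(-math.log(1 - p))
    # With no positives, still keep a few negatives so the loss is defined.
    n_neg = min(len(neg_losses), neg_ratio * max(len(pos_losses), 1))
    hard_negatives = sorted(neg_losses, reverse=True)[:n_neg]
    kept = pos_losses + hard_negatives
    return sum(kept) / len(kept) if kept else 0.0


probs = [0.9, 0.8, 0.1, 0.2, 0.3, 0.05]
labels = [1, 0, 0, 0, 0, 0]
loss = hard_negative_cross_entropy(probs, labels, neg_ratio=1)
```

        In this toy input the only negative kept is the pixel predicted as 0.8, the most confident mistake, so easy background pixels no longer dominate the average.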

Alt text

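        The regression loss uses the Smoothed-L1 loss and is computed only for pixels inside the TCL, since the geometry attributes are meaningless for pixels outside it.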
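
        A small sketch of a masked regression loss of this kind, using the common Smooth-L1 form with a threshold of 1. The function names and the flat-list inputs are assumptions for illustration; any normalisation of the radius term is left out.

```python
def smooth_l1(x):
    """Quadratic near zero, linear for |x| >= 1."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1 else ax - 0.5


def masked_regression_loss(pred, target, tcl_mask):
    """Mean Smooth-L1 error over pixels where tcl_mask is truthy.

    Pixels outside the TCL are skipped entirely, so whatever the network
    predicts there has no effect on the loss.
    """
    terms = [smooth_l1(p - t) for p, t, m in zip(pred, target, tcl_mask) if m]
    return sum(terms) / len(terms) if terms else 0.0


pred = [1.0, 5.0, 0.5]
target = [1.5, 2.0, 100.0]
tcl_mask = [1, 1, 0]   # the third pixel lies outside the TCL
loss_r = masked_regression_loss(pred, target, tcl_mask)
```

        The third pixel has a huge error but contributes nothing, which is the behaviour the masking is meant to give.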

Alt text

4、Datasets

        SynthText
        Contains about 800K synthetic images.

        TotalText
        A newly released benchmark for text detection. Besides horizontal and multi-oriented text instances, the dataset features curved text. It is split into training and testing sets with 1255 and 300 images, respectively.

        CTW1500
        Another dataset mainly consisting of curved text. It consists of 1000 training images and 500 test images. Text instances are annotated with polygons with 14 vertices.

        ICDAR 2015

        MSRA-TD500

        A dataset with multi-lingual, arbitrary-oriented and long text lines. It includes 300 training images and 200 test images with text-line-level annotations.

5、Experiment Results

Alt text
Total-Text

Alt text
CTW1500

Alt text
ICDAR 2015

Alt text

6、Conclusion and Future work

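        This paper counts as a pioneer in irregular scene text detection; most papers on the topic only burst onto the scene in the second half of 2018. Personally, though, I don't think the method is very good: it is rather cumbersome and leaves a lot of room for improvement.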
