Text Detection - PSENet-1s


Shape Robust Text Detection with Progressive Scale Expansion Network
Keywords:      CVPR2019     Curved Text     Face++

Introduction

     PSENet has gone through several versions; the latest one was published at CVPR 2019. It is a collaboration between Nanjing University and Face++ (with authors from several other institutions as well). Quite a few irregular-text detection algorithms appeared in 2019, such as TextMountain and TextField, so why study this one in particular? Mainly because its code is open source.

1、论文创新点

        1、Propose a novel kernel-based framework, namely the Progressive Scale Expansion Network (PSENet)
        2、Adopt a progressive scale expansion algorithm based on Breadth-First Search (BFS):
            (1)、Start from the kernels with minimal scales (text instances can be distinguished at this step)
            (2)、Gradually expand their areas by involving more pixels from larger kernels
            (3)、Finish when the complete text instances (the largest kernels) have been explored

        The paper's main idea is to predict multiple segmentation maps S1, S2, S3, …, Sn, corresponding to kernels of increasing area: S1 is the smallest (essentially the text skeleton) and Sn the largest (the full text region). During post-processing, the smallest prediction S1 is used first to separate the text instances, which are then progressively expanded back to full text size.
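The expansion step above can be sketched in plain Python/numpy. This is a simplified reimplementation of the idea, not the authors' released code; all function names are mine:

```python
from collections import deque

import numpy as np


def connected_components(binary):
    """4-connected components labelling (a stand-in for cv2.connectedComponents)."""
    labels = np.zeros(binary.shape, dtype=np.int32)
    h, w = binary.shape
    nxt = 0
    for sy in range(h):
        for sx in range(w):
            if binary[sy, sx] and labels[sy, sx] == 0:
                nxt += 1
                labels[sy, sx] = nxt
                stack = [(sy, sx)]
                while stack:
                    y, x = stack.pop()
                    for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w \
                                and binary[ny, nx] and labels[ny, nx] == 0:
                            labels[ny, nx] = nxt
                            stack.append((ny, nx))
    return labels


def progressive_scale_expansion(kernels):
    """Expand instances from the minimal kernel S1 outward through S2..Sn.

    kernels: list of binary maps [S1, ..., Sn] (same shape, smallest first).
    Returns a label map with a distinct integer id per text instance.
    """
    # 1) distinguish instances on the smallest kernel S1
    labels = connected_components(kernels[0])
    h, w = labels.shape
    for kernel in kernels[1:]:
        # 2) BFS from every labelled pixel into unlabelled pixels that are
        #    foreground in the next (larger) kernel; first-come-first-served
        #    naturally resolves conflicts at instance boundaries
        queue = deque(zip(*np.nonzero(labels)))
        while queue:
            y, x = queue.popleft()
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w \
                        and labels[ny, nx] == 0 and kernel[ny, nx]:
                    labels[ny, nx] = labels[y, x]  # inherit the instance id
                    queue.append((ny, nx))
    return labels
```

Because the BFS fronts of neighbouring instances meet in the middle, a pixel that could belong to two instances is simply claimed by whichever front reaches it first, which is also how the paper handles boundary conflicts.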

        

(Figure: progressive scale expansion from the minimal kernels to the complete text instances)

2、算法主体

(Figure: overall PSENet architecture, an FPN backbone followed by the segmentation outputs S1…Sn)

        We first obtain four 256-channel feature maps (i.e. P2, P3, P4, P5) from the backbone. To further combine the semantic features from low to high levels, we fuse the four feature maps into a feature map F with 1024 channels via the function C(·) as:

$$F = C(P_2, P_3, P_4, P_5) = P_2 \,\Vert\, \mathrm{Up}_{\times 2}(P_3) \,\Vert\, \mathrm{Up}_{\times 4}(P_4) \,\Vert\, \mathrm{Up}_{\times 8}(P_5)$$

        where $\Vert$ denotes channel-wise concatenation and $\mathrm{Up}_{\times k}(\cdot)$ denotes $k\times$ upsampling.

        The backbone first downsamples the input to produce four levels of feature maps; the FPN then upsamples three of them by factors of 2, 4, and 8 and fuses all four to produce the output.
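The fusion function C(·) can be sketched in numpy; nearest-neighbour upsampling stands in for the network's upsampling, and the helper names are mine:

```python
import numpy as np


def upsample(x, scale):
    """Nearest-neighbour upsampling of a (C, H, W) feature map."""
    return x.repeat(scale, axis=1).repeat(scale, axis=2)


def fuse(p2, p3, p4, p5):
    """C(.): concatenate P2 with the upsampled P3/P4/P5 along channels.

    Four 256-channel maps at strides 1x, 2x, 4x, 8x relative to P2
    become one 1024-channel map at P2's resolution.
    """
    return np.concatenate(
        [p2, upsample(p3, 2), upsample(p4, 4), upsample(p5, 8)], axis=0)
```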

        As shown above, the network produces three segmentation results, S1, S2, and S3. S1, generated from the minimal kernels, is first used to separate the four text instances, which are then progressively expanded through S2 and S3.

3、Label Generation

        Producing kernels of different scales S1…Sn requires labels of correspondingly different scales.

(Figure: label generation by shrinking the original text polygon)

        Labels at the different scales are generated as shown above, by shrinking the original polygon; the shrink offset $d_{i}$ of each kernel is computed with the following formula:

$$d_i = \frac{\mathrm{Area}(p_n) \times \left(1 - r_i^2\right)}{\mathrm{Perimeter}(p_n)}$$

        Here $d_{i}$ is the offset between the shrunk mask boundary and the original mask boundary, and the scale ratio $r_{i}$ is computed as:

$$r_i = 1 - \frac{(1 - m) \times (n - i)}{n - 1}$$

        $m$ is the scale ratio of the minimal mask, a value in $(0, 1]$; the ratios $r_1, \dots, r_n$ increase linearly from $m$ to 1 as $i$ goes from 1 to $n$.
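The two formulas can be computed directly (helper names are mine); the released code then feeds the offset $d_i$ into a polygon-clipping library such as pyclipper to actually shrink each mask:

```python
def scale_ratio(i, n, m):
    """r_i = 1 - (1 - m) * (n - i) / (n - 1): linear from m (i=1) to 1 (i=n)."""
    return 1.0 - (1.0 - m) * (n - i) / (n - 1)


def shrink_offset(polygon, r):
    """d_i = Area(p) * (1 - r^2) / Perimeter(p) for a polygon [(x, y), ...]."""
    area = perimeter = 0.0
    k = len(polygon)
    for j in range(k):
        x0, y0 = polygon[j]
        x1, y1 = polygon[(j + 1) % k]
        area += x0 * y1 - x1 * y0                      # shoelace formula
        perimeter += ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5
    area = abs(area) / 2.0
    return area * (1.0 - r * r) / perimeter
```

For example, with n = 3 kernels and m = 0.5, the ratios are 0.5, 0.75, and 1.0; the largest kernel (r = 1) is not shrunk at all since its offset is 0.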

4、Loss Function

        The loss is split into the text-instance classification loss $L_{c}$ and the shrunk-kernel loss $L_{s}$, with $\lambda$ balancing the two. Because text pixels are heavily outnumbered by non-text pixels, the classification loss adopts the dice coefficient rather than plain cross-entropy:

$$L = \lambda L_c + (1 - \lambda) L_s$$

        The dice coefficient $D(S_i, G_i)$ is computed as follows:

$$D(S_i, G_i) = \frac{2 \sum_{x,y} \left(S_{i,x,y} \times G_{i,x,y}\right)}{\sum_{x,y} S_{i,x,y}^2 + \sum_{x,y} G_{i,x,y}^2}$$

        where $S_{i,x,y}$ and $G_{i,x,y}$ are the values of pixel $(x, y)$ in the segmentation result $S_i$ and the ground truth $G_i$.

        $L_{s}$ is computed as follows:

$$L_s = 1 - \frac{\sum_{i=1}^{n-1} D(S_i \cdot W, G_i \cdot W)}{n - 1}$$

        where $W$ masks out the non-text pixels of $S_n$.
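A numpy sketch of the dice coefficient and the combined loss $L = \lambda L_c + (1 - \lambda) L_s$; OHEM and the mask $W$ are omitted here, and the function names are mine:

```python
import numpy as np


def dice_coefficient(s, g, eps=1e-6):
    """D(S, G) = 2 * sum(S * G) / (sum(S^2) + sum(G^2)), with eps for stability."""
    inter = (s * g).sum()
    return (2.0 * inter + eps) / ((s * s).sum() + (g * g).sum() + eps)


def psenet_loss(scores, gts, lam=0.7):
    """L = lam * Lc + (1 - lam) * Ls over predicted maps [S1, ..., Sn] (Sn last).

    Lc = 1 - D(Sn, Gn) scores the complete text instances;
    Ls averages 1 - D(Si, Gi) over the shrunk kernels S1..Sn-1.
    """
    lc = 1.0 - dice_coefficient(scores[-1], gts[-1])
    ls = np.mean([1.0 - dice_coefficient(s, g)
                  for s, g in zip(scores[:-1], gts[:-1])])
    return lam * lc + (1.0 - lam) * ls
```

A perfect prediction drives every dice coefficient to 1, so the total loss goes to 0.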

5、Datasets

        SynthText
        Contains about 800K synthetic images.

        TotalText
        A newly released benchmark for text detection. Besides horizontal and multi-oriented text instances, it contains many curved ones. The dataset is split into training and testing sets with 1255 and 300 images, respectively.

        CTW1500
        The CTW1500 dataset mainly consists of curved text. It contains 1000 training images and 500 test images; text instances are annotated with 14-vertex polygons.

        ICDAR 2015
        ICDAR 2015 is a commonly used dataset for text detection. It contains 1500 images in total, 1000 of which are used for training and the remaining 500 for testing. The text instances are annotated at the word level with quadrangles.

        ICDAR 2017 MLT

        ICDAR 2017 MLT is a large-scale multi-lingual text dataset, which includes 7200 training images, 1800 validation images, and 9000 testing images.

6、Experiment Results

Implementation Details
        All the networks are optimized using stochastic gradient descent (SGD). The data augmentation for training data is as follows: 1) the images are rescaled with a ratio randomly chosen from {0.5, 1.0, 2.0, 3.0}; 2) the images are horizontally flipped and rotated within [−10°, 10°] randomly; 3) 640 × 640 random samples are cropped from the transformed images.
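The augmentation pipeline can be sketched for a single-channel image; rotation is omitted since it needs an image library, and the helper is my own illustration, not the released code:

```python
import numpy as np


def augment(image, rng, crop=640):
    """Rescale, flip, and crop a 2-D (H, W) image, as in PSENet training."""
    # 1) random rescale with ratio in {0.5, 1.0, 2.0, 3.0} (nearest-neighbour)
    ratio = rng.choice([0.5, 1.0, 2.0, 3.0])
    idx_y = (np.arange(int(image.shape[0] * ratio)) / ratio).astype(int)
    idx_x = (np.arange(int(image.shape[1] * ratio)) / ratio).astype(int)
    image = image[idx_y][:, idx_x]

    # 2) random horizontal flip
    if rng.random() < 0.5:
        image = image[:, ::-1]

    # 3) pad if the rescaled image is smaller than the crop, then sample
    #    a random crop x crop window
    pad_y = max(0, crop - image.shape[0])
    pad_x = max(0, crop - image.shape[1])
    image = np.pad(image, ((0, pad_y), (0, pad_x)))
    y0 = rng.integers(0, image.shape[0] - crop + 1)
    x0 = rng.integers(0, image.shape[1] - crop + 1)
    return image[y0:y0 + crop, x0:x0 + crop]
```

In real training the same geometric transforms must of course also be applied to the n label maps so that predictions and labels stay aligned.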

(Table: results on Total-Text)

(Table: results on CTW1500)

(Table: results on ICDAR 2015)

(Table: results on IC17-MLT)

(Figure: qualitative detection results)

7、Conclusion and Future work

        My take: this paper really does just one thing: it uses the small predicted masks to separate text instances and then gradually expands them into full-size text masks. I recently published a fairly modest conference paper on irregular text detection as well, TextCohesion: Detecting Text for Arbitrary Shapes, whose core idea is quite similar (I had not seen this paper before submitting mine, and mine does not seem to be indexed yet), though the algorithm itself is different; it will also go up on arXiv in a few days. It likewise uses small masks to separate text instances, but instead of expanding them, it assigns the text pixels outside the skeleton to different direction predictions, so that the surrounding pixels exert a kind of "centripetal force" toward the text skeleton, finally forming a text instance. The pipeline is shown below (the metrics on Total-Text and CTW1500 are quite high, currently ranking first):

(Figure: TextCohesion pipeline)
