Text Recognition: MORAN


MORAN: A Multi-Object Rectified Attention Network for Scene Text Recognition
Keywords: scene text recognition, optical character recognition

Introduction

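         MORAN is a text recognition algorithm designed to handle irregular text.
         It consists of a rectification sub-network, MORN, and a recognition sub-network, ASRN. The MORN uses a novel pixel-level weakly supervised learning mechanism to rectify the shape of irregular text, which greatly reduces the difficulty of recognizing it.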

Alt text

         The training of the MORN is guided by the ASRN, which requires only text labels. Without any geometric-level or pixel-level supervision, the MORN is trained in a weakly supervised way.

Key innovations and contributions of the paper:

         1. The authors propose the MORAN framework to recognize irregular scene text.
         2. Trained in a weakly supervised way, the sub-network MORN is flexible. It is free of geometric constraints and can rectify images with complicated distortion.
         3. The authors propose a fractional pickup method for training the attention-based decoder in the ASRN. To address noise perturbations, it expands the visual field of the MORAN, which further improves the sensitivity of the attention-based decoder.

Multi-Object Rectification Network

Alt text

         Comparison of the MORN and affine transformation. The MORN is free of geometric constraints. The main direction of rectification predicted by the MORN for each character is indicated by a yellow arrow.
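         The offset is zero on the boundary between the yellow and blue regions, and the depth of the color represents the magnitude of the offset. The rectification network is shown below.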

Alt text

         A pooling layer is placed before the convolutional layers to suppress noise and reduce the amount of computation.

Alt text

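         As shown in the figure below, the offset network produces offset maps with two channels, which hold the offsets in the x and y directions. A basic grid is also generated to record the original positions of the pixels. Similar to the offset maps, the grid contains two channels, which represent the x-coordinate and y-coordinate. The final rectification is computed as follows: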

Alt text
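
The combination of offset maps and basic grid can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: coordinates are assumed to be normalized to [-1, 1], the rectified image is assumed to be obtained by bilinear sampling at the shifted positions, and the names `make_basic_grid`, `bilinear`, and `rectify` are hypothetical.

```python
def make_basic_grid(height, width):
    """Return grid[i][j] = (x, y), the original pixel positions in [-1, 1]."""
    def norm(idx, size):
        return 0.0 if size == 1 else 2.0 * idx / (size - 1) - 1.0
    return [[(norm(j, width), norm(i, height)) for j in range(width)]
            for i in range(height)]


def bilinear(image, x, y):
    """Sample image (a list of rows) at normalized (x, y), clamping at the border."""
    h, w = len(image), len(image[0])
    # Map [-1, 1] back to pixel coordinates and clamp to the image border.
    px = min(max((x + 1.0) * (w - 1) / 2.0, 0.0), w - 1.0)
    py = min(max((y + 1.0) * (h - 1) / 2.0, 0.0), h - 1.0)
    x0, y0 = int(px), int(py)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = px - x0, py - y0
    top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
    bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def rectify(image, offset_x, offset_y):
    """Add the two offset channels to the basic grid, then resample the image."""
    h, w = len(image), len(image[0])
    grid = make_basic_grid(h, w)
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            bx, by = grid[i][j]  # original position of pixel (i, j)
            row.append(bilinear(image, bx + offset_x[i][j], by + offset_y[i][j]))
        out.append(row)
    return out
```

With all offsets set to zero, `rectify` returns the input unchanged, which matches the idea that the basic grid only records where each pixel originally was. Because every pixel has its own offset, the sampling is not tied to a single global transformation.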

Alt text
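The overall MORAN architecture is shown in the figure above: the rectification network is on the left and the recognition network is on the right.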

         The advantages of the MORN are manifold:
         1. The rectified images are more readable owing to the regular shape of the text and the reduced noise.
         2. The MORN is more flexible than the affine transformation. It is free of geometric constraints, which enables it to rectify images using complicated transformations.
         3. The MORN is more flexible than methods using a specific number of regressing points.
         4. The MORN does not require extra labelling information of character positions.

Attention-based Sequence Recognition Network

         The architecture of the ASRN is shown in the figure below:

Alt text

         The input first passes through pooling and convolutional layers, whose output is then fed to a bidirectional LSTM (BLSTM). Each convolutional layer is followed by a batch normalization layer and a ReLU layer.

         The largest number of steps that the decoder generates is $T$. The decoder stops processing when it predicts an end-of-sequence token “EOS” [47]. At time step $t$, the output $y_t$ is:

Alt text

         State $s_t$ is computed as:

Alt text
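
The stopping rule described above (at most T steps, or earlier when "EOS" is predicted) can be sketched as a greedy decoding loop. This is only an illustration: `step_fn` is a hypothetical stand-in for one decoder step (attention, state update, and picking the most likely symbol), and `toy_step` is a toy replacement for a trained model.

```python
EOS = "EOS"


def greedy_decode(step_fn, init_state, max_steps):
    """Run step_fn until it emits EOS or max_steps (T) symbols are produced.

    step_fn(prev_symbol, state) -> (symbol, new_state) is a hypothetical
    stand-in for one decoder step.
    """
    symbols, prev, state = [], None, init_state
    for _ in range(max_steps):
        symbol, state = step_fn(prev, state)
        if symbol == EOS:
            break  # the decoder stops as soon as EOS is predicted
        symbols.append(symbol)
        prev = symbol
    return "".join(symbols)


def toy_step(prev_symbol, state):
    """Toy step: spell out a fixed word one character at a time, then emit EOS."""
    word = "moran"
    symbol = word[state] if state < len(word) else EOS
    return symbol, state + 1
```

For example, `greedy_decode(toy_step, 0, 10)` stops at the EOS token, while `greedy_decode(toy_step, 0, 3)` is cut off by the step limit.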

Fractional Pickup

         To address misrecognitions caused by noise perturbations, as illustrated in the figure below, the paper also proposes a technique called fractional pickup.
         An attention-based decoder trained with the fractional pickup method can perceive adjacent characters. The wider field of attention contributes to the robustness of the MORAN.

Alt text

         A pair of adjacent attention weights is selected and modified at every time step:

Alt text
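
A minimal sketch of this modification in Python follows. The mixing rule in the docstring is an assumption: the original equation is only available as an image here, so the form below (a random index k and a random coefficient beta that blend two adjacent weights) is a reconstruction, and the function name `fractional_pickup` and its `rng` parameter are illustrative.

```python
import random


def fractional_pickup(alpha, rng=random):
    """Return a copy of the attention weights with one adjacent pair mixed.

    Assumed form, for a random index k and a random beta in [0, 1):
      alpha'[k]   = beta * alpha[k] + (1 - beta) * alpha[k + 1]
      alpha'[k+1] = (1 - beta) * alpha[k] + beta * alpha[k + 1]
    """
    if len(alpha) < 2:
        return list(alpha)
    k = rng.randrange(len(alpha) - 1)  # the pair is (k, k + 1), 0-based
    beta = rng.random()
    mixed = list(alpha)
    mixed[k] = beta * alpha[k] + (1 - beta) * alpha[k + 1]
    mixed[k + 1] = (1 - beta) * alpha[k] + beta * alpha[k + 1]
    return mixed
```

Under this assumed form, the sum of the two mixed weights is unchanged, so weights that summed to one before the step still sum to one afterwards. The step is meant for training only; at inference the weights would be used as predicted.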

Fractional pickup has the following main advantages:

         1、Variation of Distribution
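         Because each selected weight is mixed with that of the neighbouring feature, and the mixing is random, the distribution of $\alpha_{t,k}$ and $\alpha_{t,k+1}$ varies. Even for the same image, the contribution at each step can differ, which helps avoid overfitting and improves robustness.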

         2、Shortcut of Forward Propagation
         For step $k+1$ in the bidirectional LSTM, fractional pickup creates a shortcut connecting to step $k$. The shortcut retains some features of the previous step during training, which acts as an interference to the forget gate in the bidirectional LSTM.

         3、Broader Visual Field
         Without fractional pickup, the error term of sequence feature vector $h_k$ is

Alt text

The result depends on a single fixed weight only. With fractional pickup, the equation becomes:

Alt text

         The result now depends not only on the current feature but also on the neighbouring features, so back-propagated gradients are able to dynamically optimize the decoder over a broader range of neighbouring regions.
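
The two error terms can be written out as a short derivation. This is a reconstruction under stated assumptions, since the original equations are only available as images here: the glimpse $g_t$ is assumed to be the attention-weighted sum of the sequence features, fractional pickup is assumed to mix one adjacent pair of weights with a random $\beta \in [0, 1]$, and $\delta$ denotes an error term.

```latex
% Assumption: the glimpse is a weighted sum of the sequence features,
% so the error term of h_k depends on a single weight.
g_t = \sum_{i} \alpha_{t,i}\, h_i
\quad\Longrightarrow\quad
\delta_{h_k} = \delta_{g_t}\,\alpha_{t,k}

% Assumption: fractional pickup mixes the pair (k, k+1) with a random \beta.
\alpha_{t,k}'   = \beta\,\alpha_{t,k} + (1-\beta)\,\alpha_{t,k+1}, \qquad
\alpha_{t,k+1}' = (1-\beta)\,\alpha_{t,k} + \beta\,\alpha_{t,k+1}

% The error term of h_k then involves both neighbouring weights.
\delta_{h_k} = \delta_{g_t}\,\bigl(\beta\,\alpha_{t,k} + (1-\beta)\,\alpha_{t,k+1}\bigr)
```

Since $\alpha_{t,k+1}$ is computed from the neighbouring feature $h_{k+1}$, the gradient reaching $h_k$ carries information from its neighbour, which is the sense in which the visual field is broader.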

Performance of the MORAN

Alt text

Alt text

Limitation of the MORAN

         On images with a complicated background, the MORAN can fail when the curve angle of the text is too large.

Alt text

Conclusion

         The proposed framework involves two stages: rectification and recognition. First, a multi-object rectification network, which is free of geometric constraints and flexible enough to handle complicated deformations, was proposed to transform an image containing irregular text into a more readable one. Second, the attention-based sequence recognition network recognizes the rectified image. The proposed MORAN is trained in a weakly supervised way, which requires only images and the corresponding text labels.
