# Text Recognition: MORAN

MORAN: A Multi-Object Rectified Attention Network for Scene Text Recognition
Keywords: scene text recognition, optical character recognition

## Introduction

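MORAN is a text recognition algorithm designed to handle irregular text.
It consists of a rectification sub-network, the MORN, and a recognition sub-network, the ASRN. The MORN uses a novel pixel-level weakly supervised learning mechanism to rectify the shape of irregular text, which greatly reduces the difficulty of recognizing it.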

The training of the MORN is guided by the ASRN, which requires only text labels. Without any geometric-level or pixel-level supervision, the MORN is trained in a weakly supervised way.

1. The paper proposes the MORAN framework to recognize irregular scene text.
2. Trained in a weakly supervised way, the sub-network MORN is flexible. It is free of geometric constraints and can rectify images with complicated distortion.
3. The paper proposes a fractional pickup method for training the attention-based decoder in the ASRN. To address noise perturbations, the visual field of the MORAN is expanded, which further improves the sensitivity of the attention-based decoder.

## Multi-Object Rectification Network

Comparison of the MORN and affine transformation. The MORN is free of geometric constraints. The main direction of rectification predicted by the MORN for each character is indicated by a yellow arrow.
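The pixel offset at the boundary between yellow and blue is zero, and the depth of the color represents the magnitude of the offset. The rectification network is as follows.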

A pooling layer is placed before the convolutional layer to avoid noise and reduce the amount of calculation.

Similar to the offset maps, the grid contains two channels, which represent the x-coordinate and the y-coordinate.
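The relationship between the two-channel grid and the offset maps can be illustrated with a small sketch. It assumes, beyond what the text above states, that grid coordinates are normalized to [-1, 1], that the offsets are simply added to the grid, and that the rectified image is read back with bilinear interpolation. The function names are hypothetical, and the offsets are supplied by hand here rather than predicted by a network.

```python
def make_basic_grid(height, width):
    """Build a two-channel grid (x, y) with coordinates normalized to [-1, 1].

    Assumes height > 1 and width > 1.
    """
    xs = [-1.0 + 2.0 * j / (width - 1) for j in range(width)]
    ys = [-1.0 + 2.0 * i / (height - 1) for i in range(height)]
    grid_x = [[xs[j] for j in range(width)] for _ in range(height)]
    grid_y = [[ys[i] for _ in range(width)] for i in range(height)]
    return grid_x, grid_y


def bilinear_sample(image, x, y):
    """Read one value from image at normalized coordinates (x, y)."""
    height, width = len(image), len(image[0])
    # Map normalized coordinates to pixel coordinates and clamp to the image.
    px = min(max((x + 1.0) / 2.0 * (width - 1), 0.0), width - 1.0)
    py = min(max((y + 1.0) / 2.0 * (height - 1), 0.0), height - 1.0)
    x0, y0 = int(px), int(py)
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    wx, wy = px - x0, py - y0
    top = (1 - wx) * image[y0][x0] + wx * image[y0][x1]
    bottom = (1 - wx) * image[y1][x0] + wx * image[y1][x1]
    return (1 - wy) * top + wy * bottom


def rectify(image, offset_x, offset_y):
    """Add per-pixel offsets to the basic grid, then sample the image."""
    height, width = len(image), len(image[0])
    grid_x, grid_y = make_basic_grid(height, width)
    return [
        [
            bilinear_sample(
                image,
                grid_x[i][j] + offset_x[i][j],
                grid_y[i][j] + offset_y[i][j],
            )
            for j in range(width)
        ]
        for i in range(height)
    ]


image = [[0.0, 1.0, 2.0],
         [3.0, 4.0, 5.0],
         [6.0, 7.0, 8.0]]
zeros = [[0.0] * 3 for _ in range(3)]
# Zero offsets leave the image unchanged.
identity = rectify(image, zeros, zeros)
# For width 3, an x-offset of 1.0 in normalized units is one pixel to the right.
shift = [[1.0] * 3 for _ in range(3)]
shifted = rectify(image, shift, zeros)
```

Because every pixel has its own offset, nothing forces the offsets to follow a single global transformation, which is what being free of geometric constraints means in practice.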

The advantages of the MORN are manifold:
1. The rectified images are more readable owing to the regular shape of the text and the reduced noise.
2. The MORN is more flexible than the affine transformation. It is free of geometric constraints, which enables it to rectify images using complicated transformations.
3. The MORN is more flexible than methods using a specific number of regressing points.
4. The MORN does not require extra labelling information of character positions.

## Attention-based Sequence Recognition Network

The ASRN architecture is shown in the figure below:

The input first passes through pooling and convolutional layers, which are followed by a BLSTM. Each convolutional layer is followed by a batch normalization layer and a ReLU layer.

The largest number of steps that the decoder generates is T. The decoder stops processing when it predicts an end-of-sequence token “EOS” [47]. At time step $t$, the output $y_t$ is:

State $s_t$ is computed as:
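For reference, an attention-based GRU decoder of this kind is commonly written in the following form. This is a sketch rather than a formula taken from the text above: the symbols $W_{out}$, $b_{out}$, $y_{prev}$, $g_t$, $\alpha_{t,i}$, $h_i$ and $L$ are assumptions introduced here.

$$
y_t = \mathrm{Softmax}(W_{out}\, s_t + b_{out})
$$

$$
s_t = \mathrm{GRU}(y_{prev},\, g_t,\, s_{t-1}), \qquad g_t = \sum_{i=1}^{L} \alpha_{t,i}\, h_i
$$

Here $y_{prev}$ stands for the embedding of the previous output, $h_i$ for the $i$-th sequence feature vector produced by the BLSTM, and $\alpha_{t,i}$ for the attention weights at step $t$, which are non-negative and sum to one over $i$.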

## Fractional Pickup

To address misrecognitions caused by noise, as shown in the figure below, the paper also proposes a training method called fractional pickup.
An attention-based decoder trained with the fractional pickup method can perceive adjacent characters. The wider field of attention contributes to the robustness of the MORAN.

A pair of adjacent attention weights is selected and modified at every time step:
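A minimal sketch of this modification follows. It assumes the pair is mixed as $\alpha'_{t,k} = \beta\,\alpha_{t,k} + (1-\beta)\,\alpha_{t,k+1}$ and $\alpha'_{t,k+1} = (1-\beta)\,\alpha_{t,k} + \beta\,\alpha_{t,k+1}$, with $\beta$ drawn uniformly from [0, 1) and $k$ drawn uniformly over the valid pair positions. The function name and the sampling ranges are assumptions made for illustration.

```python
import random


def fractional_pickup(alpha, rng=random):
    """Mix one randomly chosen pair of adjacent attention weights.

    alpha is the list of attention weights for a single decoder step.
    Returns the modified weights together with the sampled k and beta.
    """
    k = rng.randrange(len(alpha) - 1)   # first index of the pair (k, k + 1)
    beta = rng.random()                 # mixing fraction in [0, 1)
    mixed = list(alpha)
    mixed[k] = beta * alpha[k] + (1.0 - beta) * alpha[k + 1]
    mixed[k + 1] = (1.0 - beta) * alpha[k] + beta * alpha[k + 1]
    return mixed, k, beta


rng = random.Random(0)                  # fixed seed so the run is repeatable
alpha = [0.05, 0.70, 0.20, 0.05]
mixed, k, beta = fractional_pickup(alpha, rng)
```

The two mixed weights still add up to the same total as the original pair, so the attention weights remain normalized. Only their distribution between the two neighbouring positions changes, and this is applied during training only.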

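1. Variation of Distribution
Because each modified weight draws on its neighbouring feature and the modification is random, the weights $\alpha_{t,k}$ and $\alpha_{t,k+1}$ become more robust. Even for the same image, the contribution at each step may differ from one training pass to the next, which helps avoid overfitting and improves the robustness of the decoder.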

2. Shortcut of Forward Propagation
For step k + 1 in the bidirectional LSTM, a shortcut connecting to step k is created by fractional pickup. The shortcut retains some features of the previous step in the training phase, which interferes with the forget gate in the bidirectional LSTM.

Without fractional pickup, the error term of sequence feature vector $h_k$ is

With fractional pickup, the result depends not only on the current feature but also on the adjacent features, so back-propagated gradients are able to dynamically optimize the decoder over a broader range of neighbouring regions.
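The two error terms can be written out explicitly. This is a sketch under two assumptions that are not stated in the text above: the glimpse vector is $g_t = \sum_i \alpha_{t,i}\, h_i$, and fractional pickup replaces $\alpha_{t,k}$ with $\beta\,\alpha_{t,k} + (1-\beta)\,\alpha_{t,k+1}$ for a random fraction $\beta \in [0, 1]$. Here $\delta$ denotes an error term.

$$
\delta_{h_k} = \delta_{g_t}\,\alpha_{t,k} \quad \text{(without fractional pickup)}
$$

$$
\delta_{h_k} = \delta_{g_t}\left(\beta\,\alpha_{t,k} + (1-\beta)\,\alpha_{t,k+1}\right) \quad \text{(with fractional pickup)}
$$

In the second form the gradient reaching $h_k$ also carries $\alpha_{t,k+1}$, the weight of the neighbouring feature.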

## Limitation of the MORAN

Because of complicated backgrounds, the MORAN fails when the curve angle of the text is too large.

## Conclusion

The proposed framework involves two stages: rectification and recognition. First, a multi-object rectification network, which is free of geometric constraints and flexible enough to handle complicated deformations, was proposed to transform an image containing irregular text into a more readable one. The proposed MORAN is trained in a weakly supervised way, which requires only images and the corresponding text labels.