Hey there, friends! As you already know, we, the Smart Engines team, specialize in recognizing text (among other things) in various documents. Today we’d like to touch on one more challenge in text recognition on complex backgrounds: space character detection. We’ll use names on bank cards as the main examples in this article, but first let’s take a look at an example with “the ghost” of the letter “Ё”.
As you can see in the image below, there are some distortions to the right of the letter D, but Ё is still fairly distinct. Viewed on its own, this cell will definitely look like a letter to a person (or a neural network).
As you can see from the picture, we work on original images with complex backgrounds. That’s why our space characters won’t be uniform. The backgrounds might consist of patterns, logos, and sometimes even text. For example, when recognizing credit cards, we run into the words VISA and MAESTRO printed on the cards. Such “complex”, unique spaces, rather than plain white rectangles, are exactly what sparks our interest [1].
What’s so complex about it?
Space is a character that doesn’t have any distinct characteristics. When working with complex backgrounds, like in the pictures, it can be hard to recognize a separately cut-out space symbol even for a person.
On the other hand, a space character is inherently different from the others. If the name ASIA gets recognized as ABIA, there is still a chance to fix it during post-processing. But if we get A IA as a result, there is not much we can do.
Recognition methods employed by others
In practice, spaces are often filtered out using statistics calculated for an image. For example, we can compute the average absolute gradient value or the variance of pixel intensities for an image and separate spaces from letters with a threshold. But as we can see from the charts, such methods are not going to work for grayscale images with complex backgrounds. And because the two values are clearly correlated, even using both methods together won’t be efficient enough.
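To make the statistics-based filter concrete, here is a minimal sketch in plain Python. The function names, the toy 4x4 cells, and both threshold values are assumptions made up for illustration; they are not taken from any real system. The last example shows the failure described above: a textured background with no letter on it still produces a high gradient and a high variance.

```python
from statistics import mean, pvariance

def mean_abs_gradient(cell):
    """Average absolute difference between horizontally and vertically adjacent pixels."""
    diffs = []
    for y, row in enumerate(cell):
        for x, value in enumerate(row):
            if x + 1 < len(row):
                diffs.append(abs(row[x + 1] - value))
            if y + 1 < len(cell):
                diffs.append(abs(cell[y + 1][x] - value))
    return mean(diffs)

def intensity_variance(cell):
    """Population variance of all pixel intensities in the cell."""
    return pvariance([value for row in cell for value in row])

def looks_like_space(cell, grad_threshold=10.0, var_threshold=200.0):
    # Both thresholds are hypothetical; a real system would tune them on data.
    return (mean_abs_gradient(cell) < grad_threshold
            and intensity_variance(cell) < var_threshold)

# Toy grayscale cells (0 = black, 255 = white).
flat = [[200] * 4 for _ in range(4)]              # uniform background
stroke = [[200, 20, 20, 200] for _ in range(4)]   # dark vertical stroke, i.e. a "letter"
textured = [[200, 120, 200, 120],
            [120, 200, 120, 200]] * 2             # patterned background, no letter

print(looks_like_space(flat))      # True
print(looks_like_space(stroke))    # False
print(looks_like_space(textured))  # False: the pattern alone looks like a letter
```

On the textured cell both statistics rise together, which is why combining them does not add much: they fail on the same images.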
Everyone’s favorite binarization is not going to work here either. For example, when we have an image like this:
All right, what can we do to improve document recognition?
Since a person needs to see what surrounds a space character in order to detect it, it makes sense to show at least two adjacent symbols to a neural network. We don’t want to enlarge the input of the recognition network, though: overall, it performs decently (and recognizes a good share of spaces). That’s why we are going to create a different, simpler network. The new network will predict whether the image contains two spaces, two letters, a space followed by a letter, or a letter followed by a space. Accordingly, this network will be used together with the recognition network. The picture shows both architectures: the recognition network is on the left, the proposed network is on the right. The recognition network works with an image of a single character, while the new network works with a double-width image containing two adjacent characters.
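The pairing idea can be sketched as follows. This is only an illustration under stated assumptions: `classify_pair` stands in for the small four-class network, the class names are made up, and the rule for turning overlapping pair predictions into a per-cell decision (a cell is a space only if every window covering it says so) is a hypothetical choice, not something the article specifies.

```python
from typing import Callable, List, Sequence, Tuple

# The four outcomes the pair network distinguishes (names are hypothetical).
PAIR_CLASSES = ("letter-letter", "letter-space", "space-letter", "space-space")

def pair_windows(cells: Sequence) -> List[Tuple[int, tuple]]:
    """Build one double-width window per pair of adjacent cells."""
    return [(i, (cells[i], cells[i + 1])) for i in range(len(cells) - 1)]

def space_votes(cells: Sequence, classify_pair: Callable) -> List[List[bool]]:
    """Collect, for every cell, what each window covering it says about it."""
    votes: List[List[bool]] = [[] for _ in cells]
    for i, window in pair_windows(cells):
        left, right = classify_pair(window).split("-")
        votes[i].append(left == "space")
        votes[i + 1].append(right == "space")
    return votes

def detect_spaces(cells: Sequence, classify_pair: Callable) -> List[bool]:
    # Hypothetical decision rule: unanimous agreement of all covering windows.
    return [bool(v) and all(v) for v in space_votes(cells, classify_pair)]

# Stand-in "network": cells are plain characters instead of images.
def fake_classifier(window: tuple) -> str:
    return "-".join("space" if c == " " else "letter" for c in window)

print(detect_spaces(list("AN NA"), fake_classifier))
# [False, False, True, False, False]
```

Cells flagged as spaces would be skipped, and the remaining cells would go to the regular recognition network.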
Let’s put it to the test!
For testing, we used 4320 lines with names, 130149 characters in total, 68246 of which were spaces. There are two methods we can employ here. In the original method, we cut a line into characters and recognize each one separately. In the new method, we also cut a line into characters, but then use the new network to find all the spaces and the regular network to recognize the rest of the characters. We can see from the table that the space recognition quality increases, as does the overall quality, while the letter recognition quality goes down a bit.
Method | Spaces | Letters | Total |
Basic method | 93.6% | 99.8% | 96.5% |
New method | 94.3% | 99.6% | 96.8% |
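As a quick sanity check, the "Total" column should be the average of the space and letter accuracies weighted by the class sizes given above (68246 spaces out of 130149 characters). The helper name below is made up for illustration; the numbers come from the table.

```python
N_TOTAL = 130149
N_SPACES = 68246
N_LETTERS = N_TOTAL - N_SPACES  # 61903

def total_accuracy(space_acc: float, letter_acc: float) -> float:
    """Accuracy over all characters as a class-size-weighted average."""
    return (space_acc * N_SPACES + letter_acc * N_LETTERS) / N_TOTAL

basic = total_accuracy(0.936, 0.998)
new = total_accuracy(0.943, 0.996)
print(f"{basic:.1%} {new:.1%}")  # 96.5% 96.8%
```

Because spaces make up slightly more than half of the test set, a 0.7-point gain on spaces outweighs a 0.2-point loss on letters.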
However, our original network is able to recognize space characters as well (even though the recognition quality is not as good as we want it to be). So let’s review the errors of both methods: we want to see how well the new method handles the characters the original method gets wrong, and vice versa.
The original method:
Metric | Spaces | Letters | Total |
Basic method errors | 4392 | 141 | 4533 |
New method recognition | 44.7% | 29.8% | 44.3% |
The new method:
Metric | Spaces | Letters | Total |
New method errors | 3893 | 241 | 4134 |
Basic method recognition | 37.6% | 58.9% | 38.9% |
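The two error tables can be cross-checked with a little arithmetic. The sketch below assumes that each percentage is the share of one method’s errors that the other method recognizes correctly, and that 4134 is the error count of the new method (it matches the 96.8% total accuracy on 130149 characters). Under that reading, the number of characters that both methods get wrong can be estimated from either table, and the two estimates should agree.

```python
N_TOTAL = 130149

basic_errors, fixed_by_new = 4533, 0.443   # first error table, "Total" column
new_errors, fixed_by_basic = 4134, 0.389   # second error table, "Total" column

# Characters misrecognized by both methods, estimated from each side.
both_wrong_from_basic = basic_errors * (1 - fixed_by_new)
both_wrong_from_new = new_errors * (1 - fixed_by_basic)
print(round(both_wrong_from_basic), round(both_wrong_from_new))  # 2525 2526

# Character-level ceiling for an ideal combination that always trusts the
# right method. This is an upper bound, not a reported result.
oracle_accuracy = 1 - both_wrong_from_basic / N_TOTAL
print(f"{oracle_accuracy:.2%}")
```

The two estimates differ by about one character, which is within the rounding of the published percentages. Roughly 2500 characters are missed by both methods, so a large share of each method’s errors could in principle be recovered by the other.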
The tables above show that the two methods make different mistakes, so the best recognition results come from a balanced combination of the two networks’ estimations. Character-by-character quality is interesting, but line-by-line recognition quality is even more so.
Method | Quality |
Basic method | 96.39% |
With a new network | 96.46% |
Combination of methods | 97.07% |
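The article does not spell out how the estimations are combined, so the following is only one possible reading of “a balanced combination”: a weighted average of the space probability from the recognition network and the one from the pair network. The function names, the weight, and the threshold are all hypothetical.

```python
def combined_space_probability(p_recognizer: float, p_pair: float,
                               weight: float = 0.5) -> float:
    """Weighted average of the two networks' space estimates.

    `weight` is the trust placed in the pair network (hypothetical parameter).
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return weight * p_pair + (1.0 - weight) * p_recognizer

def is_space(p_recognizer: float, p_pair: float,
             weight: float = 0.5, threshold: float = 0.5) -> bool:
    # The threshold is an assumption; in practice it would be tuned on data.
    return combined_space_probability(p_recognizer, p_pair, weight) >= threshold

print(is_space(0.3, 0.9))  # True: the pair network's confidence tips the balance
print(is_space(0.2, 0.6))  # False: neither network is confident enough
```

With `weight=0` this falls back to the basic method and with `weight=1` to the new network alone, so the two single-network rows of the table are the extremes of the same scheme.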
Conclusion
The space character is a huge challenge that needs to be tackled on the way to perfect document recognition. This example shows how important it is to look not just at separate characters, but at their combinations as well. That doesn’t mean we should get overzealous and start training massive networks that process entire lines. Sometimes all we need is just one more small network.
This article used the materials of the report from the European Conference on Modelling and Simulation 2015 (Varna, Bulgaria): Sheshkus, A. & Arlazarov, V.L. (2015). Space symbol detection on the complex background using visual context.