07.09.2020

Space character: there is more to it than meets the eye

Hey there, friends! As you already know, we at Smart Engines specialize in (among other things) text recognition in various documents. Today we’d like to touch on one more challenge in recognizing text on complex backgrounds: space character detection. We’ll mainly use names on bank cards as examples in this article, but first let’s look at an example with the “ghost” of the letter “Ё”. As you can see in the image below, there are some distortions to the right of the letter D, but Ё is still fairly distinct. Looking at this fragment by itself, a person (or a neural network) will definitely see a letter there.

As you can see from the picture, we work with original images that have complex backgrounds. That’s why our space characters won’t be uniform. The backgrounds might consist of patterns, logos, and sometimes even text. For example, when recognizing credit cards, we come across the words VISA and MAESTRO on the cards. Such “complex” unique spaces, rather than plain white squares, are exactly what sparks our interest [1].

What’s so complex about it?

A space is a character with no distinctive features of its own. On complex backgrounds like the ones in the pictures, even a person can find it hard to recognize a space symbol that has been cut out on its own.

On the other hand, a space character is inherently different from the others. If the name ASIA gets recognized as ABIA, there is still a chance to fix it during post-processing. But if we get A IA as a result, there is not much we can do.

Recognition methods employed by others

In practice, space characters are often filtered out using statistics calculated for an image. For example, we can compute the average absolute gradient value or the variance of pixel intensities for an image and separate spaces from letters with a threshold. But as we can see from the charts, such methods are not going to work for grayscale images with complex backgrounds. Because the two values are clearly correlated, even using both statistics together won’t be efficient enough.
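
To make the statistics-based filtering concrete, here is a minimal sketch in pure Python. The function names, the two threshold values, and the tiny synthetic cells are all assumptions made for illustration; they are not taken from our pipeline. The last cell imitates a space printed over a patterned background and shows why a threshold alone fails there.

```python
from statistics import pvariance


def mean_abs_gradient(cell):
    """Mean absolute difference between horizontally and vertically adjacent pixels."""
    height, width = len(cell), len(cell[0])
    diffs = []
    for y in range(height):
        for x in range(width):
            if x + 1 < width:
                diffs.append(abs(cell[y][x + 1] - cell[y][x]))
            if y + 1 < height:
                diffs.append(abs(cell[y + 1][x] - cell[y][x]))
    return sum(diffs) / len(diffs) if diffs else 0.0


def intensity_variance(cell):
    """Population variance of all pixel intensities in the cell."""
    return pvariance([pixel for row in cell for pixel in row])


def looks_like_space(cell, grad_threshold=10.0, var_threshold=100.0):
    """Threshold rule: low gradient and low variance means 'space'.

    Both thresholds are illustrative assumptions, not tuned values.
    """
    return (mean_abs_gradient(cell) < grad_threshold
            and intensity_variance(cell) < var_threshold)


# Synthetic 4x4 grayscale cells (hypothetical data).
flat_cell = [[200] * 4 for _ in range(4)]             # space on a plain background
stroke_cell = [[200, 20, 20, 200] for _ in range(4)]  # vertical stroke of a letter
textured_cell = [[200, 20, 200, 20],                  # space on a patterned background
                 [20, 200, 20, 200],
                 [200, 20, 200, 20],
                 [20, 200, 20, 200]]

for name, cell in [("flat", flat_cell), ("stroke", stroke_cell), ("textured", textured_cell)]:
    print(name, looks_like_space(cell))
```

The flat cell passes as a space and the stroke is rejected, but the textured cell is rejected too, even though it contains no letter. That is the failure mode described above.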

Everyone’s favorite binarization is not going to work here either. For example, when we have an image like this:

All right, what can we do to improve document recognition?

Since a person needs to see what surrounds a space character in order to detect it, it makes sense to show a neural network at least two adjacent symbols. We don’t want to increase the input of the recognition network, which performs decently overall (and recognizes a good share of spaces). That’s why we are going to create a different, simpler network. The new network will predict whether the image contains two spaces, two letters, a space followed by a letter, or a letter followed by a space. This network will be used together with the recognition network. The picture shows the architectures used: the recognition network is on the left, and the proposed network is on the right. The recognition network works with an image of a single character, while the new network works with a double-width image of two adjacent characters.
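
Here is a rough sketch of how the outputs of such a pair network could be turned into a per-character space mask and combined with the recognition network. Everything in it is an assumption made for illustration: the names `PairClass`, `space_votes`, `decode_spaces` and `recognize_line` are hypothetical, the two networks are replaced by toy callables, and the rule "a cell is a space only if every window covering it agrees" is just one possible way to decode the predictions.

```python
from enum import Enum


class PairClass(Enum):
    """The four outcomes the pair network distinguishes."""
    LETTER_LETTER = "LL"
    LETTER_SPACE = "LS"
    SPACE_LETTER = "SL"
    SPACE_SPACE = "SS"


def pair_windows(cells):
    """Yield (index, left_cell, right_cell) for every two adjacent character cells."""
    for i in range(len(cells) - 1):
        yield i, cells[i], cells[i + 1]


def space_votes(cells, classify_pair):
    """For each cell, collect one vote (True = space) per window that covers it."""
    votes = [[] for _ in cells]
    for i, left, right in pair_windows(cells):
        label = classify_pair(left, right)
        votes[i].append(label in (PairClass.SPACE_LETTER, PairClass.SPACE_SPACE))
        votes[i + 1].append(label in (PairClass.LETTER_SPACE, PairClass.SPACE_SPACE))
    return votes


def decode_spaces(votes):
    """Assumed decoding rule: a cell is a space only if all covering windows agree."""
    return [bool(v) and all(v) for v in votes]


def recognize_line(cells, classify_pair, recognize_char):
    """Find spaces with the pair classifier, send the other cells to the recognizer."""
    mask = decode_spaces(space_votes(cells, classify_pair))
    return "".join(" " if is_space else recognize_char(cell)
                   for cell, is_space in zip(cells, mask))


# Toy stand-ins for the two networks: here a "cell" is just a lowercase character.
def toy_classify_pair(left, right):
    key = ("S" if left == " " else "L") + ("S" if right == " " else "L")
    return PairClass(key)


def toy_recognize_char(cell):
    return cell.upper()


print(recognize_line(list("anna lee"), toy_classify_pair, toy_recognize_char))
```

Each inner cell is covered by two windows, so the pair network gets two chances to look at it with different neighbors.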

Let’s put it to the test!

For testing, we used 4320 lines with names, consisting of 130149 characters in total, 68246 of which were spaces. There are two methods to compare: the original method, where we cut a line into characters and recognize each one separately, and the new method, where we also cut a line into characters, then use the new network to find all the spaces and the regular network to recognize the rest of the characters. We can see from the table that the space recognition quality increases, as does the overall quality, while the letter recognition quality goes down a bit.

Method         Spaces   Letters   Total
Basic method   93.6%    99.8%     96.5%
New method     94.3%    99.6%     96.8%
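
The Total column can be cross-checked against the other two: it should be the average of the per-class qualities, weighted by the number of spaces and letters in the test set. A small sketch of that arithmetic (the variable and function names are ours, and the per-class percentages are the rounded values from the table, so only approximate agreement is expected):

```python
TOTAL_CHARS = 130149
SPACES = 68246
LETTERS = TOTAL_CHARS - SPACES  # 61903 non-space characters


def total_accuracy(space_acc, letter_acc):
    """Accuracy over all characters as a count-weighted average of the two classes."""
    return (space_acc * SPACES + letter_acc * LETTERS) / TOTAL_CHARS


basic_total = total_accuracy(0.936, 0.998)
new_total = total_accuracy(0.943, 0.996)

print(f"Basic method: {basic_total:.1%}")
print(f"New method:   {new_total:.1%}")
```

Because spaces make up more than half of the test characters, a gain of 0.7 points on spaces outweighs a loss of 0.2 points on letters.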

 

However, our original network can recognize space characters as well (even though not as well as we’d like). So let’s see how the two methods complement each other by reviewing their errors: for the characters that one method gets wrong, how often does the other method get them right?

The original method:

Measure                  Spaces   Letters   Total
Basic method errors      4392     141       4533
New method recognition   44.7%    29.8%     44.3%

The new method:

Measure                    Spaces   Letters   Total
New method errors          3893     241       4134
Basic method recognition   37.6%    58.9%     38.9%
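
The two error tables can be checked against each other. We read the second table as the new method’s errors (3893, 241, 4134) and the share of them that the basic method gets right; this reading is an assumption, but it matches the per-class qualities reported earlier. Under it, the number of characters that both methods get wrong can be computed from either table, and the two results should agree. The sketch below does that and also derives a purely hypothetical upper bound: the character-level quality of an ideal combiner that always picks the correct answer when at least one method has it. This bound is our own arithmetic, not a measured result.

```python
TOTAL_CHARS = 130149

basic_errors = 4533       # total errors of the basic method
new_errors = 4134         # total errors of the new method (assumed reading of the table)
new_fixes_basic = 0.443   # share of basic-method errors the new method gets right
basic_fixes_new = 0.389   # share of new-method errors the basic method gets right

# Characters that both methods get wrong, estimated from each table.
both_wrong_from_basic = basic_errors * (1 - new_fixes_basic)
both_wrong_from_new = new_errors * (1 - basic_fixes_new)

# Hypothetical ideal combiner: wrong only where both methods are wrong.
oracle_accuracy = 1 - both_wrong_from_basic / TOTAL_CHARS

print(f"Both wrong (from basic errors): {both_wrong_from_basic:.0f}")
print(f"Both wrong (from new errors):   {both_wrong_from_new:.0f}")
print(f"Ideal combiner upper bound:     {oracle_accuracy:.2%}")
```

The two estimates differ only by rounding in the published percentages, which supports the reading of the tables used here. It also shows how much room a good combination of the two networks has.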

The last three tables demonstrate that, to improve recognition results, it’s best to combine the estimates of both networks in a balanced way. Character-by-character recognition quality is quite interesting, but things get even more interesting with line-by-line recognition.

Method                   Quality
Basic method             96.39%
With a new network       96.46%
Combination of methods   97.07%
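
This article does not spell out how the two networks’ estimates are combined, so the following is only one simple possibility, sketched under our own assumptions: each network yields a probability that a cell is a space, and the two are blended with a weight before thresholding. The function names, the default weight and the threshold are all hypothetical.

```python
def combined_space_probability(p_recognizer, p_pair, weight=0.5):
    """Blend two space-probability estimates; `weight` is the share given to the pair network."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    return (1.0 - weight) * p_recognizer + weight * p_pair


def is_space(p_recognizer, p_pair, weight=0.5, threshold=0.5):
    """Call the cell a space when the blended estimate reaches the threshold."""
    return combined_space_probability(p_recognizer, p_pair, weight) >= threshold


# The recognizer is unsure (0.3) while the pair network is confident (0.9).
print(is_space(0.3, 0.9))              # equal weights
print(is_space(0.3, 0.9, weight=0.2))  # pair network trusted less
```

With equal weights the confident pair network wins and the cell is treated as a space; with a weight of 0.2 the recognizer’s doubt prevails. In practice the weight would have to be tuned on held-out data.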

 

Conclusion

The space character is a huge challenge that needs to be tackled on the way to perfect document recognition. This example shows how important it is to look not just at separate characters, but at their combinations as well. That doesn’t mean we should get overzealous and start training massive networks that process entire lines. Sometimes all we need is just one more small network.

This article is based on a report presented at the European Conference on Modelling and Simulation 2015 (Varna, Bulgaria): [1] Sheshkus, A. & Arlazarov, V.L. (2015). Space symbol detection on the complex background using visual context.

