September 7, 2020 | Sci-Dev

Space character: there is more to it than meets the eye

Hey there, friends! As you already know, we, the Smart Engines team, specialize (though not exclusively) in recognizing text in various documents. Today we’d like to touch on another tricky part of text recognition on complex backgrounds: space character detection. We’ll use names on bank cards as the main examples in this article, but first let’s look at an example with “the ghost” of the letter “Ё”. As you can see in the image below, there are some distortions to the right of the letter D, but the Ё is still fairly distinct. If we look at this cell by itself, a person (or a neural network) will definitely see that there is a letter present.

As you can see from the picture, we work with original images that have complex backgrounds, so our space characters won’t be uniform. The backgrounds might contain patterns, logos, and sometimes even text. For example, when recognizing credit cards, we come across the words VISA and MAESTRO on the cards. Such “complex”, unique spaces, rather than plain white squares, are exactly what sparks our interest [1].

What’s so complex about it?

A space is a character without any distinctive features of its own. On complex backgrounds, like the ones in the pictures, even a person can find it hard to recognize a space that has been cut out on its own.

On the other hand, a space character is inherently different from the others. If the name ASIA gets recognized as ABIA, there is still a chance to fix it during post-processing. But if we get A IA as a result, there is not much we can do.

Recognition methods employed by others

In practice, spaces are often filtered out using statistics calculated over the image. For example, we can compute the mean absolute gradient of an image, or the variance of its pixel intensities, and split images into spaces and letters with a threshold. But as we can see from the charts, such methods are not going to work for grayscale images with complex backgrounds. Because the two values are clearly correlated, even using both of them together won’t be effective enough.
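To make these statistics concrete, here is a minimal Python sketch of such a threshold filter. The function names and threshold values are made up for illustration (they are not taken from our system), and the image is assumed to be a grayscale cell given as a list of pixel rows.

```python
from statistics import pvariance

def mean_abs_gradient(img):
    """Mean absolute intensity difference between horizontal and vertical neighbors."""
    h, w = len(img), len(img[0])
    total, count = 0, 0
    for y in range(h):
        for x in range(w):
            if x + 1 < w:
                total += abs(img[y][x + 1] - img[y][x])
                count += 1
            if y + 1 < h:
                total += abs(img[y + 1][x] - img[y][x])
                count += 1
    return total / count if count else 0.0

def intensity_variance(img):
    """Population variance of all pixel intensities in the cell."""
    return pvariance([p for row in img for p in row])

def looks_like_space(img, grad_threshold=10.0, var_threshold=200.0):
    """Call the cell a space if both statistics are low (thresholds are arbitrary)."""
    return (mean_abs_gradient(img) < grad_threshold
            and intensity_variance(img) < var_threshold)

# A flat cell versus a cell that holds only a striped background pattern
flat = [[128] * 4 for _ in range(4)]
patterned = [[0, 255, 0, 255] for _ in range(4)]

print(looks_like_space(flat))       # True
print(looks_like_space(patterned))  # False
```

The flat cell passes as a space, while the cell with a background pattern and no letter at all is rejected. That is the kind of mistake these statistics make on complex backgrounds.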

 

Everyone’s favorite binarization is not going to work here either. For example, when we have an image like this:

All right, what can we do to improve document recognition?

Since a person needs to see what surrounds a space character in order to detect it, it makes sense to show at least two adjacent symbols to a neural network. We don’t want to enlarge the input of the recognition network, because overall it performs decently (and recognizes a good share of the spaces). That’s why we are going to create a different, simpler network. The new network will predict whether the image contains two spaces, two letters, a space and a letter, or a letter and a space, and it will be used together with the recognition network. The picture shows the architectures used: the recognition network is on the left, the proposed network is on the right. The recognition network works with an image of a single character, while the new network works with a double-width image of two adjacent characters.
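Here is a rough Python sketch of how such a pair network could be applied to a line that has already been cut into character cells. Everything in it is an assumption made for illustration: `classify_pair` is a hypothetical stand-in for the trained network, the class numbering is arbitrary, and the rule for merging the two votes that each inner cell receives is just one possible choice.

```python
# The four outcomes the pair network predicts: (left is space, right is space)
PAIR_CLASSES = {
    0: (True, True),    # space, space
    1: (False, False),  # letter, letter
    2: (True, False),   # space, letter
    3: (False, True),   # letter, space
}

def pair_windows(cells):
    """Glue every two adjacent cells side by side into a double-width image."""
    return [
        [left_row + right_row for left_row, right_row in zip(left, right)]
        for left, right in zip(cells, cells[1:])
    ]

def detect_spaces(cells, classify_pair):
    """Return one boolean per cell (True means space).

    classify_pair is a hypothetical callable that takes a double-width
    image and returns a key of PAIR_CLASSES.
    """
    votes = [[] for _ in cells]
    for i, window in enumerate(pair_windows(cells)):
        left_is_space, right_is_space = PAIR_CLASSES[classify_pair(window)]
        votes[i].append(left_is_space)
        votes[i + 1].append(right_is_space)
    # Assumed merge rule: a cell is a space only if every window covering it agrees
    return [bool(v) and all(v) for v in votes]
```

Cells marked as spaces would then be skipped, and the regular recognition network would handle the rest.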

Let’s put it to the test!

For testing, we used 4320 lines with names, consisting of 130149 characters in total, 68246 of which were spaces. There are two methods we can employ here: the original method, where we cut a line into characters and recognize each one separately, and the new method, where we also cut a line into characters, then use the new network to find all the spaces and the regular network to recognize the rest of the characters. We can see from the table that the space recognition quality increases, as does the overall quality, while the letter recognition quality goes down a bit.

               Spaces   Letters   Total
Basic method   93.6%    99.8%     96.5%
New method     94.3%    99.6%     96.8%
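The Total column is simply the average of the two accuracies weighted by the number of spaces and letters in the test set. A quick Python check (the variable and function names are ours; the numbers are the ones quoted above):

```python
TOTAL_CHARS = 130149
SPACES = 68246
LETTERS = TOTAL_CHARS - SPACES  # 61903

def overall_accuracy(space_acc, letter_acc):
    """Average of the two accuracies weighted by the class sizes."""
    return (space_acc * SPACES + letter_acc * LETTERS) / TOTAL_CHARS

basic = overall_accuracy(0.936, 0.998)
new = overall_accuracy(0.943, 0.996)

print(f"{basic:.1%}")  # 96.5%
print(f"{new:.1%}")    # 96.8%
```

Since spaces make up slightly more than half of the test characters, the gain on spaces outweighs the small loss on letters.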

 

However, our original network is able to recognize space characters as well (even though not as reliably as we want it to), so it is worth checking how the two approaches complement each other. Let’s review the errors of both methods: we want to see how well the new method handles the characters that the original method gets wrong, and vice versa.

 

The original method:

                         Spaces   Letters   Total
Basic method errors      4392     141       4533
New method recognition   44.7%    29.8%     44.3%

The new method:

                           Spaces   Letters   Total
New method errors          3893     241       4134
Basic method recognition   37.6%    58.9%     38.9%
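A rough back-of-the-envelope check in Python shows how much room these numbers leave for a combination. It assumes that the first table counts the errors of the basic method and the second one counts the errors of the new method (the error counts match each method’s accuracy), and that the percentages are the shares of those errors the other method gets right. The variable names are ours.

```python
TOTAL_CHARS = 130149

basic_errors = 4533      # 4392 spaces + 141 letters
new_errors = 4134        # 3893 spaces + 241 letters
new_fixes_basic = 0.443  # share of basic-method errors the new method gets right
basic_fixes_new = 0.389  # share of new-method errors the basic method gets right

# Characters that both methods get wrong, estimated from each table separately
both_wrong_a = basic_errors * (1 - new_fixes_basic)
both_wrong_b = new_errors * (1 - basic_fixes_new)

# Accuracy of an ideal combination that always trusts the method that is right
oracle_accuracy = 1 - both_wrong_a / TOTAL_CHARS

print(round(both_wrong_a), round(both_wrong_b))  # 2525 2526
print(f"{oracle_accuracy:.1%}")                  # 98.1%
```

The two estimates of the shared errors agree up to rounding, and the ideal figure is only an upper bound: a real combination has no way of knowing in advance which network is right.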

The last three tables demonstrate that, to improve the recognition results, it’s best to use a balanced combination of the two networks’ estimates. Character-by-character recognition quality is interesting in itself, but it gets even more interesting with line-by-line recognition.

                         Quality
Basic method             96.39%
With a new network       96.46%
Combination of methods   97.07%
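One simple way to picture a “balanced combination” is a weighted average of the space scores produced by the two networks. The Python sketch below is only an illustration under that assumption: the function name, the weight, and the threshold are hypothetical, and the article does not specify the actual combination rule.

```python
def combine_space_scores(recognizer_scores, pair_scores, weight=0.5, threshold=0.5):
    """Blend per-cell space scores from two networks and threshold the result.

    recognizer_scores: space probabilities from the single-character network
    pair_scores: space probabilities derived from the pair network
    weight: share given to the single-character network (assumed value)
    """
    if len(recognizer_scores) != len(pair_scores):
        raise ValueError("both score lists must cover the same cells")
    return [
        weight * r + (1 - weight) * p >= threshold
        for r, p in zip(recognizer_scores, pair_scores)
    ]

# The third cell is a space only according to the pair network
print(combine_space_scores([0.9, 0.2, 0.4], [0.8, 0.1, 0.9]))  # [True, False, True]
```

In practice the weight and the threshold would have to be tuned on a validation set.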

 

Conclusion

The space character is a huge challenge that needs to be tackled on the way to perfect document recognition. This example shows how important it is to look not just at separate characters, but at their combinations as well. Still, let’s not get overzealous and start training massive networks that process entire lines. Sometimes all we need is just one more small network.

This article is based on a report presented at the European Conference on Modelling and Simulation 2015 (Varna, Bulgaria):

[1] Sheshkus, A. & Arlazarov, V.L. (2015). Space symbol detection on the complex background using visual context.
