Hey there, friends! As you may already know, we, the Smart Engines team, specialize in text recognition (though not exclusively) in various documents. Today we’d like to touch on one more challenge in text recognition on complex backgrounds: space character detection. We’ll use names on bank cards as the main examples in this article, but first let’s look at an example with “the ghost” of the letter “Ё”. As you can see in the image below, there are some distortions to the right of the letter D, but the Ё is still fairly distinct. If we looked at this cell by itself, a person (or a neural network) would definitely see that a letter is present.
As you can see from the picture, we work on original images with complex backgrounds, so our space characters won’t be uniform. The backgrounds might contain patterns, logos, and sometimes even text: during credit card recognition, for example, we encounter the words VISA and MAESTRO on the cards. Such “complex” unique spaces, rather than plain white square space characters, are exactly what sparks our interest.
A space is a character without any distinct features of its own. On complex backgrounds like the ones in the pictures, even a person can struggle to recognize a separately cut-out space symbol.
On the other hand, a space character is inherently different from the others. If the name ASIA gets recognized as ABIA, there is still a chance to fix it during post-processing. But if we get A IA as a result, there is not much we can do.
In practice, space characters are often filtered using statistics computed from the image. For example, we can compute the mean absolute gradient of an image, or the variance of its pixel intensities, and separate spaces from letters with a threshold. But as the charts show, such methods won’t work for grayscale images with complex backgrounds. And since the two values are clearly correlated, even using them together won’t be efficient enough.
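To make the idea concrete, here is a minimal sketch of such a statistical filter in Python with NumPy. The function names and thresholds are purely illustrative assumptions, not values from the article; the point is precisely the approach that breaks down on complex backgrounds:

```python
import numpy as np

def cell_statistics(cell):
    """Mean absolute gradient and intensity variance of a grayscale
    character cell given as a 2-D uint8 array."""
    cell = cell.astype(np.float32)
    gx = np.abs(np.diff(cell, axis=1)).mean()  # horizontal gradient
    gy = np.abs(np.diff(cell, axis=0)).mean()  # vertical gradient
    return (gx + gy) / 2.0, cell.var()

def looks_like_space(cell, grad_thr=5.0, var_thr=50.0):
    """Naive filter: low gradient AND low variance => space.
    Thresholds are illustrative; on complex backgrounds no single
    threshold separates the two classes well."""
    grad, var = cell_statistics(cell)
    return grad < grad_thr and var < var_thr
```

On a clean white background this works; on a card with a pattern behind the text, both statistics for a space cell land in the same range as for a letter cell.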
Everyone’s favorite binarization is not going to work here either. For example, when we have an image like this:
Since a person needs to see what surrounds a space character in order to detect it, it makes sense to show a neural network at least two adjacent symbols. We don’t want to enlarge the input of the recognition network itself: overall, it performs decently and recognizes a fair share of spaces. Instead, we’ll create a different, simpler network. The new network predicts whether an image contains two spaces, two letters, a space followed by a letter, or a letter followed by a space. This network will then be used together with the recognition network. The picture shows the architectures used: the recognition network is on the left, the proposed network is on the right. The recognition network takes the image of a single character, while the new network takes an image of double width containing two adjacent characters.
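As a toy illustration of the interface (not the actual architecture from the article, which is a convolutional network), the pair classifier can be thought of as a four-way classifier over a double-width cell. Everything below — the input size, the class names, the single linear layer — is an assumption made for the sketch:

```python
import numpy as np

# The four pair classes the new network distinguishes.
CLASSES = ("space+space", "space+letter", "letter+space", "letter+letter")

class PairNet:
    """Toy stand-in for the pair network: one linear layer over a
    flattened double-width grayscale cell (here 20x40 pixels).
    Weights are random; a real model would be trained."""
    def __init__(self, h=20, w=40, seed=0):
        rng = np.random.default_rng(seed)
        self.weights = rng.normal(0.0, 0.01, size=(h * w, len(CLASSES)))
        self.bias = np.zeros(len(CLASSES))

    def predict(self, pair_cell):
        x = pair_cell.astype(np.float32).ravel() / 255.0
        logits = x @ self.weights + self.bias
        return CLASSES[int(np.argmax(logits))]
```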
For testing we used 4,320 lines with names, consisting of 130,149 characters, 68,246 of which were spaces. There are two methods we can compare: the original one, where we cut a line into characters and recognize each one separately, and the new one, where we cut a line into characters as well, use the new network to find all the spaces, and then apply the regular network to the remaining characters. The table shows that space recognition quality increases, as does the overall quality, while letter recognition quality drops slightly.
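The control flow of the new method can be sketched as follows. The pair classifier is passed in as a callable, and the mapping from the four pair labels to per-cell flags is our assumption about how the predictions are merged:

```python
def space_flags(cells, classify_pair):
    """Mark each character cell of a line as space/non-space using a
    pair classifier. classify_pair(a, b) returns one of the labels
    "space+space", "space+letter", "letter+space", "letter+letter".
    Each inner cell appears in two pairs; either vote can mark it."""
    votes = {
        "space+space": (True, True),
        "space+letter": (True, False),
        "letter+space": (False, True),
        "letter+letter": (False, False),
    }
    flags = [False] * len(cells)
    for i in range(len(cells) - 1):
        left, right = votes[classify_pair(cells[i], cells[i + 1])]
        flags[i] = flags[i] or left
        flags[i + 1] = flags[i + 1] or right
    return flags
```

Cells flagged as spaces become spaces in the output; the remaining cells go to the regular recognition network.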
However, our original network is able to recognize space characters too (even if not as well as we’d like), so it’s worth checking how successful it is. Let’s review the errors of both methods: how the new method performs on the original method’s errors, and vice versa.
The original method:
| | Spaces | Letters | Total |
| --- | --- | --- | --- |
| Errors of the original method | 4392 | 141 | 4533 |
| Recognized by the new method | 44.7% | 29.8% | 44.3% |
The new method:
| | Spaces | Letters | Total |
| --- | --- | --- | --- |
| Errors of the new method | 3893 | 241 | 4134 |
| Recognized by the original method | 37.6% | 58.9% | |
The last three tables demonstrate that the best recognition results come from a balanced combination of the two networks’ estimations. Character-by-character recognition quality is interesting in itself, but line-by-line recognition is even more telling.
| Method | Lines recognized correctly |
| --- | --- |
| With the new network | 96.46% |
| Combination of methods | 97.07% |
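One simple way to read “a balanced combination of network estimations” is a weighted blend of the space probabilities produced by the two networks. The weight and threshold below are illustrative assumptions; the article doesn’t specify them:

```python
def combined_space_score(p_pair, p_char, alpha=0.7):
    """Blend the pair network's space probability with the regular
    recognition network's space probability. alpha is an assumed
    weight, not a value from the article."""
    return alpha * p_pair + (1.0 - alpha) * p_char

def is_space(p_pair, p_char, threshold=0.5):
    """Final space/non-space decision for one character cell."""
    return combined_space_score(p_pair, p_char) >= threshold
```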
The space character is a serious challenge that needs to be tackled on the way to perfect document recognition. This example shows the importance of considering not just separate characters, but their combinations as well. And there’s no need to get overzealous and train massive networks that process entire lines: sometimes all we need is one more small network.
This article uses materials from a report presented at the European Conference on Modelling and Simulation 2015 (Varna, Bulgaria): Sheshkus, A. & Arlazarov, V. L. (2015). Space symbol detection on the complex background using visual context.