Today we add one more article to our collection on image processing and recognition methods and how they apply in practice. The subject is price tag recognition (the ordinary price tags we see in stores all the time). To keep the problem realistic, we add a critical requirement to the problem statement: the images come from a compact digital camera, and the computing device is severely limited in resources. In other words, we’ll describe how to recognize price tags on a computationally weak mobile device (and we are talking not so much about cheap Chinese smartphones as about specialized industrial data collection terminals, which, for a variety of reasons, have pretty weak “brains” as well).
Retail automation reduces the impact of the human factor, streamlines staff operations by providing more effective tools for the job, and lowers the purchasing and support costs of various mobile data terminals.
For example, some retail chains use price tag recognition to automate routine inventory tasks, with machines collecting the product data. In addition, store staff can carry out product checks using their cell phones. Price tag recognition also helps demanding consumers monitor prices, and it can be used to track competitors’ prices and to optimize pricing.
Besides recognition quality, it also matters how these systems perform in a very low power mode. Energy- and resource-intensive algorithms quickly drain the battery of the device (a robot or a device operated by the staff), which has a negative impact on the work process.
We’ll talk about computationally efficient search, localization and recognition of the price on price tags. As promised, we’ll look at the aspect of the problem that matters in practice: price tag recognition on the Odroid XU4 data terminal (a single-board computer available to researchers that is slightly inferior to the Honeywell terminal in computational characteristics), with significant limitations on image quality. Today we focus strictly on recognizing the retail price and ignore the rest of the price tag elements.
The specifics of the price tag recognition problem
Let’s review the specifics of our problem:
Let’s talk in more detail about item 4 and discuss the special features of the generated price tag image.
When price tag images are captured with the cameras of mobile platforms, optical recognition with compact digital cameras becomes problematic due to:
In addition to the issues mentioned, the picture can be framed improperly, so that the price tag is incomplete in the image or missing altogether.
Another challenge is the occurrence of projective distortions, but we won’t pay much attention to them in this article: with robotic systems and smartphones, the shooting parameters are strictly predetermined and are adjusted once before the work process starts. Examples of actual images are shown in the pictures below.
The algorithm for price tag recognition
We’ll distinguish target elements (those that indicate the price) from noise elements (signs, codes, barcodes, artifacts introduced during image processing, etc.). Together they make up the price tag.
The flow diagram of the suggested algorithm is shown in the figure below. It covers the major stages of recognizing the price in a price tag image, which are common to all automatic input systems: search and localization of the target area in the image, preparation of the target zone for recognition, recognition of the target zone, and post-processing of the recognition results.
Recognition is performed by a simple network for segmentation and recognition, without any special training or tuning for price tag lettering. Post-processing consists of normalizing the recognized value to one of the price formats possible on a price tag.
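The post-processing step can be illustrated with a minimal sketch. It assumes a single hypothetical price format (an integer part, a point, and exactly two digits after it), and the look-alike substitution table is our own illustration rather than the rules used in the actual system.

```python
import re

# Hypothetical price format: integer part, a point, exactly two fractional digits.
PRICE_PATTERN = re.compile(r"^\d+\.\d{2}$")

# Hypothetical substitutions for characters a recognizer may confuse with digits.
LOOKALIKES = str.maketrans(
    {"O": "0", "o": "0", "l": "1", "I": "1", "S": "5", "B": "8", ",": "."}
)


def normalize_price(raw):
    """Return the price as a string like '129.90', or None if it does not fit."""
    text = raw.strip().replace(" ", "").translate(LOOKALIKES)
    if PRICE_PATTERN.match(text):
        return text
    # If the separator was lost, assume the last two digits are the fraction.
    if text.isdigit() and len(text) >= 3:
        return text[:-2] + "." + text[-2:]
    return None
```

A value that cannot be mapped to the format is rejected (None) rather than guessed, which keeps false prices out of the output.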
Since price tags come in various color schemes, the captured full-color image is first converted to grayscale by averaging the pixel values over the channels. To make the algorithm and the related figures easier to follow, we’ll be working with an inverted image, which means the target symbols are expected to be white.
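As a rough sketch of this step, the conversion can be written as follows. We assume here that the image is a plain list of rows of (r, g, b) tuples in the 0..255 range, and that the working image is the inverted grayscale one, so that dark symbols on a light tag become white.

```python
def to_inverted_gray(rgb_image):
    """Average the three channels of every pixel, then invert the result.

    rgb_image is assumed to be a list of rows, each row a list of
    (r, g, b) tuples with values in 0..255.
    """
    gray = []
    for row in rgb_image:
        # Integer channel average, inverted so dark text becomes bright.
        gray.append([255 - (r + g + b) // 3 for (r, g, b) in row])
    return gray
```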
Uneven lighting of extended objects in the scene results in inhomogeneous brightness of their images. To compensate for this inhomogeneity, we decided to use the Niblack algorithm for local binarization.
The goal of binarization is to binarize most of the characters in the scene correctly; it is not meant to produce perfectly accurate character shapes, because more precise boundaries and positions of the characters are determined at later stages of the algorithm.
The weak spot of the Niblack algorithm is the choice of the window size. The classic recommendation is that it should be comparable to the linear dimensions of the characters in the text being binarized. Since the price tag is expected to take up most of the image, we can estimate the size of the characters that make up the price.
Since the price is often located close to the edge of the price tag, the vertical size of the window is chosen to be slightly larger than the approximate character height. In the horizontal direction there are no other objects of similar size near the price, so the horizontal window size is set to approximately the width of several symbols.
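To make the role of the window concrete, here is a minimal sketch of Niblack binarization with a rectangular window. The threshold at each pixel is the local mean plus k times the local standard deviation. The parameter names (half_h, half_w), the value of k and its positive sign (chosen because the symbols are assumed to be bright) are our assumptions, not the settings of the actual system. Integral images keep the cost per pixel independent of the window size.

```python
def integral(img):
    """Summed-area table with one extra row and column of zeros."""
    h, w = len(img), len(img[0])
    table = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            table[y + 1][x + 1] = table[y][x + 1] + row_sum
    return table


def box_sum(table, y0, x0, y1, x1):
    """Sum over rows y0..y1-1 and columns x0..x1-1."""
    return table[y1][x1] - table[y0][x1] - table[y1][x0] + table[y0][x0]


def niblack(gray, half_h, half_w, k=0.2):
    """Mark a pixel as foreground (1) if it exceeds mean + k * std of its window."""
    h, w = len(gray), len(gray[0])
    sums = integral(gray)
    squares = integral([[v * v for v in row] for row in gray])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        # The window is clipped at the image border.
        y0, y1 = max(0, y - half_h), min(h, y + half_h + 1)
        for x in range(w):
            x0, x1 = max(0, x - half_w), min(w, x + half_w + 1)
            n = (y1 - y0) * (x1 - x0)
            mean = box_sum(sums, y0, x0, y1, x1) / n
            var = max(0.0, box_sum(squares, y0, x0, y1, x1) / n - mean * mean)
            threshold = mean + k * var ** 0.5
            out[y][x] = 1 if gray[y][x] > threshold else 0
    return out
```

With half_h a little over half the character height and half_w covering several character widths, the window has the shape described above.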
After binarization, a morphological opening with a square primitive is performed. Its purpose is to suppress possible noise elements in the binarized image while keeping the target elements intact, and the primitive size is chosen with that in mind. It has been written about many times, but we’ll mention it once more: the van Herk/Gil-Werman algorithm makes morphological filtering very efficient computationally.
However, if the original image was very blurry and noisy, this operation might make things even worse (see the figure below): the necessary elements become thinner and develop discontinuities, or existing discontinuities grow longer.
That’s why the algorithm branches here, and two images enter the following stages: the image processed by the morphological operation and the original binarized image.
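For readers who have not seen it, the idea of van Herk/Gil-Werman can be sketched in a few lines. The running minimum or maximum over a window of width k is obtained from per-block prefix and suffix extremes, so the number of comparisons per pixel does not depend on k. The sketch below assumes an odd k, a binary image stored as lists of 0/1, and border padding that does not affect the result (1 for erosion, 0 for dilation); the square primitive is applied separably, rows first and then columns.

```python
def running_extreme(values, k, op, pad):
    """van Herk/Gil-Werman: op (min or max) over each centered window of odd width k."""
    r = k // 2
    padded = [pad] * r + list(values) + [pad] * r
    padded += [pad] * (-len(padded) % k)  # extend to a whole number of blocks
    n = len(padded)
    prefix = padded[:]  # op over block start .. i
    suffix = padded[:]  # op over i .. block end
    for i in range(n):
        if i % k:
            prefix[i] = op(prefix[i - 1], padded[i])
    for i in range(n - 2, -1, -1):
        if (i + 1) % k:
            suffix[i] = op(suffix[i + 1], padded[i])
    # The window of output j spans padded[j .. j + k - 1], i.e. at most two blocks.
    return [op(suffix[j], prefix[j + k - 1]) for j in range(len(values))]


def transpose(img):
    return [list(col) for col in zip(*img)]


def square_filter(img, k, op, pad):
    """Apply the 1D filter along rows, then along columns (k x k square primitive)."""
    rows_done = [running_extreme(row, k, op, pad) for row in img]
    cols_done = [running_extreme(col, k, op, pad) for col in transpose(rows_done)]
    return transpose(cols_done)


def opening(binary, k):
    """Morphological opening: erosion followed by dilation."""
    eroded = square_filter(binary, k, min, 1)
    return square_filter(eroded, k, max, 0)
```

Opening removes every blob that cannot contain the k x k square, which is why the primitive has to be smaller than the stroke width of the price digits.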
The next step after the branching is to find the 8-connected components, which are then analyzed in two stages.
The first stage is a rough filtering of the components based on the approximate sizes of the target symbols and the approximate ratios of their dimensions. At this stage it’s critical to keep the restrictions mild so that target components are not filtered out. As a result, only “potential candidates” for the price elements remain after filtering.
The second stage consists of clustering the components according to the possible price formats, after which the boundary square is calculated, also based on the price format.
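A minimal sketch of 8-connected component extraction is given below. It uses a breadth-first flood fill and returns, for each component, its bounding box and area; the dictionary layout of a component is our own choice for illustration.

```python
from collections import deque

# The eight neighbours of a pixel: horizontal, vertical and diagonal.
NEIGHBOURS = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)]


def connected_components(binary):
    """Return the 8-connected components of a 0/1 image in raster-scan order."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for sy in range(h):
        for sx in range(w):
            if not binary[sy][sx] or seen[sy][sx]:
                continue
            seen[sy][sx] = True
            queue = deque([(sy, sx)])
            top, left, bottom, right, area = sy, sx, sy, sx, 0
            while queue:
                y, x = queue.popleft()
                area += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy, dx in NEIGHBOURS:
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(
                {"top": top, "left": left, "bottom": bottom, "right": right, "area": area}
            )
    return components
```

With 8-connectivity, pixels that touch only diagonally belong to the same component, which helps thin or slightly broken strokes stay in one piece.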
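The first stage might look like the sketch below. The expected character height and all the tolerances are hypothetical values chosen for illustration; in practice they would follow from the known layout and the shooting parameters. A component is assumed to be described by the inclusive coordinates of its bounding box.

```python
def is_candidate(box, expected_height, tolerance=0.5, min_aspect=0.2, max_aspect=1.2):
    """Mild size and aspect-ratio check for a price-digit candidate.

    box is assumed to be a dict with inclusive 'top', 'left', 'bottom', 'right'.
    """
    height = box["bottom"] - box["top"] + 1
    width = box["right"] - box["left"] + 1
    low = (1 - tolerance) * expected_height
    high = (1 + tolerance) * expected_height
    if not low <= height <= high:
        return False
    # Digits are taller than wide; barcode bars are far too narrow.
    return min_aspect <= width / height <= max_aspect
```

The wide tolerance is deliberate: it is cheaper to let a few noise components through and reject them at the clustering stage than to lose a digit here.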
Knowing the layout makes it possible to search for clusters with restrictions on the number of elements and on their relative positions. It also helps confirm that the first-stage filtering was correct. For example, since we know where the point (the decimal separator) is located in the price tag layout, we can look for the connected component of the point after the connected components of the symbols have been found, and once it’s found, use this information at later stages. Sometimes the last symbol is not detected (its connected component might split into several separate ones), but if we know the layout, specifically that two symbols follow the point, we can extend the boundary square by the width of one character.
The next stage compares the clusters at the point where the branches join. The cluster candidates found in the two parallel branches of the image processing are compared based on knowledge of the layout, the relative positions and the sizes of the layout elements.
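The extension rule from the example above can be sketched as follows. We assume that the digit candidates and the point have already been found and lie on one text line, that boxes use inclusive coordinates, and that the character width is estimated as the width of the widest candidate; all of these are simplifications for illustration.

```python
def price_box(digit_boxes, point_box, digits_after_point=2):
    """Bounding box of the price, extended if symbols after the point are missing.

    digit_boxes and point_box are assumed to be dicts with inclusive
    'top', 'left', 'bottom', 'right' coordinates.
    """
    boxes = sorted(digit_boxes, key=lambda b: b["left"])
    right = boxes[-1]["right"]
    # Count the candidates located to the right of the point.
    after = [b for b in boxes if b["left"] > point_box["right"]]
    missing = digits_after_point - len(after)
    if missing > 0:
        char_width = max(b["right"] - b["left"] + 1 for b in boxes)
        right += missing * char_width
    return {
        "top": min(b["top"] for b in boxes),
        "left": boxes[0]["left"],
        "bottom": max(b["bottom"] for b in boxes),
        "right": right,
    }
```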
We are approaching the recognition stage, and now we face the text distortion problem. Since the image can be tilted only by a small angle, the tilt doesn’t have a big impact on the binarization, morphology and connected component analysis stages. At the recognition stage, however, a tilted area or, even worse, a truncated area will cause errors. To determine the tilt of the image, the Hough transform is used. If the result exceeds the predetermined threshold for the tilt angle, the coordinates of the boundary square are corrected. As a result, we get a boundary quadrangle instead of a boundary square.
Next, the tilt is compensated if needed, which turns the quadrangle back into a square.
We are going to need a cut-out of the price area in color. That’s why cutting out the price area, or transforming the quadrangle into a square and then cutting it out, is performed on the color image.
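How exactly the Hough transform is applied is not detailed here, so the following is only a sketch of one possible variant. It assumes that the input is a set of (x, y) points expected to lie on the text baseline (for instance, the bottom centers of the digit components), and it votes only over a small range of angles, since the tilt is known to be small. The angle range, step and bin size are hypothetical parameters.

```python
import math


def estimate_tilt(points, max_angle=15.0, step=0.5, rho_bin=1.0):
    """Return the tilt (degrees) of the dominant line through points (x, y).

    For a line tilted by angle a, the value y*cos(a) - x*sin(a) is the same
    for all of its points, so they all vote for a single accumulator cell.
    """
    best_angle, best_votes = 0.0, 0
    steps = int(round(2 * max_angle / step))
    for i in range(steps + 1):
        angle = -max_angle + i * step
        a = math.radians(angle)
        votes = {}
        for x, y in points:
            rho = y * math.cos(a) - x * math.sin(a)
            key = int(math.floor(rho / rho_bin))
            votes[key] = votes.get(key, 0) + 1
        top = max(votes.values())
        if top > best_votes:
            best_angle, best_votes = angle, top
    return best_angle
```

The estimate is accurate only up to the angle step, and with few points neighboring angles can tie, so the result is best compared against the tilt threshold with some margin.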
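As an illustration of the cut-out with tilt compensation, the sketch below samples the color image along a grid rotated around the center of the price area and writes the samples into an upright rectangle. Nearest-neighbor sampling, the rotation sign convention and the fill value for points outside the image are our assumptions; the image is assumed to be a list of rows of color tuples and the box to use inclusive coordinates.

```python
import math


def cut_out(image, box, angle_deg, fill=(0, 0, 0)):
    """Cut the tilted price area out of a color image as an upright rectangle."""
    h, w = len(image), len(image[0])
    cx = (box["left"] + box["right"]) / 2
    cy = (box["top"] + box["bottom"]) / 2
    cos_a = math.cos(math.radians(angle_deg))
    sin_a = math.sin(math.radians(angle_deg))
    result = []
    for y in range(box["top"], box["bottom"] + 1):
        row = []
        for x in range(box["left"], box["right"] + 1):
            # Rotate the output grid point around the box center to find its source.
            sx = round(cx + (x - cx) * cos_a - (y - cy) * sin_a)
            sy = round(cy + (x - cx) * sin_a + (y - cy) * cos_a)
            row.append(image[sy][sx] if 0 <= sy < h and 0 <= sx < w else fill)
        result.append(row)
    return result
```

With an angle of zero the function reduces to a plain crop, so the same code path serves both the tilted and the untilted case.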
For testing, we selected a price tag dataset from one retail company, and the parameters of the algorithmic chain were tuned to the price tag layout specific to this company.
There were 708 images in the dataset; 29 of them did not contain a correct image of a price tag (we’ll refer to them as incorrect images). Out of the 679 correct images, ~80 were rotated, ~90 were dark, ~150 were low-contrast, ~40 had some overlays, and 50 were blurred. The image sizes ranged from 800×400 to 1350×700.
Testing was performed both on the set of only correct images and on the full dataset.
The quality was assessed according to the following criteria:
These are the results of price tag detection in the correct images:
Here are the results of price tag detection on the entire dataset:
And these are the price recognition results for the correctly identified zones:
As we can see, the suggested algorithm provides high accuracy and completeness. One disadvantage of the algorithm is that the FP number grows as the number of incorrect images increases. At the same time, the TN number grows as well, and even faster than the FP number.
Let’s not forget about the running time: the average time to recognize an image, from the moment it enters the algorithm to the moment the price value is obtained, is 162 milliseconds on an Odroid-XU4 in single-threaded mode.
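For reference, here is how such measures are usually computed from the TP, FP, FN and TN counts. This is a generic sketch that assumes the two quality measures mentioned above correspond to the usual precision and recall; it does not reproduce the exact criteria or numbers of this experiment.

```python
def detection_metrics(tp, fp, fn, tn):
    """Return (precision, recall, accuracy) for the given detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, accuracy
```

Note that TN does not enter precision or recall at all: adding images without a price tag can only add false positives or true negatives, which matches the behavior described above.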
Despite natural disasters and viruses, stores will always have customers and money, which means there will always be those who want to cash in on retail automation. Among the sectors of the economy, retail is especially hungry for automation and cost-cutting methods: once something is automated, retail starts earning back the investment today (tomorrow at the latest). A whole lot of programmers, students and start-uppers (pick whichever applies) are eager to satisfy the retail industry’s need for intelligent automation of business processes by offering various “algorithm prototypes”. This abundance of developers of systems that will be “helpful” for retail (and not just retail) is explained by the ready availability of training tools for neural networks and machine learning. At the same time, the people who decide on product and technology adoption in retail are aware that a prototype that performs well in a laboratory setting might never get up and running the way it is supposed to, even at considerable expense. Thank you for your time!