The recognition of a scanned document can be considered a classical and well-studied problem with a variety of known solutions. Recognition of a machine-readable zone (MRZ) on mobile devices, on the other hand, is a much more complicated task due to the properties of mobile device cameras and the specification of MRZ documents themselves. In this document, we would like to share some peculiar properties of an MRZ with regard to automatic recognition systems operating on a mobile device, including specific image distortions and problems of character recognition in poor capturing conditions, as well as the issues of post-processing of the recognition results using the MRZ language model.
Fig. 1. Russian biometric passport with MRZ. Image source: http://en.wikipedia.org/wiki/Russian_passport
What is MRZ?
A machine-readable zone (MRZ) is a part of a machine-readable document designed in compliance with the international recommendations specified in Doc 9303 of the International Civil Aviation Organization (ICAO) (http://www.icao.int/publications/pages/publication.aspx?docnum=9303). An example of an MRZ is the two lines that you can find at the bottom of a passport (see Fig. 1).
Such documents are designed to achieve the maximum level of automation in personal ID processing via machine recognition of ID data. Recognition of MRZ documents finds broad application in government, travel, financial, and commercial areas, for speeding up state border crossing procedures, internationally unified ID processing, etc. Today’s need for more mobile OCR solutions makes the MRZ an important target for document recognition systems on mobile devices.
MRZ recognition in scanned images
Let’s consider the properties of scanning hardware in relation to optical document recognition. In the scanning process the document is positioned on a plane perpendicular to the optical axis at a fixed distance from a registration matrix. Thus the homotheticity of the original document and its image is achieved, and any insignificant positional defects can be easily detected and corrected. The document stays still during exposure, so image defects caused by displacement of the original document (such as motion blur) are eliminated. The lighting conditions in the scanner are formed by powerful backlight lamps, which guarantee stable lighting characteristics and the absence of shadows.
Fig. 2. Document positioning in document readers
Document readers are a special type of scanning hardware: hardware-software systems that use the same image capturing principles as flatbed, planetary, and other general types of image scanners. The document in these devices is either pressed against the glass or inserted into a slit (see Fig. 2), which virtually eliminates any deformation of the scanned document page.
Specialized document readers allow the document image to be captured under different lighting (white, infrared, ultraviolet, etc.). A scheme using white and infrared lighting is suitable for optical recognition, since it yields a high-contrast image with a low level of noise and of disturbances related to the textured background and document security elements.
The known positional relationship between the lighting elements (lamps, light-emitting diodes) and the document scanning surface makes it possible to reduce highlights and flares at the hardware design stage, or to significantly simplify their compensation during scanning.
Depending on the hardware model, scanning devices capture images at resolutions of 200 DPI and higher. The majority of hardware modifications capture images with a resolution sufficient for optical text recognition (300-400 DPI).
Thus the specialized scanning devices provide high-quality images with minimal distortions which allows the common OCR techniques (projection histogram analysis for line and symbol segmentation, template or shape-based character recognition, etc.) to perform with high precision and reliability.
MRZ recognition on mobile devices
3.1. General problems
Document capture with mobile devices suffers from the standard problems of camera-based optical recognition systems. Compared with scanning devices, the optical scheme of a mobile device camera is more complicated and itself introduces more distortions due to aberrations, flares, and reflections inside the optical system. The use of photosensors (matrices) and analog electronics in image registration devices inevitably leads to image distortions known as digital noise. The sources of digital noise are the digitization of the analog signal (signal quantization errors, thermal noise, and charge transfer on the matrix) and its amplification. Digital noise is visible in the image as a mask of pixels with random color and intensity. It is more noticeable in uniformly colored image regions, especially darker ones. In contrast to scanning, where high-quality lighting is guaranteed, capture with a mobile device camera frequently takes place in poor lighting conditions, which increases the effects of digital noise (see Fig. 3). Image compression algorithms are another source of distortion, particularly for frames of a video stream.
Fig. 3. Examples of distorted MRZ character images
Depending on the lens characteristics and the position of the document relative to the focus plane, the whole document image or some of its regions may be blurred. If the document or the camera moves during exposure, motion blur occurs, and it is amplified in poor lighting conditions (see Fig. 4).
Fig. 4. Examples of blurred MRZ character images
3.2. Document deformation
Unlike a scanned document, a document captured with a mobile device camera is positioned in an arbitrary plane relative to the focusing plane. Deviation from the plane perpendicular to the optical axis leads to projective distortion of the document image. If the deviation angle is small, the MRZ can be recognized without additional projective rectification, but in the general case it is necessary to estimate the parameters of the projective basis and perform OCR on the projectively rectified image. Errors in the projective basis estimation are possible and lead to geometrical distortions of symbol images. Furthermore, as a real-world object, the original document is prone to mechanical deformations. For example, paper documents are prone to bends and twists (most commonly along the main text direction or perpendicular to it). When capturing with a mobile camera, it is hard or virtually impossible to eliminate deformations of this sort (see Fig. 5).
Fig. 5. Non-linear deformation of MRZ lines after projective restoration
Mechanical document deformation combines with projective distortion of the document image. After projective normalization, symbols may not be aligned in parallel lines (as they are on the original document). Even after correct projective rectification of the whole document, the image of a symbol from a physically deformed region of the document will differ from the image of the same symbol from an undeformed region (see Fig. 6).
Fig. 6. Examples of MRZ characters with projective and non-linear deformations
3.3. Background and lighting
The ICAO 9303 document specifies the printing standards of machine-readable documents as follows: To combat the threat to travel document security posed by the use of items such as photocopiers, security features are permitted in the MRZ, and any such security feature shall not interfere with an accurate reading of the OCR characters at the B900 range, as defined in ISO 1831. While OCR characters must be visible, to ensure that all machine-readable passports, including those with security features in the MRZ, can be successfully read, the OCR characters in the MRZ shall be machine-readable at least in the near-infrared portion of the spectrum (i.e. the B900 band defined in ISO 1831).
Thus the contrast requirements for the MRZ symbols are specified only for the near-infrared portion of the spectrum. In practice that allows issuing authorities to print the MRZ on a textured background invisible in the near-infrared range but quite dense in the optical range (see Fig. 7).
Fig. 7. Examples of zones with a textured background in the optical range
For mobile device cameras, capturing in the near-infrared range is impossible (in the general case at least), so a textured background makes the MRZ OCR process significantly harder, especially in poor lighting conditions.
The lighting scheme in scanning devices minimizes the occurrence of shadows and flares even for glossy (laminated) document pages. Using a mobile device camera in natural scenes, on the other hand, produces lighting effects such as reflections, shadows, and color distortions, which make image analysis and recognition harder due to, for example, the loss of existing object boundaries and the emergence of false ones. Pages of documents containing the MRZ are commonly made of a special plastic or covered with protective film and thus have significant reflective properties (see Fig. 8, left). Additionally, protective document elements often include holographic elements (see Fig. 8, right), which also distort the image.
Fig. 8. Fragments of the MRZ; Left: highlight from a prolonged light source; Right: holographic security elements
3.4. OCR-B font issues
ICAO 9303 document specifies a particular subset of OCR-B fonts for use in MRZ documents. This font family has several symbols which have a similar appearance on low-resolution images under the distortions related to mobile device cameras (see Fig. 9‑12).
Fig. 9. Example OCR-B characters
It is particularly hard to distinguish the letter ‘O’ and the digit ‘0’, because their appearances differ only slightly in proportions and curvature. With such a small difference in appearance, even faint distortions or moderately low resolution make it hard even for a human to tell them apart in a photo or video frame captured with a mobile device camera.
Fig. 10. Examples of the barely distinguishable digit ‘0’ (left) and letter ‘O’ (right)
Under conditions of low-quality capture, especially with noise and flares, the triple ‘M’ – ‘N’ – ‘H’ becomes barely distinguishable, because the characteristic differences between these symbols lie inside the small central symbol region.
Fig. 11. Examples of poorly distinguishable characters – ‘M’ – ‘N’ – ‘H’
Small distortions leading to relative symbol shifts, blurring, or a textured background may make the images of the symbols ‘1’ – ‘T’ – ‘I’ indistinguishable.
Fig. 12. Examples of poorly recognizable characters ‘1’ – ‘T’ – ‘I’
Thus, when dealing with images captured with mobile device cameras, it is generally hard to guarantee high quality of the symbol images. This leads to considerably lower precision and reliability of symbol recognition results, and the mechanisms of statistical correction of field recognition results play a more substantial role than in recognition systems based on scanners.
MRZ language model
Modern systems designed for recognition and identification of structured documents use various mechanisms of statistical correction in order to increase the recognition precision. These mechanisms utilize information about the document structure and the recognition ‘context’, and employ the language model of a document or of a recognized field. There are various known algorithms of statistical correction, or recognition post-processing, based on a group of related methods: Hidden Markov Models, Finite-State Automata, N-gram language models, dictionary-based methods, and methods based on the properties of Weighted Finite-State Transducers.
Let’s consider a text field F. In the scope of the document the field F possesses a semantic structure (semantic properties). In the scope of the document representation the field F possesses a syntactic structure as well. Based on the document’s semantics and a syntactic structure of the document’s representation we can define a language model for F.
Consider this example: let F be the ‘birth date’ field of the MRZ of a passport compliant with the ICAO 9303 standard. The semantics of F dictate that it should contain information about the year, month, and day the holder of the document was born. In the MRZ data structure of a machine-readable passport, a fixed position is allocated for F (symbols 14-19 of the second MRZ line, with the checksum in the 20th symbol) and its syntactic structure is defined: the date is represented in the format YYMMDD, where YY is the last two decimal digits of the year, MM is the decimal representation of the month number, and DD is the decimal representation of the day number. If parts of the birth date are unknown, the relevant character positions are completed with filler characters (‘<’). The checksum is a single decimal digit calculated using an algorithm specified in the ICAO 9303 document. Now, based on the semantics of F and its syntactic structure, we can define a language model that incorporates the set of possible field values. This language model can be represented in different ways, such as a BNF grammar or a regular language coded as a finite-state automaton. One of the ways of representing the language model is a validating grammar G consisting of the symbol alphabet, the set of all possible strings composed of symbols from this alphabet, and the membership predicate P. A word S from the set of all possible strings fits the language model G if the predicate P is true on S. Since the ICAO 9303 document specifies restrictions on the fields’ values (thus strengthening the predicate P) and introduces the checksum mechanism, the use of a validating grammar as a way of representing the MRZ field language model is justified.
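As an illustration, the membership predicate P for the birth date field might be sketched as follows. This is a minimal sketch rather than a reference implementation: the function names are hypothetical, the 7-character input is assumed to be the six date symbols followed by the checksum symbol, an unknown date part is assumed to be replaced by fillers as a whole, and February 29 is accepted for any two-digit year divisible by 4 because the century is not encoded.

```python
WEIGHTS = (7, 3, 1)
DIGITS = "0123456789"
# Maximum day number per month; February is refined below.
DAYS_IN_MONTH = (31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)


def check_digit(field: str) -> int:
    """Check digit for a field made of decimal digits and '<' fillers."""
    codes = (0 if ch == "<" else int(ch) for ch in field)
    return sum(code * WEIGHTS[i % 3] for i, code in enumerate(codes)) % 10


def is_valid_birth_date(text: str) -> bool:
    """Predicate P: `text` is YYMMDD followed by its check digit."""
    if len(text) != 7 or text[6] not in DIGITS:
        return False
    date = text[:6]
    parts = [date[0:2], date[2:4], date[4:6]]
    # Assumption: each part is either two digits or two fillers.
    for part in parts:
        if part != "<<" and not (part[0] in DIGITS and part[1] in DIGITS):
            return False
    if check_digit(date) != int(text[6]):
        return False
    yy, mm, dd = (None if part == "<<" else int(part) for part in parts)
    if mm is not None and not 1 <= mm <= 12:
        return False
    if dd is not None:
        max_day = 31 if mm is None else DAYS_IN_MONTH[mm - 1]
        # Century unknown: allow February 29 only if YY is divisible by 4.
        if mm == 2 and yy is not None and yy % 4 != 0:
            max_day = 28
        if not 1 <= dd <= max_day:
            return False
    return True


print(is_valid_birth_date("7408122"))  # True: valid date, matching check digit
print(is_valid_birth_date("7408123"))  # False: check digit mismatch
print(is_valid_birth_date("7413128"))  # False: month 13, although the check digit matches
```

The last call shows why the semantic checks matter: a string can pass the checksum and still lie outside the language of valid dates.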
The task of statistical correction (post-processing) of the recognition result for field F given the validating grammar G can be formulated as follows: in the weighted set of all recognition alternatives for the field F, find a result with maximal weight for which the predicate P is true. If the set of all possible values for F is finite (e.g. the field’s length is limited), then we can define the ‘context strength’ as the ratio of the cardinality of P’s falsity region to the cardinality of the set of all possible values for F. The larger this ratio, the ‘stronger’ the field’s context and the higher the probability of successfully correcting the recognition result. For example, of all possible words of length 7 consisting of decimal digits, fewer than 0.4% are valid dates (with the checksum considered), so the context strength for dates of this type is more than 99.6% (percentage-wise).
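The formulation above can be sketched in a few lines. The code below is only an illustration under stated assumptions: the recognizer output is assumed to be a list of (character, confidence) alternatives per position, the weight of a string is assumed to be the product of its character confidences, and all function names are hypothetical. The predicate used in the demonstration checks only the check digit of a 7-digit string.

```python
from itertools import product
from math import prod


def best_valid(alternatives, predicate):
    """Return the highest-weight string accepted by `predicate`, or None.

    `alternatives` holds, for each character position, a list of
    (character, confidence) pairs from a hypothetical recognizer.
    """
    best, best_weight = None, 0.0
    for combo in product(*alternatives):
        text = "".join(ch for ch, _ in combo)
        weight = prod(conf for _, conf in combo)
        if weight > best_weight and predicate(text):
            best, best_weight = text, weight
    return best


def check_digit_ok(text):
    """Toy predicate: the 7th digit is the check digit of the first six."""
    if len(text) != 7 or any(ch not in "0123456789" for ch in text):
        return False
    digits = [int(ch) for ch in text]
    total = sum(d * w for d, w in zip(digits[:6], (7, 3, 1, 7, 3, 1)))
    return total % 10 == digits[6]


def context_strength(valid_count, total_count):
    """Share of all possible strings on which the predicate is false."""
    return 1 - valid_count / total_count


# The recognizer prefers '3' over '8' in the fourth position.
alternatives = [[("7", 0.9)], [("4", 0.9)], [("0", 0.9)],
                [("3", 0.6), ("8", 0.4)],
                [("1", 0.9)], [("2", 0.9)], [("2", 0.9)]]
print(best_valid(alternatives, check_digit_ok))  # 7408122

# Valid YYMMDD dates (Feb 29 allowed when YY % 4 == 0); each has one check digit.
valid_dates = sum(366 if yy % 4 == 0 else 365 for yy in range(100))
print(context_strength(valid_dates, 10 ** 7))  # about 0.9963
```

Exhaustive enumeration grows exponentially with the field length, so a practical system would likely search the alternatives more selectively; the sketch only mirrors the definition.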
Let us now consider elements of the MRZ structure in relation to language model post-processing mechanisms.
4.1. Document code
The document code field is a two-symbol identifier of the type of the MRZ document. It is placed at the very beginning of the first line of the MRZ, regardless of the type, and the alphabet of its first symbol is strictly fixed (‘P’ for passports, ‘V’ for visas, ‘A’, ‘C’, or ‘I’ for other travel documents), which makes it possible to build a fairly reliable correction procedure for the recognition result of this symbol. The second symbol, however, is left to the discretion of the issuing organization. Since the composite checksum of the MRZ document does not cover the document code field, a language model for the second symbol of the document code (beyond the general MRZ alphabet restriction) cannot be built in the general case. It is also worth noting that several organizations issue documents with an MRZ-like structure that is not strictly compliant with the ICAO 9303 standard. In these documents the first symbol of the document code field may not comply with the ICAO 9303 specifications.
4.2. Issuing authority and nationality
The fields ‘issuing state/authority’ and ‘nationality’ define respectively the unique code of the issuing organization and the nationality of the MRZ document holder. The codes are based on the three-letter state codes according to ISO 3166-1 standard with several extensions (added codes for specific international organizations authorized to issue documents with MRZ, and special codes for persons without defined nationality). The language model for both fields can be represented as a simple dictionary – i.e. the simple set of valid three-letter codes. The ratio of valid codes from all possible three-letter words is ~1.4%, thus the context strength of this language model is fairly high – ~98.6% (percentage-wise).
4.3. Name field
The name field is perhaps one of the most difficult fields from the perspective of international standardization, given the variety of name structures in different countries and languages. The ICAO 9303 document specifies several requirements for name field formatting which can be used for basic validity checks: the name field consists of one or two sections separated by two filler characters (‘<’), and each section may consist of one or several words separated by one filler character. Each word may consist only of letters of the Latin alphabet. No additional validation mechanisms are specified in the ICAO 9303 document (the composite checksum of the MRZ document does not cover the name field). For the name field, known methods such as N-gram models and dictionary-based methods can be used for post-processing of the recognition results.
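The basic formatting rules listed above can be expressed as a regular expression. The following is a minimal sketch under the assumption that the field is padded on the right with filler characters and that its length is checked elsewhere; the names NAME_FIELD and is_well_formed_name are hypothetical.

```python
import re

NAME_FIELD = re.compile(
    r"[A-Z]+(?:<[A-Z]+)*"         # first section: words joined by a single '<'
    r"(?:<<[A-Z]+(?:<[A-Z]+)*)?"  # optional second section after '<<'
    r"<*"                         # trailing filler padding
)


def is_well_formed_name(field: str) -> bool:
    """Check only the section/word structure, not the plausibility of the name."""
    return NAME_FIELD.fullmatch(field) is not None


print(is_well_formed_name("ERIKSSON<<ANNA<MARIA<<<<<<<<<<<<<<<<<<<"))  # True
print(is_well_formed_name("ERIKSSON<<ANNA<<MARIA<<<"))  # False: three sections
print(is_well_formed_name("ERIKSSON<<ANNA<MAR1A"))      # False: digit inside a word
```

Such a check rejects only structurally impossible results; it says nothing about whether a well-formed name was recognized correctly, which is why N-gram or dictionary-based methods are still needed.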
4.4. Document number and personal number
The document number and personal number (optional data) fields have a non-strict syntactic structure, and the task of building an effective mechanism for their statistical correction is quite difficult. The alphabet of these fields is not restricted (apart from the general MRZ alphabet restriction). For the document number field there is a weak recommendation that the number should not contain filler characters at the beginning or in the middle of the field (i.e. the number should be padded with filler characters until the needed field length is reached, but all filler characters should be placed at the end of the field). The syntactic structure of the personal number field, however, is left to the discretion of the issuing organization. Both fields have their own checksums, but even with the checksum the efficiency of the post-processing mechanism is not high enough: since the alphabet contains both letters and digits, the efficiency of post-processing falls because of the way the ICAO 9303 checksum is computed, as explained below. The context strength for both fields can be increased using the recognition results of other fields, such as the issuing authority code. Some issuing organizations define their own syntactic structure for the document number and personal number fields. Thus, after the recognition result of the issuing authority code field is acquired and corrected, the syntactic structure of the document number and personal number fields can be refined if the restrictions specified by the particular issuing organization are known in advance.
4.5. Birthdate and expiry date
The syntactic structure of the birth date and expiry date fields is described above as an example in the definition of context strength. These fields are perhaps the most convenient with regard to the language model: the alphabet is restricted (only digits, with the exception of filler characters for unknown date parts) and it is possible to build a highly discriminative language model based on the semantics of the fields. The context strength for both fields can be increased further (in a multiple-field post-processing algorithm) by using the fact that the expiry date of the document cannot be earlier than the birth date of the document’s holder.
According to the ICAO 9303 document, checksums are provided for the document number, birth date, expiry date, and personal number fields. There is also a so-called composite checksum (or composite check digit), which provides additional validation of these four fields, but the composite checksum is not specified for all MRZ document types (there is no composite checksum on the machine-readable visa variations MRV-A and MRV-B). Also, despite a common misconception, the composite checksum does not validate the whole MRZ, only a subset of fields (not including, for example, the name field). The checksum occupies a single MRZ symbol for each field and is calculated with the following algorithm:
- Each symbol of the validated field is assigned a weight. The first symbol has weight 7, the second 3, the third 1. The fourth symbol again has weight 7, the fifth 3, and so on, repeating the weights 7, 3, and 1 in a cycle.
- The code of each symbol is multiplied by its weight. The code of the filler character (‘<’) is zero, the code of each decimal digit is equal to the value of the digit, and the code of each letter of the Latin alphabet is equal to 9 plus the letter’s number in the alphabet (‘A’ has code 10, ‘B’ has code 11, and so on up to ‘Z’ with code 35).
- The value of the checksum equals the sum of all the products modulo 10.
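The three steps above can be sketched as a short function. The function names are hypothetical and the sample field values are illustrative; the sketch assumes the field has already been extracted from the MRZ line without its checksum symbol.

```python
WEIGHTS = (7, 3, 1)


def char_code(ch: str) -> int:
    """Numeric code of one MRZ symbol: '<' is 0, digits 0-9, letters 10-35."""
    if ch == "<":
        return 0
    if "0" <= ch <= "9":
        return ord(ch) - ord("0")
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10
    raise ValueError(f"character {ch!r} is not in the MRZ alphabet")


def check_digit(field: str) -> int:
    """Sum of code * weight over the field, with weights 7, 3, 1 cycling, modulo 10."""
    return sum(char_code(ch) * WEIGHTS[i % 3] for i, ch in enumerate(field)) % 10


print(check_digit("740812"))     # 2
print(check_digit("L898902C3"))  # 6
```

A field passes validation when the computed value equals the digit recognized in the checksum position.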
Since the final sum of the weighted symbol codes is taken modulo 10, many collisions occur. Particular difficulties are created by collisions between pairs of symbols that are poorly distinguishable by symbol recognizers under mobile capture conditions (see OCR-B font issues above). For example, the symbols in the pairs ‘F’ and ‘P’, ‘H’ and ‘R’, ‘G’ and ‘6’, ‘S’ and ‘8’ have the same codes modulo 10. For fields such as document number and personal number, where both digits and letters may be used, the checksum is the main validation mechanism. So if a symbol of such a field is incorrectly recognized as the other symbol of one of these pairs, the checksum value stays the same and the probability of correcting the field with a post-processing algorithm is greatly reduced.
The weights by which the symbol codes are multiplied can also be a source of post-processing problems. For example, the weights 7 and 3 are applied to adjacent symbols of the field and sum to 10. That means that two identical adjacent symbols (or different symbols with the same code modulo 10) with weights 7 and 3 give a zero contribution to the checksum, regardless of their values. This, in turn, means that if a local distortion in the document photo or video stream frame results in incorrect recognition of two adjacent symbols (e.g. the pair of digits ‘00’ is recognized as the pair of letters ‘OO’) and these symbols are in field positions with weights 7 and 3, then a post-processing mechanism based on a checksum-validating grammar will not be able to correct this result. This mainly concerns the document number and personal number, as the fields with the most extensive alphabet and the least strict syntactic structure.
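The effect is easy to reproduce. In the sketch below (hypothetical function names, made-up field values) the pair ‘00’ is replaced by ‘OO’ first in positions with weights 7 and 3, and then in positions with weights 3 and 1.

```python
WEIGHTS = (7, 3, 1)


def char_code(ch: str) -> int:
    """'<' is 0; otherwise base-36 gives digits 0-9 and letters A-Z as 10-35."""
    return 0 if ch == "<" else int(ch, 36)


def check_digit(field: str) -> int:
    return sum(char_code(ch) * WEIGHTS[i % 3] for i, ch in enumerate(field)) % 10


# Weights 7 and 3: the substitution is invisible to the checksum.
print(check_digit("00123"), check_digit("OO123"))  # 4 4

# Weights 3 and 1: the same substitution changes the checksum.
print(check_digit("1001"), check_digit("1OO1"))    # 4 0

# Any symbol repeated under weights 7 and 3 contributes a multiple of 10.
print(all((7 * c + 3 * c) % 10 == 0 for c in range(36)))  # True
```

Whether a double confusion is detectable therefore depends on where in the field it happens, which is outside the control of the recognition system.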
In order to increase the reliability of the MRZ document validation mechanism the ICAO 9303 standard introduces composite checksum for some MRZ document types. However, this composite checksum does not cover the entire MRZ document, but only those fields already covered by their respective checksums.
In conclusion, the ICAO 9303 specifications may work well for recognition using scanning devices or specialized document readers, but mainly due to the properties of these devices rather than because the standard itself is well thought out. The recognition of MRZ documents in images captured with a mobile device camera often suffers from poor capturing conditions and image distortions which are not always correctable. The use of the OCR-B font creates a set of problems due to the similarities between several pairs of symbols under optical distortions. Other font families such as OCR-A or SEMI M12 could have been more appropriate in this case.
From the point of view of recognition post-processing, some MRZ document fields specified by the ICAO 9303 standard ensure high discriminative power for the language model and a sufficiently strong context. However, for several particular fields, such as document number, personal number, and name of the document holder, a stricter syntactic structure in the specification could help to achieve higher recognition precision and reliability.
The chosen algorithm for checksum calculation and the fact that the composite checksum does not cover the entire document make the task of recognition result validation and post-processing harder, both for systems dealing with images captured from a mobile device camera and for traditional systems based on scanners and specialized document readers.