Smart Engines is a science company with a focus on AI-related research. Our team of internationally respected experts works at the forefront of document recognition, computational imaging and tomography. The company brings together more than 60 scientists, including 18 PhDs.
Our core developers are all faculty members of the Moscow Institute of Physics and Technology and professors in its Department of Cognitive Technologies. The Department itself is headed by Professor Vladimir L. Arlazarov, Director of Science at Smart Engines. Our research papers are published in the world’s most respected scientific journals, and we regularly participate in leading international conferences such as ICDAR, ICIP and ICMV. Smart Engines employees are members of IAPR (International Association for Pattern Recognition) and IEEE.
In 2017, Smart Engines won ICDAR’s Document Image Binarization Competition. That followed a third place in the same conference’s 2015 smartphone document capture contest.
In 2019, the Smart Engines team created the public MIDV (Mobile Identity Document Video) dataset, with more than 500 video clips, for creating and testing ID scanning technologies based on machine learning.
In 2020, Smart Engines introduced an advanced AI technology that tackles the problem of the radiation dose received during computerized X-ray imaging procedures for medical purposes.
RESEARCH & DEVELOPMENT
Document recognition technologies
When recognizing documents in video streams and photos, we must deal with uncontrolled capture conditions and unknown hardware parameters. To scan documents directly on a mobile device, the key aspects of the algorithms are their computational complexity and the footprint of the configuration modules, including the sizes of neural network models. The reliability of the recognized data generally depends on the input data, and thus on the user, which means the recognition systems and algorithms must be highly fault-tolerant. Smart Engines applies deep scientific thinking to create OCR technologies that overcome the common challenges of document recognition in video streams, scans, and photos.
Computational Imaging and Tomography
Although machine vision is one of the most powerful non-destructive testing methods in the optical range, it remains limited to the surface of an object. Smart Engines specialises in the scientific development and practical deployment of computed tomography software that allows the exploration of three-dimensional internal structures, which is essential for medicine, industrial diagnostics, and scientific laboratories.
- Calibration and alignment of new generation tomographs.
- Optimised and customised reconstruction of images from data collected in difficult conditions such as ultra-low doses, tomosynthesis, the presence of highly absorbing inclusions in the object, etc.
- Computational visualisation with automatic processing and semantic analysis of results.
- «Datasets of ID documents: MIDV-500»
How to test ID recognition algorithms?
- «Binarization Algorithms for Documents Recognition»
In the nine years up to DIBCO17, the international document binarization competition held within the ICDAR conference has seen many bold and unconventional binarization algorithms proposed.
- «ProLAB: Perceptually Uniform Projective Colour Coordinates System»
To advance fundamental science, the Smart Engines R&D team works closely with a research group at the Institute for Information Transmission Problems of the Russian Academy of Sciences.
- Accelerated FBP for computed tomography image reconstruction / A. Dolmatova, M. Chukalina, D. Nikolaev // IEEE ICIP 2020, Washington, DC, United States, IEEE Computer Society, 2020, DOI: 10.1109/ICIP40778.2020.9191044
- Vanishing Point Detection with Direct and Transposed Fast Hough Transform inside the neural network / A. Sheshkus, A. Chirvonaya, D. Matveev, D. Nikolaev, V. L. Arlazarov // Computer Optics, vol. 44, no 5, pp. 737-745, 2020, DOI: 10.18287/2412-6179-CO-676
- Machine-Readable Zones Detection in Images Captured by Mobile Devices’ Cameras / S. I. Kolmakov, N. S. Skoryukina, V. V. Arlazarov // Pattern Recognition and Image Analysis, vol. 30, no 3, pp. 489-495, 2020, DOI: 10.1134/S105466182003013X
- Houghencoder: neural network architecture for document image semantic segmentation / A. V. Sheshkus, D. P. Nikolaev, V. L. Arlazarov // IEEE ICIP 2020, Washington, DC, United States, IEEE Computer Society, 2020, pp. 1-5, DOI: 10.1109/ICIP40778.2020.9191182
- Monitored Reconstruction: Computed Tomography as an Anytime Algorithm / K. Bulatov, M. Chukalina, A. Buzmakov, D. Nikolaev, V. V. Arlazarov // IEEE Access, vol. 8, pp. 110759-110774, 2020, DOI: 10.1109/ACCESS.2020.3002019
- Two-step CNN framework for text line recognition in camera-captured images / Yulia S. Chernyshova, Alexander V. Sheshkus, Vladimir V. Arlazarov // IEEE Access, 2020 DOI: 10.1109/ACCESS.2020.2974051
- HoughNet: neural network architecture for vanishing points detection / A. Sheshkus, A. Ingacheva, V. Arlazarov, D. Nikolaev // IEEE, 2019 International Conference on Document Analysis and Recognition (ICDAR) DOI: 10.1109/ICDAR.2019.00140
- Fast Method of ID Documents Location and Type Identification for Mobile and Server Application / Natalya Skoryukina, Vladimir V. Arlazarov, Dmitry P. Nikolaev // IEEE, 2019 International Conference on Document Analysis and Recognition (ICDAR) DOI: 10.1109/ICDAR.2019.00141
- Special Aspects of Matrix Operation Implementations for Low-Precision Neural Network Model on the Elbrus Platform / E.E. Limonova, M.I. Neiman-zade, V.L. Arlazarov // Bulletin of the South Ural State University. Ser. Mathematical Modelling, Programming & Computer Software (Bulletin SUSU MMCS), 2020, vol. 13, no. 1, pp. 118–128 DOI: 10.14529/mmp200109
- Calculation of a Vanishing Point by the Maximum Likelihood Estimation Method / I.A. Konovalenko, J.A. Shemiakina, I.A. Faradjev // Bulletin of the South Ural State University. Ser. Mathematical Modelling, Programming & Computer Software (Bulletin SUSU MMCS), 2020, vol. 13, no. 1, pp. 107–117 DOI: 10.14529/mmp200108
- Fast X-Ray Sum Calculation Algorithm for Computed Tomography Problem / K.B. Bulatov, M.V. Chukalina, D.P. Nikolaev // Bulletin of the South Ural State University. Ser. Mathematical Modelling, Programming & Computer Software (Bulletin SUSU MMCS), 2020, vol. 13, no. 1, pp. 95–106 DOI: 10.14529/mmp200107
- MIDV-2019: Challenges of the modern mobile-based document OCR / Konstantin Bulatov, Daniil Matalov, Vladimir V. Arlazarov // Proc. SPIE, Twelfth International Conference on Machine Vision (ICMV 2019) DOI: 10.1117/12.2558438
- Transfer of a high-level knowledge in HoughNet neural network / Alexander V. Sheshkus, Dmitry Nikolaev // Proc. SPIE, Twelfth International Conference on Machine Vision (ICMV 2019) DOI: 10.1117/12.2559454
- Bipolar Morphological Neural Networks: Convolution Without Multiplication / E. Limonova, D. Matveev, D. Nikolaev, V.V. Arlazarov // Proc. SPIE, Twelfth International Conference on Machine Vision (ICMV 2019) DOI: 10.1117/12.2559299
- Using Special Text Points in the Recognition of Documents / Oleg A. Slavin // Cyber-Physical Systems: Advances in Design & Modelling. Studies in Systems, Decision and Control, vol 259. Springer, Cham DOI: 10.1007/978-3-030-32579-4_4
- U-Net-bin: hacking the document image binarization contest / P.V. Bezmaternykh, D.A. Ilin, D.P. Nikolaev // Computer Optics. – 2019. – Vol. 43(5). – P. 825-832. DOI: 10.18287/2412-6179-2019-43-5-825-832
- A Method to Reduce Errors of String Recognition Based on Combination of Several Recognition Results with Per-Character Alternatives / K.B. Bulatov // Bulletin of the South Ural State University. Ser. Mathematical Modelling, Programming & Computer Software (Bulletin SUSU MMCS), 2019, vol. 12, no. 3, pp. 74–88 DOI: 10.14529/mmp190307
- On optimal stopping strategies for text recognition in a video stream as an application of a monotone sequential decision model / K. Bulatov, N. Razumny, V.V. Arlazarov // International Journal on Document Analysis and Recognition (IJDAR) – 2019. – Vol. 22(3). – P. 303-314. DOI: 10.1007/s10032-019-00333-0
- MIDV-500: a dataset for identity document analysis and recognition on mobile devices in video stream / V.V. Arlazarov, K. Bulatov, T. Chernov, V.L. Arlazarov // Computer Optics. – 2019. – Vol. 43(5). – P. 818-824. DOI: 10.18287/2412-6179-2019-43-5-818-824
- Performance Evaluation of a Recognition System on the VLIW Architecture by the Example of the Elbrus Platform / E.E. Limonova, N.A. Bocharov, N.B. Paramonov, D.S. Bogdanov, V.V. Arlazarov, O.A. Slavin, D.P. Nikolaev // Programming and Computer Software – 2019. – Vol. 45(1). – P. 12-17. DOI: 10.1134/S0361768819010055
- Effective real-time augmentation of training dataset for the neural networks learning / Alexander V. Gayer, Yulia S. Chernyshova, Alexander V. Sheshkus // Proc. SPIE, Eleventh International Conference on Machine Vision (ICMV 2018) DOI: 10.1117/12.2522969
- 2D art recognition in uncontrolled conditions using one-shot learning / N.S. Skoryukina, D.P. Nikolaev, V.V. Arlazarov // Proc. SPIE, Eleventh International Conference on Machine Vision (ICMV 2018) DOI: 10.1117/12.2523017
- Fast Hamming distance computation for 2D art recognition on VLIW-architecture in case of Elbrus platform / Elena Limonova, Natalya Skoryukina, Murad Neiman-zade // Proc. SPIE, Eleventh International Conference on Machine Vision (ICMV 2018) DOI: 10.1117/12.2523101
- Convolutional Neural Network Structure Transformations for Complexity Reduction and Speed Improvement / E. Limonova, A. Sheshkus, A. Ivanova, D. Nikolaev // Pattern Recognition and Image Analysis – 2018. – Vol. 28(1). – P. 24-33. – DOI: 10.1134/S105466181801011X.
DATASETS OF ID DOCUMENTS: MIDV
Insufficient public datasets prevent the comprehensive study of AI-powered document recognition on mobile devices. Existing datasets can be useful for certain tasks, but they are not enough to create and test the technologies required for recognizing identity documents. Until now. The Smart Engines Mobile Identity Document Video dataset (MIDV-500) consists of 500 video clips covering 50 different identity document types. Since this kind of document contains personal data, all source material is either publicly available or copyright-free.
An extension of MIDV-500, the MIDV-2019 dataset contains additional video clips, shot with modern high-resolution mobile cameras, to address issues of projective distortions and variable lighting conditions.
MIDV-2020, the newest dataset in the MIDV family, consists of 1000 video clips, 2000 scanned images, and 1000 photos of 1000 unique mock identity documents, each with unique text field values and unique artificially generated faces, with rich annotation. With 72409 annotated images in total, it was, at the date of publication, the largest publicly available identity document dataset with variable artificially generated data.
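As a rough illustration of how a dataset of this kind might be used to test a recognition algorithm, the sketch below computes field-level accuracy per document type. It is a minimal example under stated assumptions, not the MIDV evaluation protocol: the function name `accuracy_by_document_type`, the tuple layout, and the sample values are all hypothetical, and exact string match is only one possible quality metric.

```python
from collections import defaultdict


def accuracy_by_document_type(results):
    """Return the share of exactly matching fields for each document type.

    `results` is an iterable of (document_type, expected_text, recognized_text)
    tuples. This layout is an assumption made for the example; it is not the
    annotation format of any MIDV dataset.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for doc_type, expected, recognized in results:
        totals[doc_type] += 1
        if expected == recognized:
            correct[doc_type] += 1
    return {doc_type: correct[doc_type] / totals[doc_type] for doc_type in totals}


# Hypothetical recognition results: ground truth versus recognizer output.
sample_results = [
    ("passport", "SMITH", "SMITH"),
    ("passport", "JOHN", "J0HN"),  # letter O misread as digit 0
    ("driving_licence", "A1234567", "A1234567"),
    ("driving_licence", "1990-01-01", "1990-01-01"),
]

per_type = accuracy_by_document_type(sample_results)
for doc_type, accuracy in sorted(per_type.items()):
    print(f"{doc_type}: {accuracy:.2f}")
```

Reporting accuracy per document type, rather than as a single overall figure, makes it easier to see which kinds of documents a recognizer handles poorly.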