17.04.2020

Building an OCR bot in 5 minutes using Python

In this series of articles we show how to integrate Smart Engines document recognition modules into applications. Today we describe how to work with the Python interface of the Smart IDReader recognition library and build a simple Telegram bot with it.

Please note that the list of programming languages we support has expanded and now includes C++, C, C#, Objective-C, Swift, Java, Python, and also some esoteric programming languages such as Visual Basic and, of course, PHP. As for operating systems, we support all the popular ones as well as many unpopular OSs and architectures; our free application is available for download from the App Store and Google Play.

The demo version of the Smart IDReader SDK for Python, along with the source code of our Telegram bot, is available on GitHub.

What do we need?

First, we need a few files from the SDK:

  • The Python interface of the recognition library (pySmartIdEngine.py)
  • The dynamic library with the C++ recognition core (on Linux, _pySmartIdEngine.so)
  • The configuration archive (*.zip)

To build the Telegram bot we used the python-telegram-bot library.

Interacting with Recognition Engine

We focus here on the most important aspects of the process and leave the detailed description of the library to its documentation.

Let's import the library and build a class that holds our recognition engine and an image recognition method:

# Import the json module and the Python recognition library interface
import json

import pySmartIdEngine as se


# Class with a recognition engine instance
class SmartIDReaderEngine:
    def __init__(self, config_path):
        # Build the recognition engine from the configuration archive
        self.smartid_engine = se.RecognitionEngine(config_path)

    # The image recognition function
    def recognize_image_file(self, image_file_path):
        # Create session settings; the document types to enable are configured on this object
        session_settings = self.smartid_engine.CreateSessionSettings()

        # Create a recognition session
        session = self.smartid_engine.SpawnSession(session_settings)

        # Recognize the image file
        result = session.ProcessImageFile(image_file_path)

        # Convert string fields from the recognition result to a dictionary
        recognized_fields = {}
        for field_name in result.GetStringFieldNames():
            field = result.GetStringField(field_name)
            recognized_fields[field_name] = field.GetValue().GetUtf8String()

        # Return the dictionary as a JSON string
        return json.dumps(recognized_fields, ensure_ascii=False, indent=2)
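The last step of the recognition method serializes the fields with ensure_ascii=False. The following minimal sketch shows why that flag matters for documents with non-Latin or accented text. The field names and values are made up for illustration; in the bot they would come from the recognition result.

```python
import json

# Hypothetical field values for illustration only; in the bot these
# come from result.GetStringField(...).GetValue().GetUtf8String()
recognized_fields = {"surname": "Müller", "name": "Zoë"}

# Default behaviour: non-ASCII characters are written as \uXXXX escapes
escaped = json.dumps(recognized_fields)

# ensure_ascii=False keeps the characters readable in the chat message,
# indent=2 puts every field on its own line
readable = json.dumps(recognized_fields, ensure_ascii=False, indent=2)

print(escaped)
print(readable)
```

Both strings decode back to the same dictionary; only the readability of the message sent to the user differs.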

Building the OCR bot

To build an image recognition bot we follow a simple path and start from the example in the python-telegram-bot documentation. We need to create MessageHandlers that are called each time a new image or photo arrives. To avoid spending time on creating a recognition engine for every message, we implement the handlers as new methods of our SmartIDReaderEngine class and pass them as MessageHandler callbacks.

# Implementing new methods of the SmartIDReaderEngine class

    def on_chat_photo(self, update, context):
        # Create the downloads directory and download the photo
        temp_path = get_photo(update)

        # Recognize it and send a message with the result
        recognition_result_str = self.recognize_image_file(temp_path)
        update.message.reply_text(recognition_result_str)

    def on_chat_image(self, update, context):
        # Create the downloads directory and download the image file
        temp_path = get_image(update)

        # Recognize it and send a message with the result
        recognition_result_str = self.recognize_image_file(temp_path)
        update.message.reply_text(recognition_result_str)
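The point of making the handlers methods of the engine class is that a bound method carries its instance with it, so one engine serves every message. The sketch below illustrates this with a stand-in class; FakeEngine and the plain list of handlers are hypothetical and only imitate the way a dispatcher calls registered callbacks.

```python
class FakeEngine:
    """Stand-in for a recognition engine that is expensive to build (hypothetical)."""

    instances_created = 0

    def __init__(self):
        FakeEngine.instances_created += 1

    def on_message(self, text):
        # `self` is always the single engine created at start-up
        return (id(self), "handled " + text)


engine = FakeEngine()

# Register the bound method once, as the bot does with MessageHandler
handlers = [engine.on_message]

# Simulate three incoming messages being dispatched to the handlers
replies = [handler(msg) for msg in ("first", "second", "third") for handler in handlers]

print(FakeEngine.instances_created)
print([text for _, text in replies])
```

However many messages arrive, the constructor runs only once, which is what keeps per-message latency down to the recognition itself.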

We also need two helper functions that save incoming photos and image files:

import os

# Create the downloads directory and download the photo
def get_photo(update):
    downloads_dir = 'downloaded_images'
    os.makedirs(downloads_dir, exist_ok=True)
    # photo[-1] is the largest available size of the photo
    photo = update.message.photo[-1]
    temp_path = os.path.join(downloads_dir,
        'file_%s_id_%d_temp.png' % (photo.file_id, update.message.message_id))
    photo.get_file().download(custom_path=temp_path)
    return temp_path

# Create the downloads directory and download the image file
def get_image(update):
    downloads_dir = 'downloaded_images'
    os.makedirs(downloads_dir, exist_ok=True)
    temp_path = os.path.join(downloads_dir, update.message.document.file_name)
    update.message.document.get_file().download(custom_path=temp_path)
    return temp_path
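Since the name of an uploaded document is chosen by the sender, it may be worth keeping only its last path component before joining it with the downloads directory. The helper below is a hypothetical sketch of that idea and is not part of the SDK or of python-telegram-bot; it uses a temporary directory so it can be run anywhere.

```python
import os
import tempfile


def build_download_path(downloads_dir, file_name):
    """Hypothetical helper: create downloads_dir and return a path inside it."""
    os.makedirs(downloads_dir, exist_ok=True)
    # Keep only the final path component so the file stays inside downloads_dir
    safe_name = os.path.basename(file_name) or "unnamed_file"
    return os.path.join(downloads_dir, safe_name)


base_dir = tempfile.mkdtemp()
downloads_dir = os.path.join(base_dir, "downloaded_images")

plain = build_download_path(downloads_dir, "passport.png")
nested = build_download_path(downloads_dir, "../../etc/passport.png")

print(plain)
print(nested)
```

Calling the helper twice is safe because os.makedirs is used with exist_ok=True, the same way the functions above use it.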

Now we are ready to create a bot with the recognition functions as message handlers:

from argparse import ArgumentParser

from telegram.ext import Updater, MessageHandler, Filters

# Get arguments with the bot token and the configuration file path
parser = ArgumentParser()
parser.add_argument('--token', type=str, required=True)
parser.add_argument('--smartid-config', type=str, default='bundle_mock_smart_idreader.zip')
args = parser.parse_args()

# Create a bot instance
updater = Updater(args.token, use_context=True)

# Create an instance of the Smart IDReader engine
smartid_engine = SmartIDReaderEngine(args.smartid_config)

# Get the dispatcher to register handlers
dp = updater.dispatcher

# On a photo or an image document, recognize it with Smart IDReader
dp.add_handler(MessageHandler(Filters.photo, smartid_engine.on_chat_photo))
dp.add_handler(MessageHandler(Filters.document.image, smartid_engine.on_chat_image))

Finally, run the bot:

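# Start the bot
updater.start_polling()

# Keep running until the process is interrupted (for example, with Ctrl-C)
updater.idle()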

Pass the unique token you obtained when registering your bot through the --token argument; the script reads it as args.token. If this is your first time creating a bot, detailed instructions for the process are available on the Telegram website.
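The command-line handling can be checked without starting the bot by giving parse_args an explicit argument list. This is a small sketch: the token value is a placeholder, and required=True is an assumption added here so that a missing token is reported immediately.

```python
from argparse import ArgumentParser

parser = ArgumentParser()
# required=True is an assumption: it makes argparse reject a missing token
parser.add_argument('--token', type=str, required=True)
parser.add_argument('--smartid-config', type=str, default='bundle_mock_smart_idreader.zip')

# Placeholder token for illustration only; a real one is issued on registration
args = parser.parse_args(['--token', 'PLACEHOLDER_TOKEN'])

# argparse turns the dash in --smartid-config into an underscore
print(args.token)
print(args.smartid_config)
```

When --smartid-config is omitted, the default configuration archive name is used.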

And that is it! Now you know how to use the Smart IDReader SDK Python interface to build your own Telegram bot for document recognition.

As a reminder, all our OCR products are fully autonomous: they work without an internet connection. You can also recognize documents remotely through Telegram, but please note that legally you may do so with your own personal documents only. To work with other people's document data, you need not only authorization to store and protect personal data, but also the infrastructure to protect that data, including all phones and computers on which recognition is performed.

That is why our colleagues from Sum & Substance used our libraries to develop a platform for remote recognition and verification of such documents, and took care of the legal side of the matter.
