Smart Document Engine is a multi-platform, stand-alone SDK for recognizing structured documents and standard forms, from bills of payment to acts, invoices, and transfer documents.
The following operating systems are supported:
General workflow includes the following stages:
Create a DocEngine instance as follows:
// C++
std::unique_ptr<se::doc::DocEngine> engine(se::doc::DocEngine::Create(
configuration_bundle_path));
// Java
import com.smartengines.doc.*;
DocEngine engine = DocEngine.Create(configuration_bundle_path);
Parameters:
TIP
Disable lazy configuration for:
• server applications for which the first recognition response time is more important than the total memory consumption;
• measuring the maximum memory consumed by an application.
The configuration process might take a while, but it only needs to be performed once during the program lifetime. The configured DocEngine is used to spawn sessions, which provide the actual recognition methods.
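Since configuration only needs to happen once, an application can keep a single engine for its whole lifetime and spawn every session from it. The following is a minimal sketch of that pattern, not SDK code: FakeEngine and SharedEngine are hypothetical stand-ins used so that the example compiles without the SDK.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for se::doc::DocEngine (not the real SDK class).
struct FakeEngine {
  explicit FakeEngine(const std::string& path) : bundle_path(path) {
    ++construction_count;  // Counts how many times the expensive setup ran
  }
  std::string bundle_path;
  static int construction_count;
};
int FakeEngine::construction_count = 0;

// Returns the process-wide engine. The function-local static is initialized
// exactly once, on the first call (thread-safe since C++11). The path passed
// on later calls is ignored.
FakeEngine& SharedEngine(const std::string& path) {
  static FakeEngine engine(path);
  return engine;
}
```

Every call to SharedEngine(...) returns the same object, so the costly configuration step runs only on the first call.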
ATTENTION!
DocEngine::Create() is a factory method and returns an allocated pointer. You are responsible for deleting it.
See more about configuration bundles in Configuration Bundles.
To create DocSessionSettings from configured DocEngine:
// C++
std::unique_ptr<se::doc::DocSessionSettings> settings(
engine->CreateSessionSettings());
// Java
import com.smartengines.doc.*;
DocSessionSettings settings = engine.CreateSessionSettings();
ATTENTION!
DocEngine::CreateSessionSettings() is a factory method and returns an allocated pointer. You are responsible for deleting it.
Enable required document types as shown in the following examples:
// C++
settings->AddEnabledDocumentTypes("deu.*"); // All the documents of Germany
// Java
settings.AddEnabledDocumentTypes("deu.*"); // All the documents of Germany
See more about document types in Specifying document types for DocSession.
A personal signature is provided to the customer when the Smart Document Engine product is delivered. This signature is located in the README.html file in the /doc directory.
Each time a Smart Document Engine recognition session instance is created, the signature must be passed as one of the arguments to the session creation function. It confirms that the caller is authorized to use the library and unlocks the library.
The check is performed offline; the library does not access any external resources.
To spawn a DocSession:
// C++
const char* signature = "... YOUR SIGNATURE HERE ..."; //Your personal signature you use to start Smart Document Engine session
std::unique_ptr<se::doc::DocSession> session(
engine->SpawnSession(*settings, signature));
// Java
import com.smartengines.doc.*;
String signature = "... YOUR SIGNATURE HERE ..."; //Your personal signature you use to start Smart Document Engine session
DocSession session = engine.SpawnSession(settings, signature);
ATTENTION!
DocEngine::SpawnSession() is a factory method and returns an allocated pointer. You are responsible for deleting it.
Create a processing settings object as follows:
// C++
std::unique_ptr<se::doc::DocProcessingSettings> proc_settings(
session->CreateProcessingSettings());
// Java
import com.smartengines.doc.*;
DocProcessingSettings proc_settings = session.CreateProcessingSettings();
Create an Image object which will be used for processing:
// C++
std::unique_ptr<se::common::Image> image(
se::common::Image::FromFile(image_path)); // Loading from file
// Java
import com.smartengines.common.*; // Image is one of the common classes
import com.smartengines.doc.*;
Image image = Image.FromFile(image_path); // Loading from file
ATTENTION!
Image::FromFile() is a factory method and returns an allocated pointer. You are responsible for deleting it.
Register the Image object in the session and set it as the current source:
// C++
int image_id = session->RegisterImage(*image);
proc_settings->SetCurrentSourceID(image_id);
// Java
int image_id = session.RegisterImage(image);
proc_settings.SetCurrentSourceID(image_id);
Call the Process(…) method to launch the session’s processing routine:
// C++
session->Process(*proc_settings);
// Java
session.Process(proc_settings);
ATTENTION!
DocSession::Process() is not a factory method, but the returned result object is not independent: its lifetime does not exceed the session lifetime.
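Because the result is owned by the session, any values needed after the session is released have to be copied out first. The sketch below illustrates this with hypothetical stand-in types (FakeSession, FakeResult, ExtractFieldValues); it does not use the real SDK classes, and the field values are placeholders.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins: the session owns its result.
struct FakeResult {
  std::vector<std::string> field_values;
};
struct FakeSession {
  FakeSession() { result.field_values = {"value1", "value2"}; }
  const FakeResult& GetCurrentResult() const { return result; }
  FakeResult result;
};

// Copies the field values out of the session-owned result so that they stay
// valid after the session has been destroyed.
std::vector<std::string> ExtractFieldValues() {
  std::unique_ptr<FakeSession> session(new FakeSession());
  const FakeResult& result = session->GetCurrentResult();  // Valid only while the session lives
  std::vector<std::string> copy = result.field_values;     // Independent copy
  session.reset();  // From here on, `result` must not be used
  return copy;
}
```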
Obtain the current result from the session:
// C++
const se::doc::DocResult& result = session->GetCurrentResult();
// Java
import com.smartengines.doc.*;
DocResult result = session.GetCurrentResult();
Use DocResult fields to extract the recognized information:
// C++
// Going through the found documents
for (auto doc_it = result.DocumentsBegin();
doc_it != result.DocumentsEnd();
++doc_it) {
const se::doc::Document & doc = doc_it.GetDocument();
// Going through the text fields
for (auto it = doc.TextFieldsBegin();
it != doc.TextFieldsEnd();
++it) {
// Getting text field value (UTF-8 string representation)
std::string field_value = it.GetField().GetOcrString().GetFirstString().GetCStr();
}
}
// Java
import com.smartengines.doc.*;
// Going through the found documents
for (DocumentsIterator doc_it = result.DocumentsBegin();
!doc_it.Equals(result.DocumentsEnd());
doc_it.Advance()) {
Document doc = doc_it.GetDocument();
// Going through the text fields
for (DocTextFieldsIterator it = doc.TextFieldsBegin();
!it.Equals(doc.TextFieldsEnd());
it.Advance()) {
// Getting text field value (UTF-8 string representation)
String field_value = it.GetField().GetOcrString().GetFirstString().GetCStr();
}
}
// C++
settings->SetOption("enablePDF", "true");
// Java
settings.SetOption("enablePDF", "true");
// C++
se::doc::DocResult& result = session->GetMutableCurrentResult();
// Java
import com.smartengines.doc.*;
DocResult result = session.GetMutableCurrentResult();
// C++
bool pdf_is_available = result.CanBuildPDFABuffer();
// Java
Boolean pdf_is_available = result.CanBuildPDFABuffer();
// C++
result.SetAddTextMode("image_with_text");
// Java
result.SetAddTextMode("image_with_text");
// C++
result.SetAddTextMode("chars");
// Java
result.SetAddTextMode("chars");
// C++
result.BuildPDFABuffer();
// Java
result.BuildPDFABuffer();
// C++
const size_t pdf_size = result.GetPDFABufferSize();
unsigned char* pdfb = new unsigned char[pdf_size];
result.GetPDFABuffer(pdfb, pdf_size);
// Java
int pdf_size = result.GetPDFABufferSize();
byte[] pdfb = new byte[pdf_size];
result.GetPDFABuffer(pdfb);
The basic Smart Document Engine delivery package includes:
The files are arranged in directories as shown in the table below:
Directory | Contents | Description |
secommon | C++ se::common namespace files | Common classes, such as Point, OcrString, Image, etc. See Common classes |
Files of integration, for example, Java com.smartengines.common module (one compiled file) |
doc | Documentation | See Code documentation |
samples | Complete compilable and runnable sample usage code | |
data-zip | Bundle files in format: bundle_something.se | Configuration files. See Configuration bundles |
Common classes, such as Point, OcrString, Image, etc., belong to the se::common namespace and are located in the secommon directory.
For C++, these are the following headers:
Header | Description |
#include <secommon/se_export_defs.h> | Contains export-related definitions of Smart Engines libraries |
#include <secommon/se_exceptions_defs.h> | Contains the definition of exceptions used in Smart Engines libraries |
#include <secommon/se_geometry.h> | Contains geometric classes and procedures (Point, Rectangle, etc.) |
#include <secommon/se_image.h> | Contains the definition of the Image class |
#include <secommon/se_string.h> | Contains the string-related classes (MutableString, OcrString, etc.) |
#include <secommon/se_string_iterator.h> | Contains the definition of string-targeted iterators |
#include <secommon/se_serialization.h> | Contains auxiliary classes related to object serialization (not used in Smart Document Engine) |
#include <secommon/se_common.h> | This is an auxiliary header which simply includes all of the above |
The same common classes in Java API are located within com.smartengines.common module:
// Java
import com.smartengines.common.*; // Import all se::common classes
The main Smart Document Engine classes belong to the se::doc namespace and are located in the docengine directory:
Header | Description |
#include <docengine/doc_document_info.h> | Provides information about the document type (textual document description) |
#include <docengine/doc_engine.h> | Contains DocEngine class definition |
#include <docengine/doc_session_settings.h> | Contains DocSessionSettings class definition |
#include <docengine/doc_session.h> | Contains DocSession class definition |
#include <docengine/doc_video_session.h> | Contains DocVideoSession class definition |
#include <docengine/doc_processing_settings.h> | Contains DocProcessingSettings class definition |
#include <docengine/doc_result.h> | Contains DocResult class definition, as well as DocTemplateDetectionResult and DocTemplateSegmentationResult |
#include <docengine/doc_document.h> | Contains Document class definition |
#include <docengine/doc_documents_iterator.h> | Contains documents related iterators |
#include <docengine/doc_fields.h> | Contains the definitions of classes representing Smart Document Engine fields |
#include <docengine/doc_fields_iterator.h> | Contains fields related iterators |
#include <docengine/doc_feedback.h> | Contains the DocFeedback interface and associated containers |
#include <docengine/doc_external_processor.h> | Contains the external document processing interface |
#include <docengine/doc_graphical_structure.h> | Contains DocGraphicalStructure class definition |
#include <docengine/doc_tags_collection.h> | Contains DocTagsCollection class definition |
#include <docengine/doc_view.h> | Contains DocView class definition |
#include <docengine/doc_views_iterator.h> | Contains DocView (document images) related iterators |
#include <docengine/doc_views_collection.h> | Contains DocViewsCollection class definition |
#include <docengine/doc_basic_object.h> | Contains DocBasicObject class definition |
#include <docengine/doc_basic_objects_iterator.h> | Contains DocBasicObject (basic document objects) related iterators |
#include <docengine/doc_objects.h> | Contains definitions of graphical object classes |
#include <docengine/doc_objects_collection.h> | Contains DocObjectsCollection class definition |
#include <docengine/doc_objects_collection_iterator.h> | Contains DocObjectsCollection-related iterators |
#include <docengine/doc_forward_declarations.h> | Service header containing forward declarations of all classes |
The same classes in Java API are located within com.smartengines.doc module:
// Java
import com.smartengines.doc.*; // Import all se::doc classes
All the classes, their methods, the method options, and the option values are described in the code comments, which are converted into docengine.pdf, included in the documentation directory.
The documentation is available in the doc directory. The doc directory structure:
The C++ API may throw se::common::BaseException subclasses when the user passes invalid input, makes bad state calls or if something else goes wrong.
The following exception (se::common::BaseException) subclasses are implemented:
Exception name | Description |
FileSystemException | Thrown if an attempt is made to read from a non-existent file, or other file system related IO error |
InternalException | Thrown if an unknown error occurs or if the error occurs within internal system components |
InvalidArgumentException | Thrown if a method is called with invalid input parameters |
InvalidKeyException | Thrown if an associative container is accessed with an invalid or non-existent key, or if a list is accessed with an invalid or out-of-range index |
InvalidStateException | Thrown if an error occurs within the system in relation to an incorrect internal state of the system objects |
MemoryException | Thrown if an allocation is attempted with insufficient RAM |
NotSupportedException | Thrown when trying to access a method which, given the current state or the passed arguments, is not supported in the current version of the library or is not supported at all by design |
UninitializedObjectException | Thrown if an attempt is made to access a non-existent or non-initialized object |
Exceptions contain useful human-readable information. Please read the e.what() message if an exception is thrown.
Note
se::common::BaseException is not a subclass of std::exception.
The Smart Document Engine interface does not have any dependency on the STL.
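Since the base exception class does not derive from std::exception, a catch (const std::exception&) handler alone will not intercept SDK errors. The following sketch shows one way to handle both families. The Fake* classes are hypothetical stand-ins that only mirror the documented property (a what() method and no std::exception base); they are not the SDK classes.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical stand-in: like the SDK base exception described above, this
// class does NOT derive from std::exception.
class FakeBaseException {
 public:
  explicit FakeBaseException(const std::string& msg) : msg_(msg) {}
  virtual ~FakeBaseException() {}
  virtual const char* what() const { return msg_.c_str(); }
 private:
  std::string msg_;
};
class FakeInvalidKeyException : public FakeBaseException {
 public:
  using FakeBaseException::FakeBaseException;
};

// Runs `action` and returns a description of what was caught.
template <typename Action>
std::string Describe(Action action) {
  try {
    action();
    return "ok";
  } catch (const FakeBaseException& e) {  // SDK-style errors, caught via the base class
    return std::string("sdk: ") + e.what();
  } catch (const std::exception& e) {     // Standard library errors
    return std::string("std: ") + e.what();
  }
}
```

Catching by const reference to the base class covers every subclass from the table, while the second handler keeps standard library errors from escaping.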
The thrown exceptions are wrapped in general java.lang.Exception. In Java, the exception type is included in the corresponding message text.
If you face a problem, contact us at sales@smartengines.com or support@smartengines.com.
Several Smart Document Engine SDK classes have factory methods which return pointers to heap-allocated objects. The caller is responsible for deleting such objects (a caller is probably the one who is reading this right now).
TIP
In C++:
For simple memory management and avoiding memory leaks, use smart pointers, such as std::unique_ptr<T> or std::shared_ptr<T>.
In Java API:
For the objects which are no longer needed it is recommended to use the .delete() method to force the deallocation of the native heap memory.
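The C++ recommendation can be illustrated with a small sketch. FakeObject is a hypothetical stand-in for any SDK class with a factory method; the `alive` counter exists only to make the deallocation observable.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for an SDK class created through a factory method.
struct FakeObject {
  static int alive;  // Number of currently allocated objects
  FakeObject() { ++alive; }
  ~FakeObject() { --alive; }
  // Factory method: returns a heap-allocated object owned by the caller.
  static FakeObject* Create() { return new FakeObject(); }
};
int FakeObject::alive = 0;

// Wraps the raw factory result in std::unique_ptr right away, so the object
// is deleted automatically when `obj` goes out of scope, even on early
// return or exception.
int AliveInsideScope() {
  std::unique_ptr<FakeObject> obj(FakeObject::Create());
  return FakeObject::alive;  // One object exists while `obj` is in scope
}
```

After AliveInsideScope() returns, the counter is back to zero without any explicit delete.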
Every delivery contains one or several configuration bundles – archives containing everything needed for Smart Document Engine to be created and configured. Usually they are named as bundle_something.se and located inside the data-zip directory.
A document type is simply a string encoding the real-world document type you want to recognize. The document types that the Smart Document Engine SDK delivered to you can potentially recognize can be obtained using the following procedure:
// C++
// Iterating through internal engines
for (int i_engine = 0;
i_engine < settings->GetInternalEnginesCount();
++i_engine) {
// Iterating through supported document types for this internal engine
for (int i_doc = 0;
i_doc < settings->GetSupportedDocumentTypesCount(i_engine);
++i_doc) {
// Getting supported document type name
std::string doctype = settings->GetSupportedDocumentType(i_engine, i_doc);
}
}
// Java
// Iterating through internal engines
for (int i_engine = 0;
i_engine < settings.GetInternalEnginesCount();
i_engine++) {
// Iterating through supported document types for this internal engine
for (int i_doc = 0;
i_doc < settings.GetSupportedDocumentTypesCount(i_engine);
i_doc++) {
// Getting supported document type name
String doctype = settings.GetSupportedDocumentType(i_engine, i_doc);
}
}
ATTENTION!
In a single session you can only enable document types that belong to the same internal engine.
Since all documents in settings are disabled by default you need to enable some of them. In order to do so you may use AddEnabledDocumentTypes(…) method of DocSessionSettings:
// C++
settings->AddEnabledDocumentTypes("usa.forms.fed.ss4.type1"); // Enables the form of the application for an employer identification number of the USA
// Java
settings.AddEnabledDocumentTypes("usa.forms.fed.ss4.type1"); // Enables the form of the application for an employer identification number of the USA
You may also use the RemoveEnabledDocumentTypes(…) method to remove already enabled document types.
For convenience it’s possible to use wildcards (using the asterisk symbol) while enabling or disabling document types. When using document types related methods, each passed document type is matched against all supported document types. All matches in supported document types are added to the enabled document types list.
// C++
settings->AddEnabledDocumentTypes("deu.*"); // Enables all supported documents of Germany
// Java
settings.AddEnabledDocumentTypes("deu.*"); // Enables all supported documents of Germany
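To make the wildcard behavior concrete, here is a rough sketch of how a mask could be matched against a list of supported types. It is an illustration only: the exact matching rules used inside the SDK are not documented here, so the semantics of `*` (any run of characters, possibly empty) are an assumption. MatchesMask and FilterByMask are hypothetical helper names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns true if `text` matches `mask`, where '*' matches any (possibly
// empty) run of characters and every other character matches itself.
bool MatchesMask(const std::string& mask, const std::string& text) {
  std::size_t m = 0, t = 0;
  std::size_t star = std::string::npos;  // Position of the last '*' seen in the mask
  std::size_t retry = 0;                 // Text position to retry from after that '*'
  while (t < text.size()) {
    if (m < mask.size() && mask[m] == '*') {
      star = m++;
      retry = t;
    } else if (m < mask.size() && mask[m] == text[t]) {
      ++m;
      ++t;
    } else if (star != std::string::npos) {
      m = star + 1;  // Let the last '*' absorb one more character
      t = ++retry;
    } else {
      return false;
    }
  }
  while (m < mask.size() && mask[m] == '*') ++m;  // Trailing stars match empty
  return m == mask.size();
}

// Collects every supported type that matches the mask.
std::vector<std::string> FilterByMask(const std::string& mask,
                                      const std::vector<std::string>& supported) {
  std::vector<std::string> enabled;
  for (const std::string& type : supported) {
    if (MatchesMask(mask, type)) enabled.push_back(type);
  }
  return enabled;
}
```

With a supported list such as {"deu.a", "deu.b", "usa.forms.fed.ss4.type1"} (the deu.* names are made up), the mask "deu.*" selects the first two entries.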
ATTENTION!
You can only enable document types that belong to the same internal engine for a single session. If you do otherwise then an exception will be thrown during session spawning.
TIP
It’s always better to enable as few document types as possible if you know exactly what you are going to recognize, because the system will spend less time deciding which of the enabled document types has been presented to it.
Some configuration bundle options can be overridden in runtime using DocSessionSettings methods. You can obtain all currently set option names and their values using the following procedure:
// C++
for (auto it = settings->OptionsBegin();
it != settings->OptionsEnd();
++it) {
// it.GetKey() returns the option name
// it.GetValue() returns the option value
}
// Java
for (StringsMapIterator it = settings.OptionsBegin();
!it.Equals(settings.OptionsEnd());
it.Advance()) {
// it.GetKey() returns the option name
// it.GetValue() returns the option value
}
You can change option values using the SetOption(…) method:
// C++
settings->SetOption("enableMultiThreading", "true");
// Java
settings.SetOption("enableMultiThreading", "true");
Option values are always represented as strings, so if you want to pass an integer or a boolean, convert it to a string first.
Option name | Value type | Default | Description |
enableMultiThreading | “true” or “false” | “true” | Enables parallel execution of internal algorithms |
rgbPixelFormat | String of characters R, G, B, and A | “RGB” for 3-channel images, “BGRA” for 4-channel images | Sequence of color channels for session.Process() method image interpretation |
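Because option values are passed as strings, numeric and boolean values need a conversion step. A minimal sketch follows; BoolToOption and IntToOption are hypothetical helper names, not part of the SDK.

```cpp
#include <cassert>
#include <string>

// Converts a boolean to the "true"/"false" spelling used in the table above.
std::string BoolToOption(bool value) { return value ? "true" : "false"; }

// Converts an integer to its decimal string representation.
std::string IntToOption(int value) { return std::to_string(value); }
```

The result could then be passed on, for example as settings->SetOption("enableMultiThreading", BoolToOption(false).c_str()), assuming SetOption accepts C strings as the literal-based calls above suggest.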
Smart Document Engine SDK has Java API which is automatically generated from C++ interface using the SWIG tool.
The Java interface is the same as the C++ one except for minor differences; please see the provided Java sample.
Even though garbage collection is present and works, it’s strongly advised to call obj.delete() functions for our API objects manually because they are wrappers to the heap-allocated memory and their heap size is unknown to the garbage collector, which may result in delayed deletion of objects and thus in high overall memory consumption.
DocEngine engine = DocEngine.Create(config_path); // or any other object
// ...
engine.delete(); // forces and immediately guarantees wrapped C++ object deallocation
You can install demo apps from Apple Store and Google Play from the links below:
You need samples from docengine_sample directory.
We provide the classic SDK as a library together with its integration samples. You need to install one of the examples (you can find them in the /samples directory).
Supported formats:
We do not support PDF and recommend rasterizing it (converting it to a supported format) on our side before recognition.
You need to replace the libraries from /bin, the bindings (from /bindings respectively), and the configuration bundle (*.se file from /data-zip). You can find them in the provided SDK.
Our SDK is platform-dependent, so please contact us at sales@smartengines.com or support@smartengines.com and we will provide you with the required SDK.
Please don’t run SDK for operating systems not intended for it and contact us at sales@smartengines.com or support@smartengines.com to provide you with the required SDK.
The signature you have specified may be invalid.
The bundle does not contain the specified documents mask or the specified mask matched with documents from multiple internal engines in the current mode. See Configuration bundles.
You cannot use a bundle version different from the library version. The files from the same SDK should be used.
Many integrations require additional assembly of the wrapper to work with the C++ library. This error occurs when the wrapper cannot find the main library at the specified path. It must either be placed in the scope of the code through environment variables, or the wrapper must be compiled with the correct paths to the library.
The module you are using is built for a different version of Python. The /samples/docengine_sample_*/ directory contains a script for building the module on your side. Don’t forget that you must have the dev packages installed for your language.