25.05.2020

Leak as a service: what you ought to know about data privacy

The philosophy

We live in the age of services: the world economy today is built almost entirely on them. Billions of people around the world are involved in the service industry. Individuals use the services of various companies every day, and companies, in turn, outsource some of their supporting functions to third-party organizations to optimize costs and reduce taxes. Cleaning, legal support and HR are just a few of the functions passed on to third-party contractors, and the decision to hand highly specialized processes to professionals in their field does seem reasonable: their services are often much cheaper than maintaining full-time specialized in-house staff.

The growing efficiency of technology and digitalization has driven companies from many industries to move massively into IT. Today, financial, insurance and medical services seem inconceivable without developed IT infrastructure and automated data processing. Not every company can afford a data center of its own, so storing data on remote servers or in the cloud (in other words, outsourcing) is common practice. The problem, though, is that even if the cloud storage provider is in some way legally connected to the company renting the server capacity, its servers can be located anywhere geographically; and in that supply chain, the link between the remote storage provider and the end customer (a client of a bank, insurer, medical service or legal agency) is very weak. The transfer of data to third-party companies for storage and processing has resulted in a wave of leaks of sensitive information, which gained momentum in 2018 and continues to this day. Moreover, those leaks involved data that "belonged" to some large and well-known IT companies.

We would like to emphasize that we do not claim cloud storage is the only source of leaks. We simply wish to point out that remote resources (that is, services) add another link to the chain of information storage and processing, and therefore, without proper protection, that information becomes much more vulnerable and accessible to attackers.

In our view, despite all the conveniences of using cloud services and renting remote server capacity, the best choice for companies that handle customers' personal data and other sensitive information is to develop an infrastructure of their own. Yes, this requires additional funding and highly qualified personnel, but it is worth it: even if data does leak, the company will at least find it easier to trace the leak pathway and take appropriate measures to prevent negative publicity that could hurt its reputation and finances.

The technology

The technology race is forcing organizations to make their services more intuitive and convenient for the user. Take voice assistants, for example: Google Assistant, Cortana, Alexa, Siri. To help us, they have to know more about us than we know ourselves, and they do so by collecting huge sets of our data.

In our own area of expertise, the recognition of various types of identity documents, the same is done by software solutions and remote services that automatically prefill registration forms. They make onboarding much easier, helping us buy travel tickets or get insurance or a bank loan in a matter of seconds. From a security standpoint, however, it is important to distinguish a recognition technology that works autonomously on the end user's device from a service that processes personal data on servers located goodness knows where and that may store images of the recognized ID documents. If the company's IT infrastructure lacks a well-built data protection system, the personal data involved in remote identification automatically becomes vulnerable: it can be stolen in transit to the server or while stored on the external server where recognition takes place.

The reality today is that most images (scans and photos) of identity documents (passports, ID cards, vehicle documents, driver's licenses, bank cards, medical and social insurance policies, diplomas and agreements) end up on the dark net. It seems reasonable to assume that at some point those images were stored insecurely on remote servers, unencrypted and easy to steal and then sell. Thus, unprotected servers that "help" us with remote document recognition may be acting as suppliers of identity document images to the black market, which paints a rather alarming picture:

  1. Outsourcing the recognition of customers' identity documents to third-party providers with remote servers is like asking your outsourced cleaning staff to count the cash in your safe deposit box.
  2. No one knows how the particular remote recognition service you hired actually works. Until fairly recently, instead of artificial intelligence and machine vision, remote services relied on human labor: so-called "recognition factories" in countries with extremely low labor costs, where people manually typed customers' passport or ID details into registration forms. Today, as spyware and Trojans become increasingly sophisticated, it is quite feasible that ID images could be stolen in transit by malicious code introduced at some stage of the process. Those stolen images would most likely end up on the dark net.
  3. No one assesses the security of the channel through which unencrypted data reaches the server where image processing takes place. And if the channel is secure, the process is likely to be complex, and therefore costly and slow.
  4. Remote recognition is normally much slower than on-device recognition: it takes time to capture an image, transfer it to a server for processing, and return the recognized data to the system. In addition, human intervention may be required if the system fails.
  5. Any company that uses remote recognition services depends not only on stable Internet access but also on a high connection speed, since the process involves transmitting large volumes of data.
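Points 4 and 5 come down to simple arithmetic. The remote path adds the upload time (image size divided by uplink throughput), a network round trip and server-side processing, while the on-device path adds only local processing. The Python sketch below models just that; the class, the function names and every number in the example are hypothetical illustrations, not measurements of any real service or product, and the model ignores the response size and any human intervention.

```python
from dataclasses import dataclass


@dataclass
class RecognitionTiming:
    """Hypothetical per-document parameters for a rough latency estimate."""
    capture_s: float     # time to capture the document image, seconds
    image_bytes: int     # size of the captured image, bytes
    uplink_bps: float    # upload throughput, bytes per second
    round_trip_s: float  # network latency for request and response, seconds
    server_s: float      # recognition time on the remote server, seconds
    on_device_s: float   # recognition time on the user's device, seconds


def remote_latency(t: RecognitionTiming) -> float:
    """Capture, upload the image, then wait for the round trip and server processing."""
    upload_s = t.image_bytes / t.uplink_bps
    return t.capture_s + upload_s + t.round_trip_s + t.server_s


def on_device_latency(t: RecognitionTiming) -> float:
    """Capture and recognize locally; nothing is sent over the network."""
    return t.capture_s + t.on_device_s


if __name__ == "__main__":
    # Illustrative values only: a 2 MB photo over a 250 kB/s (2 Mbit/s) uplink.
    timing = RecognitionTiming(
        capture_s=0.5,
        image_bytes=2_000_000,
        uplink_bps=250_000,
        round_trip_s=0.2,
        server_s=0.3,
        on_device_s=1.0,
    )
    print(f"remote:    {remote_latency(timing):.1f} s")
    print(f"on-device: {on_device_latency(timing):.1f} s")
```

With these made-up values the remote path takes about 9 s against 1.5 s on the device, and almost all of the difference is the 8 s upload. Doubling the uplink throughput halves that term, which is why remote recognition is so sensitive to connection speed.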

Software that works autonomously does not need to transfer any information online. The latest generation of such software also has no need to save or store images of documents at all, precisely to avoid leaks: the application works with the resulting encrypted set of characters, which is useless to an attacker even if intercepted.

The ethics 

When personal data is leaked, both the company and the software developers suffer significant reputational damage; however, it is neither of them but the end customers who suffer the most.

Take a bank, for example. The mission of any bank is, at a minimum, to keep its clients' money safe and, ideally, of course, to grow it. This basic service implies that the client's personal data, the amount of their savings and other sensitive information are kept safe. That, at the end of the day, is the subject matter of the contract between the client and the bank.

With this in mind, not all of a bank's functions should be outsourced. Supporting functions are likely to have little influence on the bank's core performance, but document recognition and automatic input of personal data are not among them: they are what creates the link between a financial organization and its client. We strongly believe that financial institutions should not jeopardize customer data by outsourcing data extraction automation. In doing so, companies are in effect also outsourcing their clients' trust, without the clients even suspecting it. A financial institution, for its part, cannot guarantee the security of its customers' data once their personal information is transferred to, and then stored on, remote servers that fall under the control of a different legal entity or even the jurisdiction of a different state.

Finally, such services should operate in strict accordance with the organization's internal information security procedures and remain under its full control. For obvious reasons, this imposes additional costs on both the service provider and the company. Thus, the question of business ethics remains open to this day. Banks, or indeed any company, may of course opt for secure third-party automatic identification services, but these will cost no less than developing and integrating the software in-house.

Improve your business with Smart Engines technologies



Green AI-powered scanner SDK for ID cards, passports, driver's licenses, residence permits, visas and other IDs, more than 1856 types in total. Provides eco-friendly, fast and precise scanning for smartphone, web, desktop or server, and works fully autonomously. Extracts data from photos and scans, as well as from the video stream of a smartphone or web camera, and is robust to capture conditions. No data transfer: ID scanning is performed on-device and on-premise.


Automatic on-the-fly scanning with a smartphone's camera of machine-readable zones (MRZ); all types of credit cards (embossed, indent-printed and flat-printed); and barcodes (PDF417, QR code, AZTEC, DataMatrix and others). Provides high-quality on-device MRZ, barcode and credit card scanning in mobile applications regardless of lighting conditions. Supports card scanning for 21 payment systems.



Automatic data extraction from business and legal documents: KYC/AML questionnaires, applications, tests, etc., and administrative papers (accounting documents, corporate reports, business forms and government forms, such as financial statements and insurance policies). High-quality Green AI-powered OCR on scans and photographs taken in real-world conditions. Total security: on-premise installation only. Automatically scans document data in 2 seconds on a modern smartphone.


Green AI for tomographic reconstruction and visualization: the image reconstruction is performed algorithmically during the X-ray tomographic scan itself. We aim to reduce the radiation dose received during exposure by finding the optimal point at which to stop scanning.


