Binarization is a classic image processing problem. It is often used to simplify the data and speed up subsequent processing, which hardly seems vital nowadays. However, binarization becomes essential when analyzing porous materials, because here the data model has no intermediate state between a hollow pore and an impenetrable matrix. And, as usual, there is no “out-of-the-box” algorithm that works well in this situation. There are algorithms with tunable parameters and there are great neural network architectures, but to work they need to be configured or trained. What do we do when a reference solution is extremely hard to obtain? You will learn about an interesting way to do this without ground truth and get a glimpse into the world of computed tomography and its related fields.
In this article, we’ll be discussing the challenges that arise when analyzing the images of the internal structure of porous objects produced by an X-ray CT scan.
Porous structures are created for various purposes. For example, they are used in manufacturing to adsorb and filter liquids and gases, and in medicine to make resorbable implants. Controlling the parameters of the manufactured structures is clearly essential. Natural porous objects are used in industry as well, and there, too, an objective assessment of their characteristics is critical: in oil extraction, for instance, the extraction parameters and even the extraction method itself depend on the porosity of the formation.
Porous structures are a special class of objects: their characteristics are determined not only by the size and spatial distribution of the pores, but also by how well the pores are connected, whether they are open or closed, and how large their surface area is. Studying the internal morphology of porous objects is a complex, multi-component problem that requires a three-dimensional model, since many target characteristics cannot be assessed from a single cross-section. How do we get the input data for such an analysis? That’s what we will talk about first.
A great non-destructive way to obtain a detailed picture of an object’s internal structure is to scan it with X-ray computed tomography, in which the object is X-rayed from many different angles. Figure 1 shows the X-ray microtomograph built in the Reflectometry and Low-angle Scattering Laboratory of the Federal Scientific Research Center “Crystallography and Photonics” of the Russian Academy of Sciences; the laboratory is led by the outstanding scientist V. E. Asadchikov. The silvery tube on the right is a vacuum chamber: it keeps the soft X-ray beam from being absorbed by air on its way to the sample. The sample itself sits in a holder mounted on a rotating stage, and a position-sensitive detector is located behind it.
Figure 1. The general appearance (left) and the measuring unit (right) of the TOMAS microtomograph in the Reflectometry and Low-angle Scattering Laboratory of the Federal Scientific Research Center “Crystallography and Photonics” of the Russian Academy of Sciences.
During a CT scan, the object is imaged many times at different angles. As a rule, the rotation angles are evenly spaced over the range from 0 to 180 degrees with a fixed step. A two-dimensional image is recorded for each angular position of the sample (see Figure 2). Each pixel of such an image stores the number of X-ray quanta registered by the detector.
Figure 2. That’s what an X-ray scan image of a porous structure looks like.
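The acquisition geometry described above can be sketched in a few lines of Python. This is a minimal illustration, assuming parallel-beam geometry and a scan arc of exactly 180 degrees; `projection_angles` is a hypothetical helper, not part of any scanner software.

```python
def projection_angles(num_projections: int, arc_deg: float = 180.0) -> list[float]:
    """Return evenly spaced rotation angles (degrees) covering [0, arc_deg).

    The end point is excluded: in parallel-beam geometry the projection at
    180 degrees mirrors the one at 0 degrees and adds no new information.
    """
    if num_projections <= 0:
        raise ValueError("num_projections must be positive")
    step = arc_deg / num_projections  # fixed angular step between projections
    return [k * step for k in range(num_projections)]


angles = projection_angles(180)  # one projection per degree
print(len(angles), angles[:3], angles[-1])  # 180 [0.0, 1.0, 2.0] 179.0
```

Doubling `num_projections` halves the angular step; the scanner then records one two-dimensional image per entry of the list.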
The set of two-dimensional images taken at different angles is used to reconstruct the internal morphological structure of the object under study, and this takes some hard-core algorithmic magic. Geometrically, the reconstruction result is a voxel grid (a grid of cubic volume elements). Since three-dimensional laser projection systems are not yet widely available, the results are presented for human analysis as two-dimensional tomographic images, that is, cross-sections of the reconstructed object. Figure 3 shows the correspondence between the real object and a cross-section of the reconstructed object. Note that the pores are not visible in the projection, while they are easily distinguishable in the cross-section.
Figure 3. Tomographic reconstruction recovers the internal structure of an object without physically damaging it. Any cross-section can be examined by cutting the reconstructed object virtually.
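“Cutting” a reconstructed voxel grid is just indexing. A minimal sketch, assuming the volume is stored as a plain nested list `volume[z][y][x]`; the slice names follow common medical-imaging usage and the helpers are hypothetical.

```python
def axial_slice(volume, z):
    """Horizontal cut at height z (a copy of one z-plane)."""
    return [row[:] for row in volume[z]]


def coronal_slice(volume, y):
    """Vertical cut at fixed y: rows are z, columns are x."""
    return [plane[y][:] for plane in volume]


def sagittal_slice(volume, x):
    """Vertical cut at fixed x: rows are z, columns are y."""
    return [[row[x] for row in plane] for plane in volume]


# Toy 2 x 3 x 4 volume whose voxel value encodes its coordinates (100z + 10y + x).
vol = [[[100 * z + 10 * y + x for x in range(4)] for y in range(3)] for z in range(2)]
print(coronal_slice(vol, 2))  # [[20, 21, 22, 23], [120, 121, 122, 123]]
```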
That was the geometry part. The voxel values are much more complicated. The reconstructed structure is usually interpreted as the spatial distribution of the X-ray attenuation coefficient. But as a rule, it is not clear what exactly this coefficient means: if the spectrum of the probing radiation changes, its value changes as well. We need to keep this in mind when reading a CT scan.
Tomographic images are not measured directly; they are the result of a numerical reconstruction. Monochromatic probing radiation is preferable: in that case each voxel (ideally) holds the linear X-ray attenuation coefficient of the corresponding local volume at the energy (wavelength) of the radiation used.
With “broadband” (polychromatic) radiation, however, there is no single parameter that characterizes attenuation, and there are not enough measurements to reconstruct the full set of parameters. In this case it is not very clear what a voxel value represents: it depends not only on the X-ray spectrum and the properties of the object, but also on the reconstruction algorithm used. But we’ll talk about this issue some other time.
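The difference between the two cases can be written out with the standard Beer-Lambert law (added here for illustration; the original text does not spell it out). Let S(E) be the effective spectrum (source spectrum weighted by the detector response) and μ(x, E) the linear attenuation coefficient at point x and energy E. For monochromatic radiation of energy E₀, the log-transformed detector reading along a ray L is a line integral of a single function, which is exactly what reconstruction inverts. For polychromatic radiation it is not:

```latex
\begin{aligned}
\text{monochromatic:}\quad & I = I_0 \exp\!\Big(-\int_L \mu(\mathbf{x}, E_0)\,d\ell\Big)
  \;\Longrightarrow\; -\ln\frac{I}{I_0} = \int_L \mu(\mathbf{x}, E_0)\,d\ell, \\
\text{polychromatic:}\quad & I = \int S(E)\,\exp\!\Big(-\int_L \mu(\mathbf{x}, E)\,d\ell\Big)\,dE,
  \qquad I_0 = \int S(E)\,dE.
\end{aligned}
```

In the second case, -ln(I/I₀) is no longer the line integral of any single function of x, so the voxel values produced by a reconstruction algorithm have no single physical meaning.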
For now, we need to remember that the reconstruction contains structural distortions, the so-called artifacts. They have various causes, non-monochromatic probing radiation among them. For example, false linear structures may appear when the number of projections is insufficient for the required resolution. For an ideal reconstruction, there should be roughly as many projections as there are voxels along the side of a reconstructed cross-section, i.e., thousands of them. Yet projections are often taken every 1 degree (only 180 of them) on the “ought to be enough for anybody” principle.
Polychromatic radiation, in turn, causes artifacts regardless of the number of projections. One of them is the so-called “bowl effect”: a gradual darkening of the object from the center to the edges even though its material is optically homogeneous (see Figure 4).
On top of that, the image contains noise. Note that this noise is not an additional technical error but a consequence of physics: because of the quantum nature of light, the detector signal is a Poisson random variable. As a result, even with ideal equipment the noise is not only significant in amplitude (the variance of a Poisson variable equals its mean, so it grows with intensity) but also heteroskedastic: different pixels “produce different noise” depending on their illumination.
Figure 4. Horizontal cross-section of a porous object. The reconstruction result. The data was produced by the microtomograph TOMAS.
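The heteroskedasticity of the photon noise described above is easy to check numerically. The sketch below is a hypothetical pure-Python illustration: it simulates detector counts at a few arbitrary mean intensities by counting arrivals of a unit-rate Poisson process, then compares the sample mean and variance. For Poisson counts the two agree, so brighter pixels have larger absolute noise, while the relative noise (one over the square root of the mean count) is smaller.

```python
import random
import statistics


def poisson_count(mean: float, rng: random.Random) -> int:
    """Draw one Poisson(mean) count: the number of arrivals of a unit-rate
    Poisson process (exponential inter-arrival times) within [0, mean]."""
    count = 0
    t = rng.expovariate(1.0)
    while t <= mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def noise_stats(mean_counts, n_samples=4000, seed=0):
    """Map each mean intensity to (sample mean, sample variance) of simulated counts."""
    rng = random.Random(seed)
    result = {}
    for mean in mean_counts:
        counts = [poisson_count(mean, rng) for _ in range(n_samples)]
        result[mean] = (statistics.fmean(counts), statistics.variance(counts))
    return result


stats = noise_stats([10, 100, 400])
for mean, (m, v) in stats.items():
    # The variance tracks the mean: the noise level depends on the pixel's illumination.
    print(f"mean {mean:>3}: sample mean {m:7.2f}, variance {v:7.2f}, std/mean {v ** 0.5 / m:.3f}")
```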
However, as we mentioned earlier, an accurate absorption map of an object is not an end in itself when studying porous objects. We need to calculate characteristics such as porosity, effective porosity, and pore surface area, and a continuous-valued map of the absorption coefficient cannot be used directly to compute them. First, we need to divide the voxels into two groups, those that contain pores and those that contain the matrix (impenetrable material); in other words, we need to solve a binarization problem (see Figure 5). This immediately raises the question: which method should we choose? Unfortunately, it’s nearly impossible to answer this question in advance, without “trying it on” first. Methods that work great on some datasets may not be applicable to others.
Figure 5. An example of a binary representation of a porous sample.
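Once a binary volume is available, the simplest characteristics follow from voxel counting. A minimal sketch, assuming the volume is a nested list `volume[z][y][x]` with 1 for matrix and 0 for pore; `porosity` and `interface_area` are hypothetical helpers. The face-counting surface estimate is crude and systematically overestimates the area of oblique and curved surfaces (the “staircase” effect); effective porosity additionally requires connectivity analysis and is not shown.

```python
def porosity(volume):
    """Fraction of pore voxels (value 0) in a binary volume[z][y][x]."""
    total = pores = 0
    for plane in volume:
        for row in plane:
            total += len(row)
            pores += row.count(0)
    return pores / total


def interface_area(volume, voxel_size=1.0):
    """Estimate pore/matrix surface area by counting voxel faces shared by a pore
    voxel and a matrix voxel. Faces on the outer border of the volume are ignored."""
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    faces = 0
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                v = volume[z][y][x]
                # Look only in the +x, +y, +z directions so each face is counted once.
                if x + 1 < nx and volume[z][y][x + 1] != v:
                    faces += 1
                if y + 1 < ny and volume[z][y + 1][x] != v:
                    faces += 1
                if z + 1 < nz and volume[z + 1][y][x] != v:
                    faces += 1
    return faces * voxel_size ** 2


# A 3 x 3 x 3 block of matrix with a single pore voxel in the middle.
cube = [[[1] * 3 for _ in range(3)] for _ in range(3)]
cube[1][1][1] = 0
# Porosity is 1/27; the pore has six faces of area 2.0 ** 2, i.e. 24.0 in total.
print(porosity(cube), interface_area(cube, voxel_size=2.0))
```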
As an example, let’s take a convolutional neural network trained to binarize document images [1]. This network won the popular Document Image Binarization Contest (DIBCO) in 2017 in two categories: printed text and handwritten text.
The authors emphasized not the choice of architecture (which is quite standard) but the right methodology: preparing annotated training examples, analyzing the existing reference data, and understanding what exactly users expect from the network’s output. The training metric was not limited to simply counting mismatched black and white pixels; it was far more complex and distinguished minor errors from critical ones. Although this approach binarizes the target documents successfully, applying the same network to tomographic reconstructions gives results that leave much to be desired, as Figure 6 demonstrates. On the left is the grayscale input cross-section of an object, and on the right is the binarization result produced by the neural network. The original image was carefully rescaled beforehand to meet the network’s requirements (the authors emphasized the importance of this step for applying their method properly). Document images also have uneven backgrounds, but the morphological differences between document primitives and porous structures turned out to be too large.
Figure 6. A cross-section of a porous object: the tomographic reconstruction result (left) and the binarization result produced by the neural network (right).
One might think that we could apply the authors’ methodology to train our own specialized network on appropriate data and that it would work just fine. However, unlike document binarization, where a considerable body of ground truth has been accumulated (the contest has been held annually with expanded and updated data for many years), there is no suitable training dataset in our field yet. Some datasets containing measured projections together with their reconstructions are becoming available, but, as we mentioned before, reconstructed images contain distortions caused by the measurement conditions and the reconstruction algorithm. This means there is no ground truth, i.e., no ideally binarized images. Creating a full dataset is an expensive venture that requires experts from the subject area. Today this approach seems impractical to us: even with the resources of an oil company, assembling such a dataset would be extremely time-consuming. So what are we going to do?
Strictly speaking, we don’t need to know the correct answer in order to test or train an algorithm. All we need is a way to estimate the error of an answer. These are not the same thing, and the latter is easier. Let’s take the most primitive binarization algorithm, thresholding, and try to assess the quality of its results using a priori knowledge about our object.
Figure 7. On the left – an area of the reconstructed cross-section, on the right – a binarized image produced with the help of thresholding.
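Here is a minimal sketch of the effect seen in Figure 7, on a synthetic two-dimensional slice. The test image is entirely hypothetical: a square pore (mean level 0.3) inside matrix (mean level 0.7) with additive Gaussian noise, which is a simplification of real reconstruction noise. A global threshold halfway between the two levels still leaves scattered white pixels inside the pore.

```python
import random


def threshold_binarize(image, t):
    """Global threshold: 1 (matrix, white) where value > t, else 0 (pore, black)."""
    return [[1 if v > t else 0 for v in row] for row in image]


def synthetic_slice(size=40, pore_level=0.3, matrix_level=0.7, sigma=0.15, seed=1):
    """Hypothetical test image: a square pore in the middle of the matrix, plus Gaussian noise."""
    rng = random.Random(seed)
    lo, hi = size // 4, 3 * size // 4
    img = []
    for y in range(size):
        row = []
        for x in range(size):
            inside = lo <= y < hi and lo <= x < hi
            base = pore_level if inside else matrix_level
            row.append(base + rng.gauss(0.0, sigma))
        img.append(row)
    return img, (lo, hi)


img, (lo, hi) = synthetic_slice()
binary = threshold_binarize(img, 0.5)
# Every white pixel inside the pore square is a noise-induced "speck".
specks = sum(binary[y][x] for y in range(lo, hi) for x in range(lo, hi))
print("white (matrix) pixels inside the pore:", specks)
```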
Let’s take a look at Figure 7. We could argue about whether the right signal level was chosen as the boundary between free and occupied voxels, but there is no doubt that the statistical properties of this binarized image are heavily distorted. The thing is, there are isolated groups of white voxels in the image. You can’t see it in the figure, but there are no white voxels next to them in the adjacent layers either. What does this mean? It means that, according to this reconstruction, our sample contains pieces of matrix that float freely, defying the law of gravity. That is impossible, and we can state it even without an ideal image. These binarization artifacts are caused by the noise in the original image, which we warned you about earlier.
So we have found a necessary (though very specific and not sufficient on its own) condition for a porous object to be binarized correctly: the absence of “hanging rocks” in the pores. Let’s add a few clarifications. First, a connected component of matrix voxels is not considered “hanging” if it touches the edge of the reconstructed region: the corresponding matrix fragment might be attached to the walls of a container that are transparent to X-rays. Besides, without this exception the object itself would be “hanging”, which wouldn’t make much sense. Second, large components are not considered hanging if the distance to the closest matrix voxel of another component does not exceed a small fixed threshold. Large components cannot be caused by noise, and whether the thin “bridges” (adhesions holding the fragment in place) are present in the resulting image has no impact on most of the parameters under study: permeability is blocked by barriers, not by bridges. Moreover, such adhesions can be classified incorrectly because they are thinner than a voxel.
So we have our first formal criterion for tuning a binarization algorithm without ground truth: the number of “hanging rocks” in the pores, which should be as low as possible. However, optimizing the algorithm parameters on this criterion alone can backfire: if the threshold is too high, there will be no “hanging rocks”… and no image either. It would be very convenient to know the expected number of white voxels, since that would let us regularize the problem. It’s worth noting that ground-truth fill rates for individual image regions can be used instead of a precise pixel-level mark-up and still produce good results [2]. In our case, we have no information about the internal structure, but we can measure the volume and the weight of the sample and, knowing the density of the matrix material, estimate the number of white voxels. This is the second criterion.
Since our image is very noisy, a suitable binarization method will include noise reduction, either explicitly or implicitly (as neural network methods do). And here we run into another meaningless answer: thresholding an over-smoothed image may yield no isolated connected components and the correct average fill, while the morphological parameters we are interested in are far from reality.
To counteract this, let’s add to our quality criteria the well-known structural similarity index measure (SSIM) between the binarization result and the original image. This will penalize major structural distortions. But we don’t want to give SSIM too much weight, because the original image contains noise that we intend to get rid of.
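The first criterion can be sketched as a 3D connected-component count. This is a minimal pure-Python illustration under several assumptions that the text leaves open: matrix voxels are 1 and pores are 0, components are 6-connected, “the edge” means the border of the reconstructed volume, and the distance between components is measured in the Chebyshev (chessboard) metric. The parameters `min_large` and `max_gap` are hypothetical stand-ins for the “large component” size and the “low fixed threshold”.

```python
from collections import deque

NEIGHBORS_6 = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def matrix_components(volume):
    """Label 6-connected components of matrix voxels (value 1) with breadth-first search."""
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    label = {}  # voxel coordinate -> component index
    components = []
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                if volume[z][y][x] != 1 or (z, y, x) in label:
                    continue
                idx, comp = len(components), []
                label[(z, y, x)] = idx
                queue = deque([(z, y, x)])
                while queue:
                    cz, cy, cx = queue.popleft()
                    comp.append((cz, cy, cx))
                    for dz, dy, dx in NEIGHBORS_6:
                        pz, py, px = cz + dz, cy + dy, cx + dx
                        if (0 <= pz < nz and 0 <= py < ny and 0 <= px < nx
                                and volume[pz][py][px] == 1 and (pz, py, px) not in label):
                            label[(pz, py, px)] = idx
                            queue.append((pz, py, px))
                components.append(comp)
    return components, label


def near_other_component(comp, idx, label, max_gap):
    """True if a voxel of another component lies within max_gap (Chebyshev distance)."""
    offsets = range(-max_gap, max_gap + 1)
    for z, y, x in comp:
        for dz in offsets:
            for dy in offsets:
                for dx in offsets:
                    other = label.get((z + dz, y + dy, x + dx))
                    if other is not None and other != idx:
                        return True
    return False


def count_hanging_rocks(volume, min_large=50, max_gap=2):
    """Count matrix components that neither touch the volume border nor (if large)
    come within max_gap of another matrix component."""
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    components, label = matrix_components(volume)
    hanging = 0
    for idx, comp in enumerate(components):
        if any(z in (0, nz - 1) or y in (0, ny - 1) or x in (0, nx - 1) for z, y, x in comp):
            continue  # attached to the border (container wall or the sample body)
        if len(comp) >= min_large and near_other_component(comp, idx, label, max_gap):
            continue  # large fragment, probably held by sub-voxel bridges
        hanging += 1
    return hanging


n = 9
vol = [[[0] * n for _ in range(n)] for _ in range(n)]
for y in range(n):
    for x in range(n):
        vol[0][y][x] = 1  # "floor" touching the border: never hanging
for z in (2, 3):
    for y in (2, 3, 4):
        for x in (2, 3, 4):
            vol[z][y][x] = 1  # 18-voxel block, one empty layer above the floor
vol[6][6][6] = 1  # isolated one-voxel speck: a typical noise artifact
print(count_hanging_rocks(vol, min_large=10, max_gap=2))  # 1 (only the speck)
```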
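The second criterion is simple arithmetic. The matrix occupies m / ρ of the sample’s bulk volume V, so the expected fraction of white voxels is m / (ρV), and one minus that is the expected porosity. A minimal sketch, assuming the reconstructed volume covers exactly the sample’s bulk volume (in practice, voxels outside the sample would have to be excluded first); the function names and the example figures are hypothetical.

```python
def expected_matrix_fraction(mass_g: float, bulk_volume_cm3: float, density_g_cm3: float) -> float:
    """Fraction of the sample volume occupied by matrix: (m / rho) / V."""
    return mass_g / (density_g_cm3 * bulk_volume_cm3)


def expected_white_voxels(total_voxels: int, mass_g: float, bulk_volume_cm3: float,
                          density_g_cm3: float) -> int:
    """Expected number of matrix (white) voxels in a volume that covers the sample."""
    return round(total_voxels * expected_matrix_fraction(mass_g, bulk_volume_cm3, density_g_cm3))


def fill_error(white_voxels: int, total_voxels: int, expected_fraction: float) -> float:
    """Second criterion: deviation of the observed white-voxel fraction from the expected one."""
    return abs(white_voxels / total_voxels - expected_fraction)


# Hypothetical sample: 1.2 g, bulk volume 1.0 cm^3, matrix density 2.4 g/cm^3.
f = expected_matrix_fraction(1.2, 1.0, 2.4)
print(f, 1 - f)  # 0.5 0.5 (matrix fraction, expected porosity)
print(expected_white_voxels(1_000_000, 1.2, 1.0, 2.4))  # 500000
print(fill_error(450_000, 1_000_000, f))  # approximately 0.05
```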
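For reference, here is a minimal pure-Python sketch of the SSIM formula of Wang et al., evaluated over the whole image as a single window. This is a simplification: the standard index is computed in local windows and averaged, which is what library implementations such as scikit-image’s `structural_similarity` do. The constants C1 = (0.01·L)² and C2 = (0.03·L)², with L the dynamic range, follow the usual defaults. In the example, a binarized copy of a small image keeps a high score, while an inverted copy scores negative.

```python
import statistics


def global_ssim(a, b, data_range=1.0, k1=0.01, k2=0.03):
    """SSIM of two equally sized 2D images (lists of rows), computed over the
    whole image as one window instead of averaging over local windows."""
    xs = [float(v) for row in a for v in row]
    ys = [float(v) for row in b for v in row]
    if len(xs) != len(ys):
        raise ValueError("images must have the same size")
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    var_x = statistics.fmean([(x - mx) ** 2 for x in xs])
    var_y = statistics.fmean([(y - my) ** 2 for y in ys])
    cov = statistics.fmean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2  # stabilizing constants
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx ** 2 + my ** 2 + c1) * (var_x + var_y + c2))


original = [[0.1, 0.2, 0.8, 0.9], [0.15, 0.3, 0.7, 0.95]]
binary = [[1.0 if v > 0.5 else 0.0 for v in row] for row in original]
inverted = [[1.0 - v for v in row] for row in original]
print(global_ssim(original, original))  # 1.0 up to rounding
print(global_ssim(original, binary), global_ssim(original, inverted))
```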
Figure 8. Left – an area of a reconstructed cross-section, right – a binary image that was produced using the proposed approach.
So we have a “three-legged” quality criterion (connectivity, fill, and structural similarity) that does not require ground truth. It is hardly enough to train a neural network, but it is sufficient to tune a few algorithm parameters. Figure 8 shows the result of the window binarization algorithm applied after filtering, with all of its parameters estimated automatically. Visually, we still can’t tell for sure whether we estimated the layout and shape of the pores correctly. But for a spot check we can use an expensive technique: a physical experiment. Measurements of the sample by mercury porosimetry confirmed the parameters of the pore size distribution. Thus we were able to tune the image binarization algorithm without a marked-up dataset. Note that this approach requires extensive knowledge of the subject area; otherwise, the result may turn out to be unrealistic.
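One way to turn the three legs into a single objective for parameter search is a weighted sum. The sketch below is hypothetical: the text does not say how the criteria were combined, so the weights, the field names, and the candidate numbers are made up purely to illustrate the logic (SSIM gets a small weight, in line with the caution above). It also shows why the connectivity criterion alone is not enough: minimizing only the number of hanging rocks picks the degenerate, almost empty result.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    params: dict            # algorithm parameters that produced this binarization
    hanging_rocks: int      # criterion 1: isolated matrix components
    white_fraction: float   # criterion 2 input: observed fraction of white voxels
    ssim: float             # criterion 3: similarity to the reconstructed image


def combined_loss(c: Candidate, expected_fraction: float,
                  w_hanging: float = 1.0, w_fill: float = 100.0, w_ssim: float = 1.0) -> float:
    """Lower is better: penalize hanging rocks and fill mismatch, mildly reward SSIM."""
    return (w_hanging * c.hanging_rocks
            + w_fill * abs(c.white_fraction - expected_fraction)
            - w_ssim * c.ssim)


def select(candidates, expected_fraction, **weights):
    """Pick the candidate with the smallest combined loss."""
    return min(candidates, key=lambda c: combined_loss(c, expected_fraction, **weights))


# Made-up results for three thresholds, for illustration only.
candidates = [
    Candidate({"threshold": 0.45}, hanging_rocks=40, white_fraction=0.55, ssim=0.62),
    Candidate({"threshold": 0.55}, hanging_rocks=3, white_fraction=0.49, ssim=0.60),
    Candidate({"threshold": 0.90}, hanging_rocks=0, white_fraction=0.02, ssim=0.10),
]
print(select(candidates, expected_fraction=0.5).params)  # {'threshold': 0.55}
print(min(candidates, key=lambda c: c.hanging_rocks).params)  # {'threshold': 0.9}
```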
[1] Bezmaternykh PV, Ilin DA, Nikolaev DP. U-Net-bin: hacking the document image binarization contest. Computer Optics 2019; 43(5): 826-833. DOI: 10.18287/2412-6179-2019-43-5-826-833.
[2] Nikolaev DP, Saraev AA. Quality assessment criteria in the problem of automated tuning of binarization algorithms. Trudy ISA RAN 2013; 63(3): 85-94. (In Russian.)