13.08.2020

Bipolar morphological networks: neuron without multiplication

It is quite hard these days to find a problem that anyone tries to solve with anything other than a neural network; in many cases other methods are not even considered. Naturally, in the hunt for a 'silver bullet', researchers and technologists propose ever newer and better neural network architectures that are supposed to lift every application engineer to cloud nine. In industrial problems, however, the accuracy of a model often turns out to depend on the quality, size and structure of the training set, while the neural network model itself is mostly required to provide a convenient interface (for example, it is awkward when the network output has to be a list of variable length).

Inference speed and efficiency are a different story: here the dependence on architecture is direct and quite predictable. Yet not every researcher cares about it. It is more pleasant to think in terms of centuries and epochs and to keep one's mind on a future where computing power is unlimited and energy comes out of thin air.

However, there are enough down-to-earth people, too. For them it matters that neural networks become more compact, faster and more energy-efficient than they are today, for example on mobile devices or in embedded systems that have no powerful graphics card and must conserve battery power. A lot has been done in this field: low-bit integer neural networks, pruning of redundant neurons, tensor decompositions of convolution operations, etc.

We managed to eliminate multiplications from the computations inside a neuron by replacing them with additions and maximum operations (multiplications and non-linear operations are retained within the activation functions). We named our model the bipolar morphological neuron.

Interestingly, both research papers on high-speed inference and outright alchemical papers about 'more better' architectures hardly ever revise the basic McCulloch-Pitts neuron model, in which the neuron response is a weighted sum of its inputs and training amounts to computing the synaptic weights. On the one hand, the model seems to leave no obvious room for optimization. On the other hand, completely replacing the neuron's mathematics may produce a model for which all state-of-the-art training approaches become unsuitable, so everything would have to be built from scratch.

On closer inspection, the central, graphics and even specialized tensor processors on which all classical networks are inferred spend a lot of resources on multiplications. Moreover, in a logic-gate implementation multiplication is slower and more complex than addition, so if multiplications could be replaced with less resource-consuming operations, tensor processors could be optimized. The game is worth the candle, and technological complications do not scare us. Labor omnia vīcit improbus et dūrīs urgēns in rēbus egestās ('relentless toil and want pressing in hard circumstances conquered all').

What is more, we were not the first. The morphological neuron was introduced back in the 1990s [1, 2]. That model appeals mostly to the biological properties of neurons. A further development of the idea was the dendritic morphological neuron, which makes it possible to model excitation and inhibition separately [3], and a generalization of the model in terms of lattice algebra [4]. As a rule, a morphological neural network is a single-layer perceptron trained with heuristic algorithms [5], which can be supplemented with stochastic gradient descent [6]. However, such networks scale poorly to deep learning problems and do not reach high quality on most current tasks, so for now they are mostly of academic interest.

We, however, wanted to achieve decent quality on classification and recognition tasks, so we proposed our own model. Not only does it use morphological operations, it also approximates a classical neuron, which potentially allows modern neural network architectures to be adapted to it.

Bipolar morphological neuron

A classical neuron performs the following operation:

$$y(x, w) = \sigma\left(\sum_{i=1}^{N} w_i x_i + w_{N+1}\right)$$

In other words, it computes a linear combination of the input signals x with weights w, adds a bias term and then applies a non-linear activation function σ.

We proposed a bipolar morphological neuron (BM-neuron) that approximates a classical one without internal multiplications. Let us show how we arrived at it. To begin with, the sum of products in a classical neuron can be split into 4 terms that differ by the signs of the weights and inputs:

$$\sum_{i=1}^{N} w_i x_i = \sum_{i=1}^{N} x_i^{+} w_i^{+} - \sum_{i=1}^{N} x_i^{-} w_i^{+} - \sum_{i=1}^{N} x_i^{+} w_i^{-} + \sum_{i=1}^{N} x_i^{-} w_i^{-}$$

where

$$x_i^{+} = \max(x_i, 0), \quad x_i^{-} = \max(-x_i, 0), \quad w_i^{+} = \max(w_i, 0), \quad w_i^{-} = \max(-w_i, 0)$$

Let us consider each of these terms separately (all of its summands are non-negative) and denote:

$$y_i = \ln x_i, \qquad v_i = \ln w_i, \qquad \text{with } \ln 0 := -\infty$$

and apply the following approximation:

$$\sum_{i=1}^{N} x_i w_i = \sum_{i=1}^{N} e^{y_i + v_i} = (1 + k)\,\exp\Big(\max_{i}(y_i + v_i)\Big) \approx \exp\Big(\max_{i}(y_i + v_i)\Big),$$

$$k = \sum_{i \neq i^{*}} e^{(y_i + v_i) - (y_{i^{*}} + v_{i^{*}})}, \qquad i^{*} = \arg\max_{i}(y_i + v_i)$$

where $y_i$ are the new inputs and $v_i$ are the new weights.

Obviously, the approximation is accurate when k ≪ 1. Since 0 ≤ k ≤ N−1, the best case occurs when the sum contains only one non-zero term (k = 0), and the worst case when the sum consists of equal terms (k = N−1). In the worst case, the true value of the sum is N times larger than the approximation. This cannot be called a good approximation; however, the network may compensate for it thanks to the non-linearities between layers, so we cannot say in advance whether this approximation ruins everything or not. For example, neural networks with low-bit integer coefficients cannot be obtained by direct conversion from classical ones either, yet with suitable methods their quality can be made comparable to the original. Therefore, we considered the BM-neuron to have high potential.
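To get a feel for how crude or benign the approximation is, here is a small numpy sketch (our illustration, not code from the article) that compares one non-negative sum of products with its max-based estimate in a near-best case and in the worst case:

```python
import numpy as np

def approx_sum(x, w):
    """Approximate sum(x * w) for non-negative x, w by exp(max(ln x + ln w))."""
    with np.errstate(divide="ignore"):      # ln 0 -> -inf is handled gracefully
        return np.exp(np.max(np.log(x) + np.log(w)))

# Near-best case: one term dominates, so k << 1.
x = np.array([5.0, 0.01, 0.02]); w = np.array([3.0, 0.01, 0.01])
print(np.dot(x, w), approx_sum(x, w))   # ~15.0003 vs 15.0

# Worst case: N equal terms, k = N - 1, the true sum is N times larger.
x = np.ones(10); w = np.ones(10)
print(np.dot(x, w), approx_sum(x, w))   # 10.0 vs 1.0
```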

The BM-neuron scheme is presented in Figure 1. With the help of ReLU we create 4 computational branches: for positive and negative inputs and for positive and negative weights. Next, we take the logarithms of the inputs, and the morphological operation itself is performed inside the neuron. Finally, the results are exponentiated and subtracted to obtain the neuron output.

The logarithm and the exponent are applied to the layer inputs and outputs, so they can be treated as part of the activation function and computed once per layer rather than for each neuron. As a rule, the activation function contributes little to the overall computational cost of the network, so some complication of its structure is unlikely to cause problems. Later, to speed things up, these operations can be replaced with approximations (e.g. piecewise-linear ones) and, for networks with quantized coefficients, with look-up tables.


Fig. 1. Bipolar morphological neuron scheme
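As a toy illustration of the look-up-table idea mentioned above (our own sketch with an arbitrary 8-bit grid and scale, not a scheme from the article), the exponent can be precomputed once for all quantized argument values:

```python
import numpy as np

SCALE = 16.0                              # hypothetical fixed-point scale
EXP_LUT = np.exp(np.arange(256) / SCALE)  # built once, reused for the whole layer

def exp_from_lut(q):
    """Approximate exp(q / SCALE) for integer codes q in [0, 255]."""
    return EXP_LUT[np.asarray(q, dtype=np.int64)]

q = np.array([0, 16, 32, 64])             # codes for 0.0, 1.0, 2.0, 4.0
print(exp_from_lut(q))                    # ~[1.00, 2.72, 7.39, 54.6]
print(np.exp(q / SCALE))                  # exact values for comparison
```

A piecewise-linear approximation would be built in the same spirit, trading a little accuracy for cheaper arithmetic.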

 

Overall, the BM-neuron computes the following value:

$$y_{BM}(x, v) = \sigma\Big(\exp\big(\max_{i}(\ln x_i^{+} + v_i^{+})\big) - \exp\big(\max_{i}(\ln x_i^{-} + v_i^{+})\big) - \exp\big(\max_{i}(\ln x_i^{+} + v_i^{-})\big) + \exp\big(\max_{i}(\ln x_i^{-} + v_i^{-})\big) + v_{N+1}\Big)$$

where

$$v_i^{+} = \ln w_i^{+}, \qquad v_i^{-} = \ln w_i^{-}, \qquad v_{N+1} = w_{N+1}$$
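Putting the pieces together, here is a minimal numpy sketch of a single BM-neuron (our illustration; the function and variable names are ours) next to the classical neuron it approximates:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def classical_neuron(x, w, b, act=relu):
    return act(np.dot(w, x) + b)

def bm_neuron(x, w, b, act=relu):
    """Bipolar morphological approximation of the classical neuron above."""
    with np.errstate(divide="ignore"):                    # ln 0 -> -inf switches a branch off
        lx_p, lx_m = np.log(relu(x)), np.log(relu(-x))    # logs of the input parts
        v_p,  v_m  = np.log(relu(w)), np.log(relu(-w))    # BM weights: logs of the weight parts
    s = (np.exp(np.max(lx_p + v_p)) - np.exp(np.max(lx_m + v_p))
         - np.exp(np.max(lx_p + v_m)) + np.exp(np.max(lx_m + v_m)))
    return act(s + b)

rng = np.random.default_rng(0)
x, w, b = rng.normal(size=8), rng.normal(size=8), 0.1
identity = lambda s: s                                    # compare the raw sums directly
print(classical_neuron(x, w, b, act=identity))
print(bm_neuron(x, w, b, act=identity))                   # approximates the value above, sometimes roughly
```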

By the way, we named our neuron 'bipolar morphological' on purpose. The word 'bipolar' refers to the pairs of computational branches for positive and negative signals, which imitate excitation and inhibition. In biology, such neurons are typically responsible for perception, for example, the bipolar cells of the retina.

Training

Admittedly, it would be ideal to create a converter that takes a pre-trained network, converts it and voila: a fast and accurate BM-network is ready! However, doing it that way turned out to be impossible, so we slightly relaxed the requirements for the training method. We converted the network layer by layer and fine-tuned either the remaining classical layers (method 1) or the whole network (method 2). Why bother? Converting one layer at a time keeps the quality from collapsing to zero, and the network does not have to be trained from scratch on a large amount of data. This can be extremely useful when training the classical network itself takes several weeks.

It is not quite obvious how a BM-network should be trained: since the neurons compute a maximum, only a single weight in a BM-layer can be updated at each step, and some weights may never contribute to the final result at all if they were initialized badly. We could not simply train the BM-network with standard gradient-descent methods: the quality was acceptable, but lower than with the methods we propose below.
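The 'only one weight per step' effect is easy to see directly: the gradient of a maximum reaches only the element that attains it. A quick check of this generic fact (the framework here is chosen purely for illustration):

```python
import tensorflow as tf

z = tf.Variable([0.5, 2.0, 1.0])
with tf.GradientTape() as tape:
    y = tf.reduce_max(z)              # the operation inside a BM-neuron branch
print(tape.gradient(y, z).numpy())    # [0. 1. 0.]: only the argmax entry gets a gradient
```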

Globally, our approach is based on incremental learning: we gradually adapt the model to new data or a new structure without losing the information it has already learned. We go layer by layer and convert the current layer to the BM type. The BM-layer weights are initialized so that it approximates the original layer. After that, either the 'tail' of the network is fine-tuned (method 1) or the whole network is fine-tuned (method 2). 'Fine-tuning' here means that the network has already been initialized with decent values and does not need significant changes to its coefficients. Thus, we avoid having to devise an initialization scheme for the network or for a separate BM-layer, and we can hope that the network will train by gradient descent despite the difficulties this procedure has with the BM-neuron.
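Here is a minimal numpy sketch of the conversion step (our illustration of the weight initialization; the fine-tuning itself is ordinary gradient descent in whatever framework is used):

```python
import numpy as np

def init_bm_weights(w):
    """Initialize BM-layer weights so that the new layer approximates
    the original linear layer with weight matrix w."""
    with np.errstate(divide="ignore"):            # ln 0 -> -inf disables a connection
        v_plus  = np.log(np.maximum(w, 0.0))      # branch for positive weights
        v_minus = np.log(np.maximum(-w, 0.0))     # branch for negative weights
    return v_plus, v_minus

# Layer-by-layer schedule, in pseudocode:
#   for each layer L, from input to output:
#       replace L with a BM-layer initialized by init_bm_weights(L.weights)
#       method 1: fine-tune only the layers after L
#       method 2: fine-tune the whole network
w = np.array([[0.3, -1.2], [0.0, 0.7]])
v_plus, v_minus = init_bm_weights(w)
print(v_plus)    # ln of the positive parts, -inf where w <= 0
print(v_minus)   # ln of the negative parts' magnitudes, -inf where w >= 0
```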

Experiments

MNIST

MNIST is an open dataset consisting of 60,000 greyscale images of handwritten digits of size 28 by 28 pixels; another 10,000 images form the test set. We used 10% of the training images for validation and the rest for training. Some examples are given in Figure 2.


Fig. 2. Examples of MNIST digits
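For reference, the 10% validation split described above can be reproduced with the standard Keras MNIST loader (an illustrative sketch; the framework is an assumption, not a statement of what was actually used):

```python
import tensorflow as tf

# 60,000 training and 10,000 test greyscale 28x28 images.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Hold out 10% of the training set for validation, train on the rest.
n_val = len(x_train) // 10
x_val, y_val = x_train[:n_val], y_train[:n_val]
x_train, y_train = x_train[n_val:], y_train[n_val:]
print(x_train.shape, x_val.shape, x_test.shape)  # (54000, 28, 28) (6000, 28, 28) (10000, 28, 28)
```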

Let’s introduce the following notation for the network layers:

conv(n, w_x, w_y) — convolutional layer with n filters of w_x by w_y size;

fc(n) — fully connected layer with n neurons;

maxpool(w_x, w_y) — max-pooling layer with a w_x by w_y window;

dropout(p) — dropout layer with a connection drop probability p;

relu — activation function ReLU(x)=max(x,0);

softmax — softmax activation function.

We trained two simple convolutional neural networks for classification on MNIST:

CNN1: conv1(30, 5, 5) — relu1 — dropout1(0.2) — fc1(10) — softmax1.

CNN2: conv1(40, 5, 5) — relu1 — maxpool1(2, 2) — conv2(40, 5, 5) — relu2 — fc1(200) — relu3 — dropout1(0.3) — fc2(10) — softmax1.
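To make this notation concrete, CNN1 could be assembled as follows (an illustrative Keras sketch; the framework is an assumption, and a Flatten layer is added implicitly before the fully connected one):

```python
import tensorflow as tf

# CNN1: conv1(30, 5, 5) - relu1 - dropout1(0.2) - fc1(10) - softmax1
cnn1 = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(30, (5, 5), activation="relu"),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Flatten(),                        # flatten feature maps before the dense layer
    tf.keras.layers.Dense(10, activation="softmax"),
])
cnn1.summary()
```

CNN2 is built in the same way, with the additional convolutional, max-pooling and fully connected layers.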

Then we converted their layers into bipolar morphological ones, one by one. The results are presented in Table 1. Note that the dash in the 'Converted' column corresponds to the original, non-converted network. We show the quality right after conversion (C) as well as after conversion and fine-tuning (C+F).

Table 1. Neural network recognition quality depending on which layers are converted to bipolar morphological ones, on a subset of MNIST.

C — quality after conversion, F — quality after fine-tuning.

Network | Converted | Method 1, C | Method 1, C+F | Method 2, C | Method 2, C+F
CNN1 | — | 98.72 | | 98.72 |
CNN1 | conv1 | 42.47 | 98.51 | 38.38 | 98.76
CNN1 | conv1 — relu1 — dropout1 — fc1 | 26.89 | | 19.86 | 94.00
CNN2 | — | 99.45 | | 99.45 |
CNN2 | conv1 | 94.90 | 99.41 | 96.57 | 99.42
CNN2 | conv1 — relu1 — maxpool1 — conv2 | 21.25 | 98.68 | 36.23 | 99.37
CNN2 | conv1 — relu1 — maxpool1 — conv2 — relu2 — fc1 | 10.01 | 74.95 | 17.25 | 99.04
CNN2 | conv1 — relu1 — maxpool1 — conv2 — relu2 — fc1 — dropout1 — relu3 — fc2 | 12.91 | | 48.73 | 97.86

 

Without any training of the BM-layers, the quality already drops when the convolutional layers are converted, and once the fully-connected layer is converted the network virtually stops recognizing anything. Moreover, the result for the network with two convolutional BM-layers is no better than that of a network consisting only of the remaining fully-connected layers. This means that the approximation quality is poor and converting the network without training does not work.

With fine-tuning of the converted layers the outcome is far better: the convolutions are converted virtually unscathed. In our opinion, the quality drop for the fully-connected layers is explained by an imperfect training method and does not mean that they cannot work in principle. Even the current result is good enough, though: most of the inference time of a neural network is spent in the convolutions, so these are what we primarily want to speed up.

MRZ reading: characters

Based on our experience with MRZ reading, we used our own dataset of characters from the machine-readable zone (MRZ) of identity documents (see Fig. 3). It contains about 280,000 greyscale images of size 21 by 17 pixels, covering the 37 MRZ characters and taken from images of real documents.


Fig. 3. Examples of MRZ characters

We trained two convolutional neural networks for this task:

CNN3: conv1(8, 3, 3) — relu1 — conv2(30, 5, 5) — relu2 — conv3(30, 5, 5) — relu3 — dropout1(0.25) — fc1(37) — softmax1.

CNN4: conv1(8, 3, 3) — relu1 — conv2(8, 5, 5) — relu2 — conv3(8, 3, 3) — relu3 — dropout1(0.25) — conv4(12, 5, 5) — relu4 — conv5(12, 3, 3) — relu5 — conv6(12, 1, 1) — relu6 — fc1(37) — softmax1.

The conversion results are presented in Table 2. The dash in the 'Converted' column corresponds to the original, non-converted network. As before, we show the quality right after conversion (C) and after conversion and fine-tuning (C+F).

We see the same picture as with MNIST: if the BM-layers are not trained, the quality drops, especially for the fully-connected layers. When the BM-layers are trained, the convolutions are converted almost perfectly, while the fully-connected layers still suffer somewhat.

Table 2. Neural network recognition quality depending on which layers are converted to bipolar morphological ones, on a subset of MRZ characters.

C — quality after conversion, F — quality after fine-tuning.

 

Network | Converted | Method 1, C | Method 1, C+F | Method 2, C | Method 2, C+F
CNN3 | — | 99.63 | | 99.63 |
CNN3 | conv1 | 97.76 | 99.64 | 83.07 | 99.62
CNN3 | conv1 — relu1 — conv2 | 8.59 | 99.47 | 21.12 | 99.58
CNN3 | conv1 — relu1 — conv2 — relu2 — conv3 | 3.67 | 98.79 | 36.89 | 99.57
CNN3 | conv1 — relu1 — conv2 — relu2 — conv3 — relu3 — dropout1 — fc1 | 12.58 | | 27.84 | 93.38
CNN4 | — | 99.67 | | 99.67 |
CNN4 | conv1 | 91.20 | 99.66 | 93.71 | 99.67
CNN4 | conv1 — relu1 — conv2 | 6.14 | 99.52 | 73.79 | 99.66
CNN4 | conv1 — relu1 — conv2 — relu2 — conv3 | 23.58 | 99.42 | 70.25 | 99.66
CNN4 | conv1 — relu1 — conv2 — relu2 — conv3 — relu3 — dropout1 — conv4 | 29.56 | 99.04 | 77.92 | 99.63
CNN4 | conv1 — relu1 — conv2 — relu2 — conv3 — relu3 — dropout1 — conv4 — relu4 — conv5 | 34.18 | 98.45 | 17.08 | 99.64
CNN4 | conv1 — relu1 — conv2 — relu2 — conv3 — relu3 — dropout1 — conv4 — relu4 — conv5 — relu5 — conv6 | 5.83 | 98.00 | 90.46 | 99.61
CNN4 | conv1 — relu1 — conv2 — relu2 — conv3 — relu3 — dropout1 — conv4 — relu4 — conv5 — relu5 — conv6 — relu6 — fc1 | 4.70 | | 27.57 | 95.46

 

Conclusion

We proposed the bipolar morphological neuron, a model that approximates a conventional one, and described how to convert a conventional network into a BM-network and how to fine-tune it. On MNIST and on a dataset of MRZ characters, the convolutional layers were converted practically without loss of quality.

Was it worth the trouble? On the one hand, modern CPUs and GPUs take roughly the same time for a multiplication as for an addition. On the other hand, on special-purpose devices an appropriately modified model (for example, with quantization and optimized computation of the logarithm and exponent) can provide a real advantage. Given that hardware development is moving towards dedicated devices for neural network inference, such as neural processors and specialized circuits like the TPU, the idea becomes even more attractive.

There is still a lot of work to be done, but we have made a first step in the right direction, and we will continue studying our model: its application to deep networks, the possibility of quantization, and more sophisticated training methods.

This article is based on the findings presented at ICMV 2019: E. Limonova, D. Matveev, D. Nikolaev and V. V. Arlazarov, "Bipolar morphological neural networks: convolution without multiplication," in Proc. SPIE 11433, ICMV 2019, W. Osten, D. Nikolaev and J. Zhou, Eds., 114333J, pp. 1-8, Jan. 2020, DOI: 10.1117/12.2559299.

References

  1. G. X. Ritter and P. Sussner, "An introduction to morphological neural networks," in Proceedings of the 13th International Conference on Pattern Recognition, vol. 4, pp. 709-717 (1996).
  2. P. Sussner and E. L. Esmi, Constructive Morphological Neural Networks: Some Theoretical Aspects and Experimental Results in Classification, pp. 123-144, Springer Berlin Heidelberg, Berlin, Heidelberg (2009).
  3. G. X. Ritter, L. Iancu and G. Urcid, "Morphological perceptrons with dendritic structure," in Proceedings of the 12th IEEE International Conference on Fuzzy Systems (FUZZ '03), vol. 2, pp. 1296-1301 (May 2003).
  4. G. X. Ritter and G. Urcid, "Lattice algebra approach to single-neuron computation," IEEE Transactions on Neural Networks 14, 282-295 (March 2003).
  5. H. Sossa and E. Guevara, "Efficient training for dendrite morphological neural networks," Neurocomputing 131, 132-142 (May 2014).
  6. E. Zamora and H. Sossa, "Dendrite morphological neurons trained by stochastic gradient descent," in 2016 IEEE Symposium Series on Computational Intelligence (SSCI), pp. 1-8 (Dec 2016).
