Proc. SPIE 11041, Eleventh International Conference on Machine Vision (ICMV 2018), 110411N DOI: 10.1117/12.2523101
In this paper we consider computational optimization of a recognition system on a Very Long Instruction Word (VLIW) architecture. Such an architecture is aimed at broad parallel execution and low energy consumption. We discuss VLIW features using an Elbrus-based computational platform as an example. As the example system we consider a 2D art recognition system. It identifies a painting in an acquired image as a painting from the database, using local image features constructed from YACIPE keypoints and their RFD-based binary color descriptors, which are created by concatenating RFD-like descriptors computed for each channel. These features are fast to compute, whereas the 2D art database is quite large, so in our case descriptor comparison using the Hamming distance during image matching consumes more than half of the execution time. This operation can be sped up by low-level optimization that takes the specific features of the architecture into account. We show efficient usage of intrinsic functions for the Elbrus-4C processor and of memory access through the array prefetch buffer, which is specific to the Elbrus platform. We demonstrate a speedup of up to 11.5 times for large arrays and an overall speedup of about 1.5 times for the system without any changes to the intermediate computations.
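For illustration, the descriptor comparison described above can be sketched as a plain, portable C baseline. This is a minimal sketch under stated assumptions and not the optimized Elbrus implementation: it uses no Elbrus intrinsics and no array prefetch buffer, the descriptors are assumed to be packed into 64-bit words, and the names `popcount64`, `hamming_distance`, and `nearest_descriptor` as well as the row-major database layout are hypothetical choices made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable 64-bit population count (SWAR bit trick).
 * Assumption: stands in for a hardware popcount instruction or intrinsic. */
unsigned popcount64(uint64_t x)
{
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
    return (unsigned)((x * 0x0101010101010101ULL) >> 56);
}

/* Hamming distance between two binary descriptors,
 * each stored as n_words 64-bit words (hypothetical packing). */
unsigned hamming_distance(const uint64_t *a, const uint64_t *b, size_t n_words)
{
    assert(a != NULL && b != NULL);
    unsigned dist = 0;
    for (size_t i = 0; i < n_words; ++i)
        dist += popcount64(a[i] ^ b[i]); /* XOR marks differing bits */
    return dist;
}

/* Linear scan over a database of n_desc descriptors stored row by row
 * (n_desc must be at least 1). Returns the index of the descriptor closest
 * to the query and writes its distance to *best_dist. Ties go to the
 * lowest index. */
size_t nearest_descriptor(const uint64_t *query, const uint64_t *db,
                          size_t n_desc, size_t n_words, unsigned *best_dist)
{
    assert(query != NULL && db != NULL && best_dist != NULL && n_desc > 0);
    size_t best = 0;
    unsigned best_d = hamming_distance(query, db, n_words);
    for (size_t i = 1; i < n_desc; ++i) {
        unsigned d = hamming_distance(query, db + i * n_words, n_words);
        if (d < best_d) {
            best_d = d;
            best = i;
        }
    }
    *best_dist = best_d;
    return best;
}
```

The inner loop is a sequence of XOR, population count, and accumulate operations over contiguous memory. It is this kind of regular access pattern over large arrays that low-level, architecture-specific optimization can target.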