Embedded vision: look and identify
25 August 2017
Figure 1: The ADAS demo system shown at both Embedded Vision and DAC 2017
Embedded vision is proving more and more popular across a range of industry sectors, and is increasingly requiring FPGA-based hardware acceleration to alleviate processor workload. This article outlines the challenges and solutions involved, and explains how an ADAS demonstration unit works.
This article originally appeared in the September 2017 issue of Electronic Product Design & Test.
Having given rise to several dedicated trade events, plus the formation of a bespoke alliance, ‘embedded vision’ is certainly an industry hot topic. The term refers to the integration of machine vision into a system so that it can automatically recognise certain aspects of its surrounding environment – and take action if required. Industry sectors looking to employ embedded vision technology include aerospace, automotive, industrial, medical and security.
Video cameras are, of course, the system’s ‘eyes’, and in this respect, the higher the resolution and the faster the frame rate, the greater the volume of data that must be processed. Processing this so-called ‘Big Data’ is proving challenging, particularly in the automotive sector. Here, a variety of Advanced Driver Assistance System (ADAS) solutions are under development. They offer the potential to make driving safer, easier and more comfortable, and ADAS is regarded as a significant step towards fully autonomous vehicles.
A typical ADAS will include multiple video cameras – as well as radio and light detection and ranging (RADAR and LIDAR, respectively) – in order to capture and process data from the vehicle’s environment. The processed data can be used to notify the driver of problems or to automatically trigger responses such as deceleration, braking and/or the execution of a manoeuvre. ADAS functions that are reliant on cameras (but not necessarily wholly so) include:
* Lane Departure Warning (LDW) – for which a line detection algorithm will inform the driver about any unintentional road lane departure;
* Pedestrian Detector (PD) – for which an object detection algorithm is configured to detect pedestrians in front of a vehicle;
* Forward Collision Warning (FCW) – for which another object detection algorithm is configured to detect multiple vehicles in front of the driver's path; and
* Traffic Sign Recognition (TSR) – again, based on object detection, with classification algorithms able to detect and recognise traffic signs from the vehicle environment.
The processing of Big Data sequentially – in other words, a processor running software algorithms – continues to prove challenging for those developing ADAS. It is hard to physically process data at many of the desired frame rates. Accordingly, Field Programmable Gate Arrays (FPGAs) are now being used in many embedded vision applications to, quite literally, take the heat off the processor.
FPGAs can perform several different tasks in parallel. For example, consider executing many non-dependent computations (such as A=B+C, D=E+F and G=H+I). On a processor, these would typically have to be completed sequentially, with each sum requiring a few clock cycles. In FPGA logic, an array of adders could do the computations in parallel, possibly requiring only a single clock cycle. Another important feature of FPGAs is, of course, their reprogrammability.
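The difference can be put into rough numbers with a back-of-envelope cycle count. The sketch below is only a toy model: the function names are hypothetical, and the figure of three cycles per sum on the processor is an assumed stand-in for "a few clock cycles".

```python
def sequential_cycles(num_ops, cycles_per_op):
    """Cycles needed when one unit performs the sums one after another."""
    return num_ops * cycles_per_op


def parallel_cycles(num_ops, num_adders, cycles_per_op=1):
    """Cycles needed when num_adders independent adders work side by side."""
    batches = -(-num_ops // num_adders)  # ceiling division
    return batches * cycles_per_op


# A=B+C, D=E+F and G=H+I: three independent sums.
# Assumption: 3 cycles per sum on the processor, 1 cycle per adder in logic.
print(sequential_cycles(3, 3))  # processor, one sum at a time
print(parallel_cycles(3, 3))    # three adders working at once
```

The model ignores data movement and clock-frequency differences between processor and logic, both of which matter in practice.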
Get to the demo!
Figure 2: Getting the image from a single camera into a VIP sub-system, implemented in programmable logic
To best illustrate some of the numbers involved in an embedded vision system, it is worth considering an ADAS demo unit (see Figure 1) that was built for – and received considerable interest at – 2017’s Embedded Vision show in Santa Clara, California, as well as the Design Automation Conference in Austin, Texas.
The demo comprised a TySOM-2-7Z100 prototyping board, which included a Xilinx Zynq XC7Z100 device, and an ADAS FMC daughter board to interface with multiple cameras. It was built to show how multiple-camera, high-speed processing can be achieved: by sharing the workload between the dual-core ARM Cortex-A9 processor and the FPGA logic residing within the Zynq device.
Four cameras were employed for the purposes of processing a front view, a rear view and two side views (left and right). They were Blue Eagle cameras from First Sensor, with 190-degree wide-angle lenses and LVDS data communication interfaces. The camera links were implemented using four FPD-Link III-compatible DS90UB914Q deserialiser chips (one per camera) on the ADAS daughter card.
To accept the deserialised data, video input port (VIP) subsystems were implemented in programmable logic (PL) cells within the Zynq-7000 device. These subsystems were used to store image pixel data into frame buffers, which are allocated in main system DDR memory and are fully compatible with the V4L2 (Video 4 Linux 2) Linux framework. The block-level view of a single camera link is shown in Figure 2.
In terms of image processing, the following steps were implemented:
1. input frames were grabbed from all four cameras into internal input buffers, using a camera V4L2 driver interface;
2. an edge detection algorithm (Sobel 3 x 3 kernel) was applied to each camera frame;
3. detected edges were superimposed over the initial camera images (using different highlight colours for each camera);
4. the colour space of the processed images was converted from native 16-bit YUV 4:2:2 into 32-bit RGBA 8:8:8:8, to meet the requirements of an output HDMI subsystem;
5. all four camera frames were merged into a full HD HDMI buffer (1920 x 1080 pixels); and
6. the HD video buffer was output to screen (see Figure 3).
Figure 3: The four views brought together, with edge detection applied
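Steps 2 to 5 can be sketched in a few short functions. This is an illustration in plain Python rather than the demo's own code, and it rests on several assumptions not stated above: YUYV byte order for the 4:2:2 data, full-range BT.601 conversion coefficients, |Gx| + |Gy| as the edge strength, a threshold of 128, and a 2 x 2 tile layout. For brevity, the sketch also overlays the edges after the colour conversion rather than before it. All function names are hypothetical.

```python
def sobel_magnitude(luma, width, height):
    """Step 2: 3 x 3 Sobel on a row-major luma list; border pixels stay 0."""
    out = [0] * (width * height)
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            def p(dx, dy):
                return luma[(y + dy) * width + (x + dx)]
            gx = (p(1, -1) + 2 * p(1, 0) + p(1, 1)) - (p(-1, -1) + 2 * p(-1, 0) + p(-1, 1))
            gy = (p(-1, 1) + 2 * p(0, 1) + p(1, 1)) - (p(-1, -1) + 2 * p(0, -1) + p(1, -1))
            out[y * width + x] = min(255, abs(gx) + abs(gy))
    return out


def clamp8(value):
    return max(0, min(255, int(round(value))))


def yuv_to_rgba(y, u, v):
    """Step 4 for one pixel, assuming full-range BT.601 coefficients."""
    d, e = u - 128, v - 128
    return (clamp8(y + 1.402 * e),
            clamp8(y - 0.344136 * d - 0.714136 * e),
            clamp8(y + 1.772 * d),
            255)


def yuyv_to_rgba(data):
    """Step 4 for a frame: each 4-byte group Y0 U Y1 V yields two RGBA pixels."""
    pixels = []
    for i in range(0, len(data) - 3, 4):
        y0, u, y1, v = data[i:i + 4]
        pixels.append(yuv_to_rgba(y0, u, v))
        pixels.append(yuv_to_rgba(y1, u, v))
    return pixels


def overlay_edges(pixels, edges, colour, threshold=128):
    """Step 3: replace pixels on strong edges with the highlight colour."""
    return [colour if e >= threshold else p for p, e in zip(pixels, edges)]


def process_tile(yuyv, width, height, colour):
    luma = list(yuyv[0::2])  # Y samples sit at even byte offsets in YUYV
    edges = sobel_magnitude(luma, width, height)
    return overlay_edges(yuyv_to_rgba(yuyv), edges, colour)


def merge_quadrants(tiles, tile_w, tile_h):
    """Step 5: place four row-major tiles into a 2 x 2 grid."""
    out_w = 2 * tile_w
    out = [None] * (out_w * 2 * tile_h)
    for index, tile in enumerate(tiles):
        ox = (index % 2) * tile_w
        oy = (index // 2) * tile_h
        for y in range(tile_h):
            start = (oy + y) * out_w + ox
            out[start:start + tile_w] = tile[y * tile_w:(y + 1) * tile_w]
    return out
```

With 960 x 540 tiles, `merge_quadrants` produces the 1920 x 1080 output frame. Every function here touches each pixel individually, which is why these stages dominate the processing time when run in software.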
Of the above stages, 2 to 5 are computationally intensive, because pixel-level computations are applied to a large volume of pixels (4 x 960 x 540 = about 2 million pixels), and each computation requires several memory access operations. When these stages were performed on the ARM CPU alone, a frame rate of only 3 frames per second could be realised.
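The data volume behind that figure is easy to work out from the numbers already given: four 960 x 540 views, 16 bits per pixel on input and 32 bits per pixel on output. The helper name below is hypothetical.

```python
CAMERAS = 4
TILE_WIDTH, TILE_HEIGHT = 960, 540


def frame_bytes(cameras, width, height, bits_per_pixel):
    """Bytes needed to hold one set of frames at the given pixel depth."""
    return cameras * width * height * bits_per_pixel // 8


pixels = CAMERAS * TILE_WIDTH * TILE_HEIGHT
yuv_in = frame_bytes(CAMERAS, TILE_WIDTH, TILE_HEIGHT, 16)    # YUV 4:2:2 input
rgba_out = frame_bytes(CAMERAS, TILE_WIDTH, TILE_HEIGHT, 32)  # RGBA output

print(pixels)    # 2073600 pixels per output frame
print(yuv_in)    # 4147200 bytes read per frame set
print(rgba_out)  # 8294400 bytes written per frame
```

These totals cover only one read of the input and one write of the output; intermediate buffers between stages add further memory traffic.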
To eliminate the memory access bottleneck, the main computational parts of stages 2 to 5 were moved to the programmable logic side of the Zynq-7000 using the Xilinx SDSoC tool, with the relevant function marked for hardware acceleration and given appropriate data movement and memory allocation pragmas. Doing so resulted in two clear benefits:
* Single-pixel data memory accesses are replaced by multiple burst DMA transactions – in other words, the CPU is not involved in data motion operations, and is only responsible for DMA engine control;
* All pixel computations are made by the same hardware accelerator core, implemented in the programmable part of the Zynq-7000 in a pixel-by-pixel streaming manner, thus reducing the amount of main FPGA resources (LUTs, FFs, BRAMs) needed to convert and implement user C/C++ code.
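One common way to structure a pixel-by-pixel streaming core for a 3 x 3 kernel is to keep two line buffers and a small sliding window, so that each pixel is read from memory only once. The sketch below models that idea in Python; it is an assumption about how such a core might be organised, not a description of the demo's accelerator, and the function name is hypothetical.

```python
def stream_sobel(pixels, width):
    """Consume luma pixels in raster order, yielding (x, y, magnitude).

    Only two previous rows and a 3 x 3 window are held at any time,
    mirroring the line buffers and registers of a hardware pipeline.
    """
    row_a = [0] * width                  # row y - 2
    row_b = [0] * width                  # row y - 1
    win = [[0, 0, 0] for _ in range(3)]  # win[row][column]
    for i, p in enumerate(pixels):
        y, x = divmod(i, width)
        column = (row_a[x], row_b[x], p)
        for r in range(3):
            # Shift the window left and load the new right-hand column.
            win[r][0], win[r][1], win[r][2] = win[r][1], win[r][2], column[r]
        row_a[x], row_b[x] = row_b[x], p
        if x >= 2 and y >= 2:
            gx = (win[0][2] + 2 * win[1][2] + win[2][2]) - (win[0][0] + 2 * win[1][0] + win[2][0])
            gy = (win[2][0] + 2 * win[2][1] + win[2][2]) - (win[0][0] + 2 * win[0][1] + win[0][2])
            # The window is centred one pixel up and to the left of the input.
            yield x - 1, y - 1, min(255, abs(gx) + abs(gy))


# A 4 x 3 test image with a vertical step edge between columns 1 and 2.
image = [0, 0, 255, 255] * 3
print(list(stream_sobel(image, 4)))
```

Because the generator never indexes back into the source image, the same access pattern maps naturally onto sequential burst transfers rather than scattered single-pixel reads.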
As a result, the overall performance was increased from 3 to about 27.5 frames per second.
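In terms of time per frame and overall speed-up, those two frame rates work out as follows. This is a short calculation using only the figures quoted above, and the variable names are hypothetical.

```python
CPU_ONLY_FPS = 3.0      # ARM CPU alone
ACCELERATED_FPS = 27.5  # with stages 2 to 5 in programmable logic
PIXELS_PER_FRAME = 4 * 960 * 540


def frame_time_ms(fps):
    """Time available to process one frame at the given rate."""
    return 1000.0 / fps


speedup = ACCELERATED_FPS / CPU_ONLY_FPS
pixel_rate = PIXELS_PER_FRAME * ACCELERATED_FPS

print(round(speedup, 2))                         # roughly a ninefold gain
print(round(frame_time_ms(CPU_ONLY_FPS), 1))     # ms per frame, CPU only
print(round(frame_time_ms(ACCELERATED_FPS), 1))  # ms per frame, accelerated
print(pixel_rate)                                # pixels processed per second
```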
The ADAS demo unit, built around the commercially available TySOM-2-7Z100 prototyping board, demonstrated how offloading certain computationally intensive tasks into programmable logic can deliver the kind of processing speeds demanded by the automotive sector.