With ARM's Cortex-A76 and Cortex-A55 becoming increasingly available in the automotive industry, I was curious how the efficiency of these CPU microarchitectures has improved over their predecessors, the Cortex-A72 and Cortex-A53, when executing sensor fusion code. As you might know, Qualcomm also uses derivatives of these designs in some of its Snapdragon processors.
BASELABS develops sensor fusion software for automated driving. We constantly gauge the runtime performance of our algorithms on embedded devices through our CI infrastructure, to which we recently added Rockchip RK3588-based boards. The RK3588 integrates a quad-core Cortex-A76 and a quad-core Cortex-A55 in one SoC, a combination referred to as the “big.LITTLE” architecture. Our CI infrastructure also contains other embedded targets built on the predecessor cores Cortex-A53 and Cortex-A72, found in the Rockchip RK3399, which likewise uses a “big.LITTLE” layout. Having both generations of devices in our CI infrastructure allows us to track and compare runtimes across the two generations of CPU designs.
In the following, I will show you the relative performance of two algorithms on these CPUs: a classical object fusion, implemented with BASELABS Create Embedded, and an advanced low-level grid fusion, the BASELABS Dynamic Grid. For each sensor fusion algorithm, we compared the runtimes on the A72 with those on the A53, the A55, and the A76. We also normalized the runtimes to the clock frequencies of the CPU cores to control for differences in clock speed. In addition to the measured relative performance of the object and grid fusion, I list the expected relative performance based on the available DMIPS figures: 2.3 DMIPS/MHz for the A53, 3.0 DMIPS/MHz for the A55, and 7.4 DMIPS/MHz for the A72. I could not find solid numbers for the A76, so I used the 12 DMIPS/MHz figure I saw in one presentation.
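To make the comparison concrete, the expected per-clock speedups can be derived directly from the DMIPS/MHz ratings: once runtimes are normalized to clock frequency, only the ratio of the ratings remains. The following is a minimal sketch of that calculation, using the DMIPS/MHz figures quoted above (the A76 value is an assumption from a presentation, not an official ARM number):

```python
# DMIPS/MHz ratings quoted in the text; the A76 figure is assumed.
dmips_per_mhz = {
    "Cortex-A53": 2.3,
    "Cortex-A55": 3.0,
    "Cortex-A72": 7.4,
    "Cortex-A76": 12.0,  # assumption, no solid source found
}

# The A72 is the baseline of the comparison.
baseline = dmips_per_mhz["Cortex-A72"]

# Expected per-clock performance relative to the A72. Because the
# measured runtimes are normalized to clock frequency, the clock
# terms cancel and only the DMIPS/MHz ratio remains.
expected = {core: rating / baseline for core, rating in dmips_per_mhz.items()}

for core, factor in sorted(expected.items()):
    print(f"{core}: {factor:.2f}x of A72 per-clock performance")
```

This yields the DMIPS-based expectation the measured numbers are compared against, e.g. roughly 0.41x for the A55 and 1.62x for the A76 relative to the A72.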
The figure below shows the result of the comparison. The numbers indicate that, compared to the Cortex-A72, both the Cortex-A55 and the Cortex-A76 are very efficient designs when executing the Dynamic Grid fusion code. For the object fusion, the measured runtimes are closer to what the DMIPS-based figures would predict.
For these experiments, we configured the object fusion based on BASELABS Create Embedded for five radars, one camera, and at most 60 concurrently tracked objects. The BASELABS Dynamic Grid was configured for four corner radars and used 400x400 cells with 25 cm resolution, thus covering 100x100 meters. Even though the Cortex-A72 and Cortex-A76 can be more performant, the Cortex-A55 is already powerful enough to process the described example in real-time. Its bigger brother, the Cortex-A76, offers even more computing power, enabling the fusion of additional sensors such as cameras or lidars. For example, a Dynamic Grid-based sensor fusion of four cameras and four radars can run on two A72 cores in real-time.
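As a quick sanity check of the grid configuration above, the coverage follows directly from the cell count and resolution: 400 cells at 25 cm each span 100 m per axis. A minimal sketch of that arithmetic:

```python
# Grid configuration stated in the text.
cells_per_axis = 400
cell_size_m = 0.25  # 25 cm resolution

# Covered area per axis and total number of grid cells.
coverage_m = cells_per_axis * cell_size_m
total_cells = cells_per_axis ** 2

print(f"Coverage: {coverage_m:.0f} x {coverage_m:.0f} m, {total_cells} cells")
```

The 160,000 cells give a sense of the per-update workload the grid fusion has to sustain in real-time on these cores.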