Optical flow is a computer vision operation that estimates the apparent motion of features between two consecutive frames of a video sequence. It is an important constituent kernel in many automated intelligence, surveillance, and reconnaissance applications. Different optical flow algorithms occupy different points in the trade-off space between accuracy and cost, but in general all are extremely computationally expensive. In this paper we describe an implementation and tuning of the dense pyramidal Lucas-Kanade optical flow method on the Texas Instruments C66x, a 10-watt embedded digital signal processor (DSP).
Using aggressive manual optimization, we achieve 90% of the DSP's peak theoretical floating-point throughput, resulting in an energy efficiency 8.2X that of a modern Intel CPU and 2.0X that of a modern NVIDIA GPU. We believe this is a major step toward deploying mobile systems capable of complex computer vision applications, and of real-time optical flow in particular.