Compressed Video-Based Classification for Efficient Video Analytics

Abstract

Videos have become a central part of daily life and account for a large proportion of internet traffic. Video-based platforms govern the mass consumption of videos through analytics-based filtering and recommendations. Video analytics is used to return the most relevant responses to searches, block inappropriate content, and disseminate videos to the relevant community. Traditionally, for content-based video analytics, a video is first decoded to a large raw format on the server and then fed to an analytics engine for metadata generation. These metadata are then stored and used for analytic purposes. This requires the analytics server to perform both decoding and analytics computation. Analytics can therefore be faster and more efficient if performed directly on the compressed format of the videos, as this reduces the decoding load on the analytics server. The field of object and action detection from compressed formats is still emerging and needs further exploration for its applications in various practical domains. Deep learning has already emerged as a de facto standard for compression, classification, detection, and analytics. The proposed model comprises a lightweight deep learning-based video compression-cum-classification architecture, which classifies the objects in the compressed videos into 39 classes with an average accuracy of 0.67. The compression architecture comprises three sub-networks, i.e., frame and flow autoencoders with a motion extension network, to reproduce the compressed frames. These compressed frames are then fed to the classification network. As the whole network is designed incrementally, the separate results of the compression network are also presented to illustrate its visual performance, since the classification results depend directly on the quality of the frames reconstructed by the compression network. This model presents a potential network, and the results can be improved by the addition of various optimization networks.
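
Because the classification results depend on the quality of the reconstructed frames, it may help to see how one of the quality metrics listed in the keywords, PSNR, is typically computed. The following is a minimal sketch using the standard PSNR definition, not code from the proposed model. The function name `psnr`, the list-of-rows frame representation, and the 8-bit peak value of 255 are assumptions made for illustration.

```python
import math


def psnr(original, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio (in dB) between two equally sized frames.

    Frames are given as lists of rows of pixel intensities (an assumed,
    illustrative representation). max_value is the peak intensity,
    assumed to be 255 for 8-bit frames.
    """
    if len(original) != len(reconstructed):
        raise ValueError("frames must have the same number of rows")

    squared_error = 0.0
    count = 0
    for row_a, row_b in zip(original, reconstructed):
        if len(row_a) != len(row_b):
            raise ValueError("rows must have the same length")
        for a, b in zip(row_a, row_b):
            squared_error += (a - b) ** 2
            count += 1

    if count == 0:
        raise ValueError("frames must not be empty")

    mse = squared_error / count  # mean squared error over all pixels
    if mse == 0:
        return math.inf  # identical frames: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A higher PSNR indicates that the reconstructed frame is closer to the original; identical frames give an infinite value under this definition.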

Keywords: Compression, CNN, ConvGRU, Deep learning, PSNR, SSIM, Video.
