Applications for Video Action Recognition and Prediction Special Track


Chairs:

Nino Cauli – University of Catania, Italy
Mirko Rakovic – University of Novi Sad, Serbia
Diego Reforgiato Recupero - University of Cagliari, Italy



Intelligent systems able to monitor and interact with their surroundings are experiencing rapid growth in modern society. Among the various sensors available, RGB and RGB-D cameras possess the best trade-off between cost and amount of information provided. Through cameras, intelligent systems can recognise and predict the actions of the humans around them. The ability to understand and predict observed actions is a fundamental skill in social interaction: body movements carry much information about the intentions and identity of the observed actors, and vision is the main sensory system humans use to recognise and predict actions. Recognising and predicting human actions from videos are fundamental abilities in several cutting-edge applications: self-driving cars must be able to predict pedestrians' behaviour; video surveillance systems must detect criminal actions; collaborative and humanoid robots need to detect human motion in shared environments; medical monitoring systems need to check the proper execution of exercises performed by patients; and full-body game controllers for virtual reality need to recognise the actions of their users.
The field of video action recognition and prediction encompasses a broad set of sub-problems. In action prediction, instead of having the video of the entire action available, only an initial portion of the performed action is given as input, making the task more challenging than recognition. One step further is the prediction of the expected sensory input generated by the observed action, in the form of a video sequence. Another important recent research direction is self-action recognition and prediction from wearable cameras: in this case, first-person videos are used to recognise the action performed by the user, an application with high impact in health-care and sport monitoring systems. Last but not least is the problem of scaling existing algorithms to distributed camera systems.
Early algorithms used handcrafted features to recognise and predict actions from videos, but over the last decade the field has shifted to Deep Learning (DL) architectures. Researchers are actively investigating new DL models specifically crafted to recognise actions from video sequences, where the time domain plays an important role. While several DL models able to extract features from single images already exist, fewer of them take into account the temporal information embedded in video sequences. Recurrent Neural Networks (RNNs), the state of the art for speech and language processing, are starting to be used in conjunction with DL models for video processing, with promising results.
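The combination described above, per-frame feature extraction followed by a recurrent unit that accumulates temporal information, can be illustrated with a minimal sketch. This is an assumption-laden toy (a linear projection stands in for a CNN backbone, a vanilla RNN for the recurrent model, and all sizes and names are invented for illustration), not the architecture of any specific system:

```python
import numpy as np

rng = np.random.default_rng(0)

def frame_features(frame, W):
    """Stand-in for a CNN backbone: flatten the frame and project it."""
    return np.tanh(W @ frame.ravel())

def rnn_step(h, x, Wh, Wx):
    """One vanilla RNN update mixing the past state with current features."""
    return np.tanh(Wh @ h + Wx @ x)

# Toy video: 8 RGB frames of 16x16 pixels (all values random for the sketch).
T, H, Wd, C = 8, 16, 16, 3
video = rng.standard_normal((T, H, Wd, C))

# Randomly initialised weights; a real system would learn these.
feat_dim, hid_dim, n_actions = 32, 16, 5
W_feat = rng.standard_normal((feat_dim, H * Wd * C)) * 0.01
W_h = rng.standard_normal((hid_dim, hid_dim)) * 0.1
W_x = rng.standard_normal((hid_dim, feat_dim)) * 0.1
W_out = rng.standard_normal((n_actions, hid_dim)) * 0.1

h = np.zeros(hid_dim)
for frame in video:  # the recurrence carries the temporal information
    h = rnn_step(h, frame_features(frame, W_feat), W_h, W_x)

logits = W_out @ h  # one score per action class
predicted_action = int(np.argmax(logits))
```

For action prediction rather than recognition, the same loop would run on only the initial frames of the video, with the final state used to anticipate the rest of the action.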
DL for video action recognition and prediction is a highly relevant research topic. Several video datasets are already available, recorded in both constrained and unconstrained conditions and with RGB or RGB-D cameras. In recent years, challenges on video action recognition have been hosted at international computer vision conferences such as CVPR and ECCV.
The AVARP special session will be an occasion to expose the Distributed Computing community to the problem of video action recognition and prediction applied to Intelligent Systems, promoting the exchange of new ideas and stimulating collaboration. The goal of this special session is to gather researchers working on different areas of video action recognition and prediction in order to stimulate discussion. Special emphasis will be placed on analysing the interaction between DL and video action recognition and prediction.
The AVARP special session focuses on the following topics:
  • Deep Learning for video action recognition and prediction
  • Distributed video action recognition for surveillance systems
  • Action based expected visual sensory prediction
  • Video action trajectory prediction
  • Egocentric action recognition and prediction
  • Long-term action prediction
  • Deep Recurrent Neural Networks for video action prediction
  • Introduction of new datasets, benchmarks and challenges for video action recognition and prediction


Important Dates

July 25th, 2021 (extended) Paper submission
August 18th, 2021 Notification of acceptance
August 31st, 2021 Final paper submission
September 16th-18th, 2021 Symposium dates



Submission of Papers

All accepted papers will be included in the Symposium Proceedings, which will be published by Springer.

Full papers must be at most 10 pages long, short papers at most 6 pages, and posters at most 3 pages; all must be formatted according to the Springer format.

Submissions and reviews are handled through EasyChair. Please submit your paper at:

During the submission process, please specify this Special Track as the topic "AVARP - Applications for Video Action Recognition and Prediction" in EasyChair.



TPC Members
  • Alexandre Bernardino, ISR, Instituto Superior Técnico, Lisbon, Portugal
  • Kosta Jovanovic, School of Electrical Engineering (ETF), Belgrade, Serbia
  • Lorenzo Jamone, Queen Mary University of London, United Kingdom
  • Egidio Falotico, Scuola Superiore Sant’Anna, Pisa, Italy
  • Giovanni Maria Farinella, University of Catania, Italy
  • Sebastiano Battiato, University of Catania, Italy
  • Rubén Alonso, R2M Solution, Italy
  • Daniele Riboni, University of Cagliari, Italy
  • Silvio Barra, University of Cagliari, Italy