Advancing Human Activity Recognition in Ambient Assisted Living through Multi-View Robotics: the RHM Dataset and Dual-Stream C3D Model

Bamorovat Abadi, Mohammad Hossein (2025) Advancing Human Activity Recognition in Ambient Assisted Living through Multi-View Robotics: the RHM Dataset and Dual-Stream C3D Model. Doctoral thesis, University of Hertfordshire.

This thesis investigates the intersection of Human Action Recognition (HAR) and Human-Robot Interaction (HRI) in Ambient Assisted Living (AAL) environments. The primary contribution of this research is the development of the Robot House Multi-View (RHM) dataset, featuring 26,804 trimmed RGB videos classified into 14 action classes and captured from four distinct views: a dynamic robot view, a static top view, and static front and back views.

Dataset: The RHM dataset addresses significant gaps in existing HAR datasets, particularly within the HRI domain. To validate the dataset, a comprehensive approach using deep learning (DL) and Mutual Information (MI) was employed. The dynamic robot view presents unique challenges, achieving lower accuracy than the static views owing to its inherent variability and motion. A novel MI metric was introduced to analyse temporal dependencies and information redundancy across video frames. State-of-the-art DL models, including C3D, R(2+1)D, R3D, and SlowFast, were tested on the RHM dataset.

Methodology: The thesis introduces a novel multi-stream model, the Dual-Stream C3D, which integrates multiple views to enhance HAR accuracy. The combination of the Front and Robot views in this model yields the highest accuracy, highlighting the potential of multi-view integration for improving action recognition performance. Specifically, the model demonstrated a 10% increase in Top-1 accuracy for the robot view when it was combined with another view, such as the front view. Despite these improvements, however, consistent confusion patterns among certain action classes persist, suggesting the need for further refinement of feature extraction in recognition models.

Feature Extraction Techniques: Additionally, the research introduces and evaluates three novel feature extraction techniques: Motion Aggregation (MAg), Differential Motion Trajectory (DMT), and Frame Variation Mapper (FVM).
These techniques target different temporal aspects of video frames and are shown to significantly enhance the performance of HAR models. Experimental results indicate that combining normal frames in the first stream with DMT in the second stream achieves the highest accuracy, particularly for the Front-Robot viewpoint pair. These findings underscore the adaptability and effectiveness of these feature extraction methods across models and viewpoints.

Conclusion: In summary, this thesis presents the RHM dataset as a substantial contribution to HAR and HRI, offering methodologies and insights that significantly improve action recognition accuracy in AAL scenarios. The integration of multi-view data, novel deep learning models, and advanced feature extraction techniques collectively advances the state of the art in HAR within the context of assistive robotics.
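The abstract's use of mutual information to gauge temporal redundancy between video frames can be illustrated with a minimal sketch. The following is a generic histogram-based MI estimate between two grayscale frames — the function name, bin count, and estimator are illustrative assumptions, not the thesis's actual metric:

```python
import numpy as np

def frame_mutual_information(frame_a, frame_b, bins=32):
    """Histogram-based mutual information between two grayscale frames.

    High MI means the frames share information (temporal redundancy);
    low MI means the second frame carries mostly new content.
    """
    # Joint intensity histogram, normalised to a joint probability table
    joint, _, _ = np.histogram2d(frame_a.ravel(), frame_b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1)          # marginal of frame_a intensities
    py = pxy.sum(axis=0)          # marginal of frame_b intensities
    # MI = sum over nonzero cells of p(x,y) * log(p(x,y) / (p(x) p(y)))
    nz = pxy > 0
    return np.sum(pxy[nz] * np.log(pxy[nz] / (px[:, None] * py[None, :])[nz]))

# A frame shares maximal information with itself; two independent
# noise frames share very little.
rng = np.random.default_rng(0)
f1 = rng.integers(0, 256, size=(64, 64)).astype(np.float64)
f2 = rng.integers(0, 256, size=(64, 64)).astype(np.float64)
mi_same = frame_mutual_information(f1, f1)
mi_diff = frame_mutual_information(f1, f2)
```

Applied frame-by-frame along a clip, such a score profiles how quickly the visual content changes — one plausible reading of why a moving robot view, whose frames overlap less, is harder to recognise from than a static view.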


17044754 BAMOROVAT ABADI Mohammad Final Version of PhD Submission.pdf
Available under Creative Commons: BY 4.0
