Human action recognition using attention based LSTM network with dilated CNN features

Muhammad, Khan; Mustaqeem; Ullah, Amin; Imran, Ali Shariq; Sajjad, Muhammad; Kıran, Mustafa Servet; de Albuquerque, Victor Hugo C.

Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.13091/1008

Title:	Human action recognition using attention based LSTM network with dilated CNN features
Authors:	Muhammad, Khan; Mustaqeem; Ullah, Amin; Imran, Ali Shariq; Sajjad, Muhammad; Kıran, Mustafa Servet; de Albuquerque, Victor Hugo C.
Keywords:	Artificial Intelligence; Action Recognition; Attention Mechanism; Big Data; Dilated Convolutional Neural Network; Deep Bi-Directional LSTM; Multimedia Data Security; Big Data; Framework; Security; Internet; Machine; Fusion; System; Things
Publisher:	ELSEVIER
Abstract:	Human action recognition in videos is an active area of research in computer vision and pattern recognition. Artificial intelligence (AI) based systems are now needed for human-behavior assessment and for security purposes. Existing action recognition techniques mainly use the pre-trained weights of different AI architectures for the visual representation of video frames during training. This affects how discrepancies between features are determined, such as the distinction between visual and temporal cues. To address this issue, we propose a bidirectional long short-term memory (BiLSTM) based attention mechanism with a dilated convolutional neural network (DCNN) that selectively focuses on effective features in the input frame to recognize different human actions in videos. In this network, the DCNN layers extract salient discriminative features, using residual blocks to enhance the features so that they retain more information than a shallow layer would. These features are then fed into a BiLSTM to learn long-term dependencies. The BiLSTM is followed by an attention mechanism that boosts performance and extracts additional high-level, selective action-related patterns and cues. We further improve the loss function by combining the center loss with Softmax, which achieves higher performance in video-based action classification. The proposed system is evaluated on three benchmarks, i.e., the UCF11, UCF Sports, and J-HMDB datasets, achieving recognition rates of 98.3%, 99.1%, and 80.2%, respectively, a 1%-3% improvement over state-of-the-art (SOTA) methods. © 2021 Elsevier B.V. All rights reserved.
URI:	https://doi.org/10.1016/j.future.2021.06.045; https://hdl.handle.net/20.500.13091/1008
ISSN:	0167-739X; 1872-7115
Appears in Collections:	Faculty of Engineering and Natural Sciences Collection; Scopus Indexed Publications Collection; WoS Indexed Publications Collection
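The abstract combines several generic techniques: dilated convolutions, attention over BiLSTM outputs, and a Softmax (cross-entropy) loss joined with a center loss. The following dependency-free Python sketch illustrates each technique in isolation. It is not the authors' implementation.

Several parts of the sketch are assumptions made for illustration:
- the dot-product attention form;
- the hypothetical learned query vector;
- the loss weight `lam`;
- every function name.

The BiLSTM itself is taken as given, so its per-frame outputs are simply the inputs to `attention_pool`. The center loss uses its common formulation: half the squared distance between a feature and its class center.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dilated_conv1d(signal: Sequence[float], kernel: Sequence[float], dilation: int) -> Vector:
    """Valid-mode 1-D dilated convolution (cross-correlation, as in most DL libraries).

    Kernel taps are spaced `dilation` samples apart, so the receptive field
    grows without adding parameters.
    """
    span = (len(kernel) - 1) * dilation  # distance between the first and last tap
    return [
        sum(w * signal[i + k * dilation] for k, w in enumerate(kernel))
        for i in range(len(signal) - span)
    ]


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Receptive field of stacked stride-1 dilated convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax (subtracts the max before exponentiating)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states: Sequence[Sequence[float]], query: Sequence[float]) -> Vector:
    """Assumed dot-product attention over per-frame BiLSTM outputs.

    Each frame's hidden state is scored against a hypothetical learned query
    vector; the softmax-normalized scores weight each frame's contribution
    to a single clip-level vector.
    """
    scores = [sum(h * q for h, q in zip(state, query)) for state in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    return [sum(w * state[d] for w, state in zip(weights, hidden_states)) for d in range(dim)]


def softmax_cross_entropy(logits: Sequence[float], label: int) -> float:
    """Negative log-probability of the true class under a softmax."""
    return -math.log(softmax(logits)[label])


def center_loss(feature: Sequence[float], center: Sequence[float]) -> float:
    """Half the squared Euclidean distance from a feature to its class center."""
    return 0.5 * sum((f - c) ** 2 for f, c in zip(feature, center))


def joint_loss(
    logits: Sequence[float],
    feature: Sequence[float],
    label: int,
    centers: Sequence[Sequence[float]],
    lam: float = 0.01,  # hypothetical weighting; not taken from the paper
) -> float:
    """Softmax cross-entropy plus lambda-weighted center loss for one sample."""
    return softmax_cross_entropy(logits, label) + lam * center_loss(feature, centers[label])


if __name__ == "__main__":
    print(dilated_conv1d([1, 2, 3, 4, 5], [1, 1], dilation=2))  # [4, 6, 8]
    print(receptive_field(3, [1, 2, 4]))  # 15
    print(attention_pool([[1.0, 0.0], [0.0, 1.0]], query=[1.0, 1.0]))  # [0.5, 0.5]
```

In a trained model these would be tensor operations. The kernels, query, and class centers would be learned or updated during training. The scalar loops here only make each formula explicit. For example, three stacked kernel-size-3 layers with dilations 1, 2, and 4 already see 15 input positions.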

Files in This Item:

File	Access	Size	Format
1-s2.0-S0167739X21002405-main.pdf	Restricted until 2030-01-01	2.39 MB	Adobe PDF


Scopus™ citations:	22 (checked on Aug 3, 2024)
Web of Science™ citations:	116 (checked on Aug 3, 2024)
Page views:	230 (checked on Aug 5, 2024)
Downloads:	8 (checked on Aug 5, 2024)
