University of Oulu

Ye, L.; Liu, T.; Han, T.; Ferdinando, H.; Seppänen, T.; Alasaarela, E. Campus Violence Detection Based on Artificial Intelligent Interpretation of Surveillance Video Sequences. Remote Sens. 2021, 13, 628. https://doi.org/10.3390/rs13040628

Campus violence detection based on artificial intelligent interpretation of surveillance video sequences

Author: Ye, Liang1,2,3; Liu, Tong1,4; Han, Tian2,5; Ferdinando, Hany2,6; Seppänen, Tapio7; Alasaarela, Esko2
Organizations: 1Department of Information and Communication Engineering, Harbin Institute of Technology, Harbin 150001, China
2OPEM Research Unit, University of Oulu, 90014 Oulu, Finland
3Science and Technology on Communication Networks Laboratory, Shijiazhuang 050000, China
4ChinaUnicom Software Harbin Branch, Harbin 150001, China
5Jinhua Advanced Research Institute, Jinhua 321000, China
6Department of Electrical Engineering, Petra Christian University, Surabaya 60236, Indonesia
7Physiological Signal Analysis Team, University of Oulu, 90014 Oulu, Finland
Format: article
Version: published version
Access: open
Online Access: PDF Full Text (PDF, 3 MB)
Persistent link: http://urn.fi/urn:nbn:fi-fe202103298633
Language: English
Published: Multidisciplinary Digital Publishing Institute, 2021
Publish Date: 2021-03-29
Description:

Abstract

Campus violence is a common social phenomenon worldwide and the most harmful type of school bullying. As artificial intelligence and remote sensing techniques develop, several methods have become feasible for detecting campus violence, e.g., movement sensor-based and video sequence-based methods using sensors and surveillance cameras. In this paper, the authors use both image and acoustic features for campus violence detection. Campus violence data are gathered by role-playing, and 4096-dimension feature vectors are extracted from every 16 frames of video images. The C3D (Convolutional 3D) neural network is used for feature extraction and classification, achieving an average recognition accuracy of 92.00%. Mel-frequency cepstral coefficients (MFCCs) are extracted as acoustic features from three speech emotion databases, and classification with the C3D neural network yields average recognition accuracies of 88.33%, 95.00%, and 91.67%, respectively. To resolve conflicting evidence between the two modalities, the authors propose an improved Dempster–Shafer (D–S) algorithm. Compared with classical D–S theory, the improved algorithm increases the recognition accuracy by 10.79%, to a final accuracy of 97.00%.
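The abstract's fusion step builds on Dempster–Shafer evidence combination. The paper's improved algorithm is not detailed here, but the classical Dempster rule it modifies can be sketched as follows; the example mass values for hypothetical video and audio classifiers are illustrative assumptions, not figures from the paper.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Classical Dempster's rule of combination.

    m1, m2: dicts mapping frozenset hypotheses to basic probability masses.
    Returns the combined mass function, normalized by the non-conflicting mass.
    """
    combined = {}
    conflict = 0.0  # K: total mass assigned to contradictory evidence
    for (b, mb), (c, mc) in product(m1.items(), m2.items()):
        inter = b & c
        if inter:
            combined[inter] = combined.get(inter, 0.0) + mb * mc
        else:
            conflict += mb * mc
    if conflict >= 1.0:
        raise ValueError("Total conflict: sources cannot be combined")
    # Normalize by (1 - K) so the combined masses sum to 1
    return {h: m / (1.0 - conflict) for h, m in combined.items()}

# Hypothetical classifier outputs over {violent, nonviolent}
video = {frozenset({"violent"}): 0.8, frozenset({"nonviolent"}): 0.2}
audio = {frozenset({"violent"}): 0.7, frozenset({"nonviolent"}): 0.3}
fused = dempster_combine(video, audio)
```

When both sources lean the same way, the combined belief in "violent" exceeds either input (here about 0.90). High conflict between sources makes the normalization term small and the rule unstable, which is the problem the paper's improved algorithm targets.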


Series: Remote sensing
ISSN: 2072-4292
ISSN-E: 2072-4292
ISSN-L: 2072-4292
Volume: 13
Issue: 4
Article number: 628
DOI: 10.3390/rs13040628
OADOI: https://oadoi.org/10.3390/rs13040628
Type of Publication: A1 Journal article – refereed
Field of Science: 113 Computer and information sciences
213 Electronic, automation and communications engineering, electronics
Funding: This research was funded by the National Natural Science Foundation of China, grant number 41861134010; the Key Laboratory of Information Transmission and Distribution Technology of Communication Network, grant number HHX20641X002; the National Key R&D Program of China, grant number 2018YFC0807101; the Basic Scientific Research Project of Heilongjiang Province, grant number KJCXZD201704; and the Finnish Cultural Foundation, North Ostrobothnia Regional Fund 2017. The APC was funded by the National Natural Science Foundation of China, grant number 41861134010, and the National Key R&D Program of China, grant number 2018YFC0807101.
Copyright information: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).