University of Oulu

Xie, W., He, Q., Yu, Z., & Li, Y. (2022). Deep mutual attention network for acoustic scene classification. Digital Signal Processing, 123, 103450.

Deep mutual attention network for acoustic scene classification

Author: Xie, Wei1; He, Qianhua1; Yu, Zitong2;
Organizations: 1School of Electronic and Information Engineering, South China University of Technology, Guangzhou 510641, China
2Center for Machine Vision and Signal Analysis, University of Oulu, Oulu 90014, Finland
Format: article
Version: accepted version
Access: embargoed
Language: English
Published: Elsevier, 2022
Publish Date: 2024-01-29


Fusion strategies that exploit multiple time-frequency features have achieved superior performance in acoustic scene classification. However, existing fusion schemes are mostly pipelines built from separate modules for feature learning, fusion, and modeling. Such pipelines are prone to introducing artificial interference, which makes it difficult to reach the system's best performance. Moreover, the lack of adequate information interaction between different features in existing fusion schemes prevents the learned features from achieving optimal discriminative ability. To tackle these problems, we design a deep mutual attention network based on the principle of receptive-field regularization and a mutual attention mechanism. The proposed network jointly learns and complementarily enhances multiple time-frequency features end to end, improving the features' learning efficiency and discriminative ability. Experimental results on six publicly available datasets show that the proposed network outperforms almost all state-of-the-art systems in classification accuracy.
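The record does not reproduce the paper's exact formulation, but the core idea of mutual attention — letting two time-frequency feature branches re-weight each other so that each is enhanced by the other's context — can be sketched as a generic cross-attention exchange. Everything below (the function name, the dot-product affinity, the residual combination, and the branch names) is an illustrative assumption, not the authors' implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mutual_attention(x, y):
    """Hypothetical mutual-attention sketch: each branch (T, d)
    gathers context from the other via a shared affinity matrix,
    then adds it back residually (complementary enhancement)."""
    affinity = x @ y.T                        # (T_x, T_y) pairwise similarity
    x_ctx = softmax(affinity, axis=1) @ y     # y-context for each x frame
    y_ctx = softmax(affinity.T, axis=1) @ x   # x-context for each y frame
    return x + x_ctx, y + y_ctx

rng = np.random.default_rng(0)
x = rng.standard_normal((10, 8))  # e.g. features from a log-mel branch
y = rng.standard_normal((10, 8))  # e.g. features from a second T-F branch
x_enh, y_enh = mutual_attention(x, y)
```

In an end-to-end network the two branches would be learned feature maps and the affinity could carry learned projections; the sketch only shows the interaction pattern that distinguishes mutual attention from independent per-branch processing.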


Series: Digital signal processing
ISSN: 1051-2004
ISSN-E: 1095-4333
ISSN-L: 1051-2004
Volume: 123
Article number: 103450
DOI: 10.1016/j.dsp.2022.103450
Type of Publication: A1 Journal article – refereed
Field of Science: 113 Computer and information sciences
213 Electronic, automation and communications engineering, electronics
Funding: This work was partly supported by the National Natural Science Foundation of China (61771200, 62111530145), an international scientific research collaboration project of Guangdong Province, China (2021A0505030003), and Guangdong Basic and Applied Basic Research Foundation, China (2021A1515011454).
Copyright information: © 2022. This manuscript version is made available under the CC-BY-NC-ND 4.0 license http:/