Deep mutual attention network for acoustic scene classification |
|
Author: | Xie, Wei (1); He, Qianhua (1); Yu, Zitong (2) |
Organizations: |
(1) School of Electronic and Information Engineering, South China University of Technology, Guangzhou 510641, China; (2) Center for Machine Vision and Signal Analysis, University of Oulu, Oulu 90014, Finland |
Format: | article |
Version: | accepted version |
Access: | embargoed |
Persistent link: | http://urn.fi/urn:nbn:fi-fe2023041135759 |
Language: | English |
Published: | Elsevier, 2022 |
Publish Date: | 2024-01-29 |
Description: |
Abstract: Fusion strategies that utilize time-frequency features have achieved superior performance in acoustic scene classification tasks. However, the existing fusion schemes are mainly frameworks that involve separate modules for feature learning, fusion, and modeling. These frameworks are prone to introducing artificial interference and thus make it challenging to obtain the system's best performance. In addition, the lack of adequate information interaction between different features in the existing fusion schemes prevents the learned features from achieving optimal discriminative ability. To tackle these problems, we design a deep mutual attention network based on the principle of receptive field regularization and the mutual attention mechanism. The proposed network realizes the joint learning and complementary enhancement of multiple time-frequency features end-to-end, which improves the features' learning efficiency and discriminative ability. Experimental results on six publicly available datasets show that the proposed network outperforms almost all state-of-the-art systems in terms of classification accuracy.
|
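Note: the record does not reproduce the paper's architecture, but the abstract's core idea, two time-frequency feature branches re-weighting each other through a mutual attention mechanism, can be illustrated with a minimal PyTorch sketch. Everything below is an assumption for illustration only: the class and parameter names (MutualAttentionFusion, gate_a, reduction) and the squeeze-and-excitation-style gating are hypothetical and are not taken from the paper; the sketch only conveys the idea that each branch is gated by the other branch's global context.

    # Minimal sketch, assuming two CNN branches that each produce a
    # (batch, channels, freq, time) feature map from a different
    # time-frequency representation (e.g., log-mel vs. another spectrogram).
    import torch
    import torch.nn as nn

    class MutualAttentionFusion(nn.Module):
        """Hypothetical mutual attention: each branch is re-weighted by
        channel statistics computed from the other branch."""

        def __init__(self, channels: int, reduction: int = 8):
            super().__init__()
            self.pool = nn.AdaptiveAvgPool2d(1)
            # One gating MLP per branch (squeeze-and-excitation style; assumed design).
            self.gate_a = nn.Sequential(
                nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
                nn.Linear(channels // reduction, channels), nn.Sigmoid())
            self.gate_b = nn.Sequential(
                nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
                nn.Linear(channels // reduction, channels), nn.Sigmoid())

        def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor):
            b, c, _, _ = feat_a.shape
            ctx_a = self.pool(feat_a).view(b, c)  # global context of branch A
            ctx_b = self.pool(feat_b).view(b, c)  # global context of branch B
            # Mutual attention: each branch is gated by the *other* branch's context.
            attn_for_a = self.gate_a(ctx_b).view(b, c, 1, 1)
            attn_for_b = self.gate_b(ctx_a).view(b, c, 1, 1)
            return feat_a * attn_for_a, feat_b * attn_for_b

    if __name__ == "__main__":
        # Toy check with random "spectrogram" features from two hypothetical branches.
        fusion = MutualAttentionFusion(channels=64)
        a = torch.randn(4, 64, 40, 100)
        b = torch.randn(4, 64, 40, 100)
        out_a, out_b = fusion(a, b)
        print(out_a.shape, out_b.shape)  # torch.Size([4, 64, 40, 100]) for each branch

Because the gating is computed end-to-end inside one network, this kind of module avoids the separate feature-learning, fusion, and modeling stages that the abstract criticizes; the actual paper should be consulted for the real design.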
Series: |
Digital Signal Processing |
ISSN: | 1051-2004 |
ISSN-E: | 1095-4333 |
ISSN-L: | 1051-2004 |
Volume: | 123 |
Article number: | 103450 |
DOI: | 10.1016/j.dsp.2022.103450 |
OADOI: | https://oadoi.org/10.1016/j.dsp.2022.103450 |
Type of Publication: |
A1 Journal article – refereed |
Field of Science: |
113 Computer and information sciences; 213 Electronic, automation and communications engineering, electronics |
Funding: |
This work was partly supported by the National Natural Science Foundation of China (61771200, 62111530145), an international scientific research collaboration project of Guangdong Province, China (2021A0505030003), and Guangdong Basic and Applied Basic Research Foundation, China (2021A1515011454). |
Copyright information: |
© 2022. This manuscript version is made available under the CC-BY-NC-ND 4.0 license https://creativecommons.org/licenses/by-nc-nd/4.0/ |