Deep mutual attention network for acoustic scene classification
Xie, Wei; He, Qianhua; Yu, Zitong; Li, Yanxiong (2022-01-29)
Xie, W., He, Q., Yu, Z., & Li, Y. (2022). Deep mutual attention network for acoustic scene classification. Digital Signal Processing, 123, 103450. https://doi.org/10.1016/j.dsp.2022.103450
© 2022. This manuscript version is made available under the CC-BY-NC-ND 4.0 license https://creativecommons.org/licenses/by-nc-nd/4.0/
https://urn.fi/URN:NBN:fi-fe2023041135759
Abstract
Fusion strategies that exploit multiple time-frequency features have achieved superior performance in acoustic scene classification. However, existing fusion schemes are mostly pipelines built from separate modules for feature learning, fusion, and modeling. Such pipelines are prone to introducing artificial interference, which makes it difficult to reach the system's best performance. In addition, the lack of adequate information interaction between different features in existing fusion schemes prevents the learned features from achieving optimal discriminative ability. To tackle these problems, we design a deep mutual attention network based on the principle of receptive-field regularization and a mutual attention mechanism. The proposed network realizes end-to-end joint learning and complementary enhancement of multiple time-frequency features, which improves the features' learning efficiency and discriminative ability. Experimental results on six publicly available datasets show that the proposed network outperforms almost all state-of-the-art systems in classification accuracy.
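The abstract describes mutual attention as letting two time-frequency feature streams enhance each other rather than being fused in a separate module. The paper defines the exact architecture; as a rough illustration only, a symmetric cross-attention between two feature streams (all shapes, names, and the residual fusion below are assumptions, not the authors' implementation) might look like:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mutual_attention(feat_a, feat_b):
    """Illustrative symmetric cross-attention between two feature streams.

    feat_a, feat_b: (T, D) feature sequences, e.g. two different
    time-frequency representations of the same audio clip.
    Each stream is re-weighted by attention computed against the other,
    so information flows in both directions (hypothetical sketch).
    """
    d = feat_a.shape[-1]
    # Stream A attends to stream B: queries from A, keys/values from B.
    attn_ab = softmax(feat_a @ feat_b.T / np.sqrt(d), axis=-1)
    # Stream B attends to stream A: queries from B, keys/values from A.
    attn_ba = softmax(feat_b @ feat_a.T / np.sqrt(d), axis=-1)
    # Residual fusion: each stream keeps its own content plus what it
    # gathered from the other stream.
    enhanced_a = feat_a + attn_ab @ feat_b
    enhanced_b = feat_b + attn_ba @ feat_a
    return enhanced_a, enhanced_b

rng = np.random.default_rng(0)
a = rng.standard_normal((100, 64))  # e.g. a log-mel feature sequence
b = rng.standard_normal((100, 64))  # e.g. a second feature type
ea, eb = mutual_attention(a, b)
```

This sketch only conveys the general idea of bidirectional information interaction between features; the paper's network additionally applies receptive-field regularization and learns the features jointly end-to-end.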
Collections
- Open access [31657]