University of Oulu

L. Zhou, X. Chen, C. Wu, Q. Zhong, X. Cheng et al., "Speech enhancement via mask-mapping based residual dense network," Computers, Materials & Continua, vol. 74, no. 1, pp. 1259–1277, 2023.

Speech enhancement via mask-mapping based residual dense network

Author: Zhou, Lin1; Chen, Xijin1; Wu, Chaoyan1;
Organizations: 1School of Information Science and Engineering, Southeast University, Nanjing, 210096, China
2Center for Machine Vision and Signal Analysis, University of Oulu, Oulu, FI-90014, Finland
3College of IOT Engineering, Hohai University, Changzhou, 213022, China
Format: article
Version: published version
Access: open
Online Access: PDF Full Text (PDF, 1.7 MB)
Persistent link:
Language: English
Published: Tech Science Press, 2022
Publish Date: 2023-08-16


Masking-based and spectrum mapping-based methods are the two main approaches to speech enhancement with deep neural networks (DNNs). Mapping-based methods, however, use only the phase of the noisy speech, which limits the upper bound of enhancement performance, while masking-based methods depend on accurately estimating the mask, which remains the key problem. Combining the advantages of these two types of methods, this paper proposes the speech enhancement algorithm MM-RDN (masking-mapping residual dense network), based on masking-mapping (MM) and a residual dense network (RDN). Using the logarithmic power spectrogram (LPS) of consecutive frames, MM estimates the ideal ratio masking (IRM) matrix of consecutive frames. RDN can make full use of the feature maps of all layers. Meanwhile, by using global residual learning to combine shallow and deep features, RDN obtains global dense features from the LPS, thereby improving the estimation accuracy of the IRM matrix. Simulations show that the proposed method achieves attractive speech enhancement performance in various acoustic environments. Specifically, in untrained acoustic tests with limited priors, e.g., unmatched signal-to-noise ratio (SNR) and unmatched noise category, MM-RDN still outperforms the existing convolutional recurrent network (CRN) method in terms of perceptual evaluation of speech quality (PESQ) and other evaluation indexes. This indicates that the proposed algorithm generalizes better to untrained conditions.
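The two quantities the abstract relies on, the LPS input feature and the IRM training target, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the record does not give the paper's exact IRM formula or STFT settings, so the commonly used definition IRM = (|S|^2 / (|S|^2 + |N|^2))^beta with beta = 0.5, the frame length, the hop size, the synthetic test signal, and all function names below are assumptions made for illustration.

```python
import numpy as np

def stft_mag(x, frame_len=512, hop=256):
    """Magnitude spectrogram (frames x bins) from a Hann-windowed framed FFT.
    Frame length and hop are illustrative assumptions."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def log_power_spectrum(mag, eps=1e-10):
    """LPS feature: natural log of the power spectrum, floored by eps."""
    return np.log(mag ** 2 + eps)

def ideal_ratio_mask(clean_mag, noise_mag, beta=0.5, eps=1e-10):
    """Assumed IRM definition: (S^2 / (S^2 + N^2)) ** beta, values in [0, 1]."""
    clean_pow = clean_mag ** 2
    noise_pow = noise_mag ** 2
    return (clean_pow / (clean_pow + noise_pow + eps)) ** beta

# Synthetic stand-ins for speech and noise (hypothetical data, 1 s at 16 kHz).
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
clean = np.sin(2 * np.pi * 440 * t)
noise = 0.3 * rng.standard_normal(t.shape)
noisy = clean + noise

clean_mag = stft_mag(clean)
noise_mag = stft_mag(noise)
noisy_mag = stft_mag(noisy)

lps = log_power_spectrum(noisy_mag)           # input feature for the network
irm = ideal_ratio_mask(clean_mag, noise_mag)  # oracle target the network would estimate
enhanced_mag = irm * noisy_mag                # masking step applied to the noisy magnitude
```

In a masking-based system, the network would map `lps` (stacked over consecutive frames) to an estimate of `irm`; the oracle mask here is available only because the clean and noise signals are known separately.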


Series: Computers, materials & continua
ISSN: 1546-2218
ISSN-E: 1546-2226
ISSN-L: 1546-2218
Volume: 74
Issue: 1
Pages: 1259 - 1277
DOI: 10.32604/cmc.2023.027379
Type of Publication: A1 Journal article – refereed
Field of Science: 213 Electronic, automation and communications engineering, electronics
Funding: This work is supported by the National Key Research and Development Program of China under Grant 2020YFC2004003 and Grant 2020YFC2004002, and the National Nature Science Foundation of China (NSFC) under Grant No. 61571106.
Copyright information: © The Author(s) 2022. This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.