University of Oulu

Zhao, X., Zhou, L., Tong, Y., Qi, Y., Shi, J. (2021). Robust Sound Source Localization Using Convolutional Neural Network Based on Microphone Array. Intelligent Automation & Soft Computing, 30(1), 361–371, https://doi.org/10.32604/iasc.2021.018823

Robust sound source localization using convolutional neural network based on microphone array

Author: Zhao, Xiaoyan1; Zhou, Lin2; Tong, Ying1;
Organizations: 1School of Information and Communication Engineering, Nanjing Institute of Technology, Nanjing, 211167, China
2School of Information Science and Engineering, Southeast University, Nanjing, 210096, China
3University of Oulu, Oulu, 900014, FI, Finland
Format: article
Version: published version
Access: open
Online Access: PDF Full Text (PDF, 0.7 MB)
Persistent link: http://urn.fi/urn:nbn:fi-fe2021121460317
Language: English
Published: Tech Science Press, 2021
Publish Date: 2021-12-14
Description:

Abstract

To improve the performance of microphone array-based sound source localization (SSL), this paper proposes a robust SSL algorithm using a convolutional neural network (CNN). The Gammatone sub-band steered response power-phase transform (SRP-PHAT) spatial spectrum is adopted as the localization cue because of the feature correlation between consecutive sub-bands. Because a CNN shares weights and is well suited to processing tensor data, it is used to extract spatial location information from the localization cues. The Gammatone sub-band SRP-PHAT spatial spectra are calculated from the microphone signals, which are decomposed in the frequency domain by a Gammatone filter bank. The proposed algorithm takes as CNN input a two-dimensional feature matrix assembled from the Gammatone sub-band SRP-PHAT spatial spectra within a frame. Taking advantage of the powerful modeling capability of the CNN, the two-dimensional feature matrices from diverse environments are used together to train a CNN model that captures the mapping between the feature matrix and the azimuth of the sound source. The azimuth of a testing signal is then predicted by the trained CNN model. Experimental results show the superiority of the proposed algorithm for the SSL problem: it achieves significantly improved localization performance, robustness, and generality in various acoustic environments.
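The feature extraction step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes a planar array with a far-field source, approximates each Gammatone sub-band by a frequency-domain magnitude weighting (using the common ERB-based fourth-order approximation) rather than a time-domain filter bank, and all function names, the array geometry, and the band layout are hypothetical. The CNN itself is omitted; the returned matrix is the kind of two-dimensional input the abstract refers to.

```python
import numpy as np

C = 343.0  # assumed speed of sound in m/s


def gammatone_weights(freqs, centers, order=4):
    """Approximate gammatone magnitude responses, shape (n_bands, n_freqs)."""
    erb = 24.7 * (4.37 * centers / 1000.0 + 1.0)  # equivalent rectangular bandwidth
    b = 1.019 * erb
    detune = (freqs[None, :] - centers[:, None]) / b[:, None]
    return (1.0 + detune ** 2) ** (-order / 2.0)


def mic_delays(mic_xy, azimuths_deg):
    """Far-field arrival delays relative to the origin, shape (n_az, n_mics)."""
    az = np.deg2rad(np.atleast_1d(azimuths_deg))
    directions = np.stack([np.cos(az), np.sin(az)], axis=1)
    # A microphone closer to the source receives the wavefront earlier.
    return -(directions @ mic_xy.T) / C


def subband_srp_phat(frame, mic_xy, fs, centers, azimuths_deg):
    """Sub-band SRP-PHAT feature matrix for one frame.

    frame: (n_mics, n_samples) time-domain samples.
    Returns an array of shape (n_bands, n_azimuths).
    """
    n_mics, n_samples = frame.shape
    spec = np.fft.rfft(frame * np.hanning(n_samples), axis=1)
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    weights = gammatone_weights(freqs, centers)
    delays = mic_delays(mic_xy, azimuths_deg)
    features = np.zeros((len(centers), delays.shape[0]))
    for i in range(n_mics):
        for j in range(i + 1, n_mics):
            cross = spec[i] * np.conj(spec[j])
            cross /= np.abs(cross) + 1e-12  # PHAT: keep phase only
            tdoa = delays[:, i] - delays[:, j]
            # Steering term that cancels the pair's phase at the true azimuth.
            steer = np.exp(2j * np.pi * freqs[None, :] * tdoa[:, None])
            features += np.real(weights @ (cross[None, :] * steer).T)
    return features


def simulate_frame(source, mic_xy, fs, azimuth_deg):
    """Toy anechoic frame: circularly delay one source to each microphone."""
    n = len(source)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    delays = mic_delays(mic_xy, azimuth_deg)[0]
    phase = np.exp(-2j * np.pi * freqs[None, :] * delays[:, None])
    return np.fft.irfft(np.fft.rfft(source)[None, :] * phase, n=n, axis=1)
```

In this sketch, summing the feature matrix over the sub-band axis gives a conventional broadband SRP-PHAT spectrum whose peak indicates the source azimuth; the algorithm in the abstract instead keeps the per-band rows and lets a trained CNN map the whole matrix to an azimuth.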

Series: Intelligent automation & soft computing
ISSN: 1079-8587
ISSN-E: 2326-005X
ISSN-L: 1079-8587
Volume: 30
Issue: 1
Pages: 361 - 371
DOI: 10.32604/iasc.2021.018823
OADOI: https://oadoi.org/10.32604/iasc.2021.018823
Type of Publication: A1 Journal article – refereed
Field of Science: 213 Electronic, automation and communications engineering, electronics
Funding: This work is supported by Nanjing Institute of Technology (NIT) fund for Research Startup Projects of Introduced talents under Grant No. YKJ202019, NIT fund for Doctoral Research Projects under Grant No. ZKJ2020003, the National Natural Science Foundation of China (NSFC) under Grant No. 61571106, NSFC under Grant No. 61703201, Jiangsu Natural Science Foundation under Grant No. BK20170765, Innovation training Program for College Students in Jiangsu Province under Grant No. 202011276110H, and NIT fund for “Challenge Cup” Cultivation support project under Grant No. TZ20190010.
Copyright information: This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
https://creativecommons.org/licenses/by/4.0/