A method of sound source recognition based on two weak selection compressed sensing

Current compressed sensing algorithms for sound source recognition rely mainly on prior knowledge of the exact sparsity. This paper proposes the two stagewise weak orthogonal matching pursuit (TSWOMP) compressed sensing reconstruction algorithm for sound source recognition, which selects atoms, checks the reliability of the selected atoms, and deletes the wrong atoms to obtain the final atom set. The results of TSWOMP, stagewise weak orthogonal matching pursuit (SWOMP) and conventional beamforming (CBF) at different frequencies and different signal-to-noise ratios (SNR) are compared by numerical simulation. The results show that TSWOMP effectively reduces the main lobe width, suppresses the influence of the side lobes, and improves the resolution. The recognition accuracy of TSWOMP is significantly higher than that of SWOMP. In addition, TSWOMP effectively reduces the algorithm's dependence on sparsity.


Introduction
At present, sound source localization, both at home and abroad, relies mainly on sound signals collected by a microphone array. The main methods include delay-and-sum beamforming [1], the deconvolution approach for the mapping of acoustic sources [2], near-field acoustical holography [3], and compressed sensing sound source localization [4]. Sound source imaging based on compressed sensing exploits the sparsity of the sound source to compress the data substantially by projection onto a low-dimensional space. The original sound source signal can then be recovered from fewer measurements with a compressed sensing reconstruction algorithm, which can effectively improve the accuracy and quality of sound source imaging.
In 2006, Donoho et al. [5] proposed the theory of compressed sensing. Many scholars have since introduced compressed sensing into sound source recognition [6]. The key to compressed sensing is the reconstruction algorithm, which recovers the original signal by solving an optimization problem. The existing reconstruction methods fall mainly into four categories: greedy pursuit algorithms [7], convex optimization algorithms [8], Bayesian framework algorithms [9], and non-convex optimization algorithms [10]. Greedy pursuit algorithms have attracted wide attention for their high reconstruction efficiency and accuracy. Their main idea is to select, at each iteration, the atoms most correlated with the observed data and to build the best approximation of the observed data through iteration.
Based on the above, this paper proposes a two stagewise weak orthogonal matching pursuit (TSWOMP) compressed sensing reconstruction algorithm for sound source recognition. The initial candidate atom set is selected by the stagewise weak selection criterion. At the end of each iteration, the reliability of the selected atoms is checked, and previously selected wrong atoms are deleted from the current atom set to obtain the result of the iteration [11]. The TSWOMP algorithm can break through the Nyquist sampling limit and reduces the excessive dependence on prior conditions, with simple steps and a small amount of computation, thereby improving the accuracy and quality of sound source imaging.

The acoustic field model
Assume that the array is planar, the total number of microphones in the array is $M$, and the frequency of the sound source is $f$. The sound source surface is divided into a grid with the same number of row and column nodes, giving a total of $N$ grid nodes. The distance from the $m$-th microphone to the $n$-th grid node is denoted $r_{mn}$ ($m = 1, 2, \ldots, M$; $n = 1, 2, \ldots, N$). According to the Helmholtz integral equation with the free-field Green's function [12], the transfer matrix $G$ between the microphone array and the sound source plane nodes is established, as shown in Eq. (1):

$$G_{mn} = \frac{e^{-j 2\pi f r_{mn}/c}}{4\pi r_{mn}}, \qquad r_{mn} = \lVert \bar{r}_m - r_n \rVert \tag{1}$$

where $c$ is the speed of sound, $\bar{r}_m$ is the coordinate of the $m$-th microphone, and $r_n$ is the coordinate of the $n$-th grid node. Assume that the sound source signal on the sound source plane can be expressed as $s = (s_1, \ldots, s_N)^T$; then the microphone measurement can be expressed as $p = Gs$. However, noise is present in practical environments, so the measured value of the microphone is expressed as $p = Gs + e$, where $e$ represents noise.
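The acoustic field model can be illustrated with a minimal sketch. It assumes the standard free-field form exp(-jkr)/(4πr) for Eq. (1) and a sound speed of 343 m/s; the function name `transfer_matrix`, the toy geometry (4 microphones, 9 grid nodes) and the noise level are hypothetical choices made only for illustration.

```python
import numpy as np

def transfer_matrix(mic_xyz, grid_xyz, freq, c=343.0):
    """Free-field Green's function between every microphone and grid node.

    mic_xyz: (M, 3) microphone coordinates; grid_xyz: (N, 3) node coordinates.
    Returns a complex (M, N) matrix with entries exp(-j k r_mn) / (4 pi r_mn).
    """
    k = 2.0 * np.pi * freq / c  # wavenumber
    # r[m, n] is the distance from microphone m to grid node n
    r = np.linalg.norm(mic_xyz[:, None, :] - grid_xyz[None, :, :], axis=2)
    return np.exp(-1j * k * r) / (4.0 * np.pi * r)

# Hypothetical toy geometry: 4 microphones at z = 0.25 m, 9 grid nodes at z = 0
mx, my = np.meshgrid([-0.05, 0.05], [-0.05, 0.05])
mic_xyz = np.column_stack([mx.ravel(), my.ravel(), np.full(mx.size, 0.25)])
gx, gy = np.meshgrid(np.linspace(-0.025, 0.025, 3), np.linspace(-0.025, 0.025, 3))
grid_xyz = np.column_stack([gx.ravel(), gy.ravel(), np.zeros(gx.size)])

G = transfer_matrix(mic_xyz, grid_xyz, freq=2500.0)

# Sparse source vector: a single unit source at the centre node
s = np.zeros(grid_xyz.shape[0], dtype=complex)
s[4] = 1.0

# Noisy measurement: transfer matrix times source plus additive noise
rng = np.random.default_rng(0)
noise = 1e-3 * (rng.standard_normal(G.shape[0]) + 1j * rng.standard_normal(G.shape[0]))
p = G @ s + noise
```

Each column of `G` is the array response to a unit source at one grid node, so a sparse `s` selects and weights only a few columns.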

Reconstruction algorithm
The goal of compressed sensing theory is to reconstruct the sparse sound source signal $s$ from the measurement $p$ and the measurement matrix $G$. This amounts to solving an underdetermined system of linear equations, which usually has multiple solutions because $M \le N$. When $s$ is a sparse sound source signal, solving for it can be regarded as seeking the sparsest solution, that is, minimizing the $\ell_0$ norm. However, this is an NP-hard problem and is usually intractable. To address this, Candès and Donoho et al. [13] proposed replacing the $\ell_0$ norm with the $\ell_1$ norm, so the problem can be expressed as the constrained problem in Eq. (3), where $\varepsilon$ is determined by the given noise:

$$\min_{s} \lVert s \rVert_1 \quad \text{s.t.} \quad \lVert p - Gs \rVert_2 \le \varepsilon \tag{3}$$

Donoho et al. have shown that the $\ell_0$-norm solution and the $\ell_1$-norm solution are approximately equal under the restricted isometry property (RIP).
Greedy pursuit algorithms, as one class of compressed sensing reconstruction algorithms, have attracted wide attention due to their high reconstruction efficiency and precision. The main algorithms include orthogonal matching pursuit (OMP) [14], stagewise orthogonal matching pursuit (StOMP) [15] and SWOMP [16], of which SWOMP performs well in terms of complexity and precision. The SWOMP algorithm iterates to the final result, using a threshold to exclude wrong atoms from the atom set.
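For orientation, the weak selection used by SWOMP can be sketched as follows. This is a minimal sketch of the commonly described form of the algorithm (keep every atom whose correlation with the residual is at least a fraction `alpha` of the largest correlation, then refit by least squares), not necessarily the exact variant of [16]. The function name `swomp`, the guard on the support size, the tolerance and the random test matrix are assumptions.

```python
import numpy as np

def swomp(G, p, alpha=0.9, n_iter=10, tol=1e-9):
    """Stagewise weak OMP sketch: returns (estimate, support indices)."""
    M, N = G.shape
    support = np.array([], dtype=int)
    coef = np.zeros(0, dtype=complex)
    residual = p.copy()
    for _ in range(n_iter):
        # Correlation of every atom (column of G) with the current residual
        corr = np.abs(G.conj().T @ residual)
        # Weak selection: all atoms within a factor alpha of the best one
        picked = np.flatnonzero(corr >= alpha * corr.max())
        candidate = np.union1d(support, picked)
        if candidate.size > M:  # least squares would become underdetermined
            break
        support = candidate
        # Least-squares fit on the selected atoms, then residual update
        coef, *_ = np.linalg.lstsq(G[:, support], p, rcond=None)
        residual = p - G[:, support] @ coef
        if np.linalg.norm(residual) <= tol * np.linalg.norm(p):
            break
    s_hat = np.zeros(N, dtype=complex)
    s_hat[support] = coef
    return s_hat, support

# Hypothetical test problem: random matrix with unit-norm columns, two sources
rng = np.random.default_rng(0)
M, N = 20, 50
G = rng.standard_normal((M, N))
G /= np.linalg.norm(G, axis=0)
s_true = np.zeros(N)
s_true[[5, 30]] = [2.0, -1.0]
p = G @ s_true

s_hat, support = swomp(G, p)
print("selected atoms:", support.tolist())
```

Because atoms are only ever added, a wrong atom picked in an early iteration stays in the support, which is the weakness the second selection stage is meant to address.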

Basis of two choices
Among compressed sensing reconstruction algorithms, SWOMP simplifies OMP and improves the calculation speed. However, the atoms obtained in each iteration are not the best representation of the signal, which reduces the reconstruction precision. In practice, the accuracy of the reconstructed signal is far inferior to that of OMP. In addition, SWOMP can accurately reconstruct the original signal only when the sparsity of the signal is correctly estimated.
To reduce this heavy reliance on sparsity, the literature [17,18] proposed an improved algorithm that selects the initial atom set by the stagewise weak selection criterion, then tests the reliability of the previously selected atoms and deletes the previously selected wrong atoms from the current atom set, so as to obtain the final result through iteration. In this paper, the TSWOMP compressed sensing sound source reconstruction algorithm is proposed; it no longer needs accurate sparsity information and offers higher reconstruction accuracy and lower complexity. Ideally, the estimate obtained in the previous step should be greater than the estimate obtained in the current step. If, during an iteration, the estimate obtained under the first stagewise weak selection criterion is much smaller than the estimate obtained before, it is very likely that a wrong atom was chosen in the previous selection. Therefore, a "second stagewise weak selection" criterion is needed to remove the previously selected wrong atoms from the current atom support set. The second weak selection parameter $\beta \in (0,1)$ is introduced, and the second stagewise weak selection criterion is shown in Eq. (4):

$$J = \{\, j : \lvert \hat{s}_t(j) \rvert < \beta \, \lvert \hat{s}_{t-1}(j) \rvert \,\} \tag{4}$$

where $j \in \Lambda$, $J$ is the index set deleted by the second weak selection, and $\hat{s}_{t-1}$ and $\hat{s}_t$ are the results obtained before and after the first stagewise weak selection in the $t$-th iteration, respectively.
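One possible reading of the second weak selection can be sketched as below. It assumes that an atom is discarded when the magnitude of its coefficient after the first weak selection falls below a fraction `beta` of its magnitude before that selection; this rule, the function name `second_weak_selection` and the numbers in the example are assumptions made for illustration.

```python
import numpy as np

def second_weak_selection(support, s_before, s_after, beta=0.8):
    """Split the support into kept and deleted atoms (assumed rule).

    s_before, s_after: full-length estimates before and after the first
    weak selection of the current iteration.
    """
    support = np.asarray(support, dtype=int)
    # An atom is dropped when its coefficient shrank below beta times its old value
    drop = np.abs(s_after[support]) < beta * np.abs(s_before[support])
    return support[~drop], support[drop]

# Hypothetical iteration: atoms 0 and 1 were selected earlier, atom 2 is new
s_before = np.array([1.0, 0.5, 0.0, 0.0])
s_after = np.array([0.95, 0.1, 0.7, 0.0])
kept, deleted = second_weak_selection([0, 1, 2], s_before, s_after, beta=0.8)
print(kept.tolist(), deleted.tolist())  # [0, 2] [1]
```

Under this rule a newly added atom has a previous coefficient of zero, so it can never be deleted in the iteration in which it enters; only atoms chosen in earlier iterations are re-examined.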

TSWOMP algorithm specific steps
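According to SWOMP and the above theoretical analysis, the specific steps of TSWOMP are obtained. The sound source signal $s$ can be obtained from the sensing matrix $G$ and the measured signal $p$. In the following, $\Lambda_t$ denotes the atom support set in the $t$-th iteration and $G_t$ the columns of $G$ indexed by $\Lambda_t$.
Step 4: Find the least squares solutions of $p = G_{t-1}\hat{s}_{t-1}$ and $p = G_t\hat{s}_t$, and delete the values that satisfy $\lvert \hat{s}_t(j) \rvert < \beta \lvert \hat{s}_{t-1}(j) \rvert$, where $\beta$ is the second weak selection parameter; the column indices of these values in $G_t$ form the set $J$. Update $\Lambda_t = \Lambda_t \setminus J$ and $G_t$.
Step 8: The reconstructed $\hat{s}$ has non-zero entries at the indices in $\Lambda_t$, whose values are those obtained in the last iteration; the remaining entries are 0.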
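Since only some of the steps are listed here, the following is a hedged sketch of how a complete TSWOMP loop might look, not the authors' implementation. It assumes a SWOMP-style first weak selection with parameter `alpha`, a deletion rule that drops atoms whose coefficient falls below `beta` times its previous value, and a refit on the pruned support. The function name `tswomp`, the stopping rule and the random test problem are hypothetical.

```python
import numpy as np

def tswomp(G, p, alpha=0.6, beta=0.8, n_iter=10, tol=1e-9):
    """Two stagewise weak OMP sketch: returns (estimate, support indices)."""
    M, N = G.shape
    support = np.array([], dtype=int)
    s_hat = np.zeros(N, dtype=complex)  # estimate from the previous iteration
    residual = p.astype(complex)
    for _ in range(n_iter):
        # First weak selection: atoms within a factor alpha of the best match
        corr = np.abs(G.conj().T @ residual)
        picked = np.flatnonzero(corr >= alpha * corr.max())
        candidate = np.union1d(support, picked)
        if candidate.size > M:  # least squares would become underdetermined
            break
        # Least-squares estimate after the first weak selection
        coef, *_ = np.linalg.lstsq(G[:, candidate], p, rcond=None)
        # Second weak selection (assumed rule): drop atoms whose coefficient
        # shrank below beta times its value in the previous estimate
        keep = np.abs(coef) >= beta * np.abs(s_hat[candidate])
        support = candidate[keep]
        # Refit on the pruned support and update the residual
        s_hat = np.zeros(N, dtype=complex)
        if support.size:
            coef, *_ = np.linalg.lstsq(G[:, support], p, rcond=None)
            s_hat[support] = coef
        residual = p - G @ s_hat
        if np.linalg.norm(residual) <= tol * np.linalg.norm(p):
            break
    return s_hat, support

# Hypothetical test problem: complex random matrix, two sources, weak noise
rng = np.random.default_rng(1)
M, N = 20, 50
G = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
G /= np.linalg.norm(G, axis=0)
s_true = np.zeros(N, dtype=complex)
s_true[[7, 33]] = [1.5 - 0.5j, 0.8j]
p = G @ s_true + 0.01 * (rng.standard_normal(M) + 1j * rng.standard_normal(M))

s_hat, support = tswomp(G, p, alpha=0.6, beta=0.8, n_iter=10)
print("selected atoms:", support.tolist())
```

The sketch never takes the sparsity as an input: the loop is bounded only by the iteration count, the residual tolerance and the number of microphones.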

Numerical simulation
To verify the feasibility and advantages of the proposed method, the following simulations use CBF, SWOMP and TSWOMP. Based on the simulation results, the influence of frequency and signal-to-noise ratio (SNR) on sound source localization and the accuracy of sound source identification are analyzed. It is assumed that the sound source is located on the measuring surface at coordinates (0, 0, 0), the dynamic range of the sound pressure level is 30 dB, an 8×10 microphone array is adopted, the sound source surface has 21×21 grid points with a grid spacing of 0.025 m, and the distance from the microphone array to the sound source surface is 0.25 m.
Repeated simulations show that TSWOMP obtains a better reconstruction when the weak selection parameters $\alpha$ and $\beta$ lie between 0.5 and 1, so this simulation takes $\alpha = 0.6$ and $\beta = 0.8$. The threshold parameter of SWOMP is set to 0.9, and the number of iterations is set to 10.
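The simulation geometry and the CBF baseline can be sketched as follows. The grid size, grid spacing and array distance follow the setup described above; the microphone spacing (`mic_step`), the sound speed, the free-field form of the transfer matrix and the normalised-correlation form of CBF are assumptions, since none of them is specified here.

```python
import numpy as np

c = 343.0        # assumed speed of sound (m/s)
freq = 2500.0    # one of the simulated frequencies (Hz)
mic_step = 0.05  # hypothetical microphone spacing (m); not given in the text

def plane(n_cols, n_rows, step, z):
    """Centred rectangular lattice of points on the plane at height z."""
    x = (np.arange(n_cols) - (n_cols - 1) / 2) * step
    y = (np.arange(n_rows) - (n_rows - 1) / 2) * step
    X, Y = np.meshgrid(x, y)
    return np.column_stack([X.ravel(), Y.ravel(), np.full(X.size, z)])

grid_xyz = plane(21, 21, 0.025, 0.0)    # 21 x 21 source grid, 0.025 m spacing
mic_xyz = plane(10, 8, mic_step, 0.25)  # 8 x 10 array, 0.25 m from the grid

# Transfer matrix with the assumed free-field Green's function
r = np.linalg.norm(mic_xyz[:, None, :] - grid_xyz[None, :, :], axis=2)
G = np.exp(-1j * 2 * np.pi * freq * r / c) / (4 * np.pi * r)

src = 10 * 21 + 10  # flat index of the grid node at (0, 0, 0)
p = G[:, src]       # noise-free pressure from a unit source at that node

# Normalised CBF: correlation of each steering vector with the measurement
b = np.abs(G.conj().T @ p) / (np.linalg.norm(G, axis=0) * np.linalg.norm(p))

# Map in dB relative to the peak, limited to a 30 dB dynamic range
cbf_map_db = np.clip(20 * np.log10(b / b.max()), -30.0, 0.0).reshape(21, 21)
```

In this noise-free case the CBF peak falls on the true source node, while the width of the region within a few dB of the peak gives a feel for the main lobe that the sparse methods are meant to narrow.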

The influence of frequency on identification results
CBF, SWOMP and TSWOMP are used to localize the simulated sound source at frequencies of 800 Hz and 2500 Hz, with an SNR of 30 dB. The results are shown in Fig. 1 and Fig. 2.
At 800 Hz (Fig. 1), CBF, SWOMP and TSWOMP all locate the sound source accurately, but the sound pressure values identified by the three algorithms deviate considerably from the true source pressure. The main lobes of CBF and SWOMP are much wider than that of TSWOMP. TSWOMP thus effectively reduces the main lobe width, suppresses the influence of side lobes, and significantly improves the resolution at low frequency.
At 2500 Hz (Fig. 2), TSWOMP accurately identifies both the location and the sound pressure of the source. The main lobe of the CBF result is much wider than those of the other two algorithms. Although SWOMP narrows the main lobe, it produces many side lobes and pseudo sound sources, and its recognized sound pressure deviates considerably from the source value. Comparing Fig. 1 and Fig. 2 shows that TSWOMP exhibits a certain recognition error as the frequency decreases.

The influence of SNR on recognition results
A sound source with a frequency of 5000 Hz was simulated at SNRs of 15 dB and 0 dB, with a distance of 0.25 m from the array to the measuring surface, using CBF, SWOMP and TSWOMP. The results are shown in Fig. 3 and Fig. 4. At an SNR of 15 dB (Fig. 3), the sound pressure identified by CBF and SWOMP deviates considerably from the source pressure, while TSWOMP accurately identifies both the source location and the source pressure. Compared with CBF and SWOMP, TSWOMP has a narrower main lobe, eliminates the influence of side lobes, and reduces the error of the recovered sound pressure. Comparing Fig. 3 and Fig. 4 shows that, as the SNR decreases, CBF and SWOMP are strongly affected by noise: the main lobe widens and the number of side lobes in the localization results increases. TSWOMP, however, still effectively reduces the main lobe width, suppresses the influence of side lobes, and improves the accuracy of the identified source sound pressure at low SNR.
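How the noisy measurements for such an SNR study might be generated can be sketched as below. It assumes complex white Gaussian noise and defines the SNR as the ratio of mean signal power to mean noise power across the microphones; the function name `add_noise` and the placeholder pressure vector are hypothetical. The noise is rescaled so that the realised SNR matches the target exactly.

```python
import numpy as np

def add_noise(p, snr_db, rng):
    """Return (noisy measurement, noise) with the realised SNR equal to snr_db."""
    noise = rng.standard_normal(p.shape) + 1j * rng.standard_normal(p.shape)
    # Noise power required for the requested SNR
    target_power = np.mean(np.abs(p) ** 2) / 10.0 ** (snr_db / 10.0)
    # Rescale the noise draw so its mean power equals the target
    noise *= np.sqrt(target_power / np.mean(np.abs(noise) ** 2))
    return p + noise, noise

# Placeholder pressure vector for an 80-microphone array
rng = np.random.default_rng(0)
p_clean = np.exp(1j * np.linspace(0.0, 3.0, 80))

p_15db, n_15db = add_noise(p_clean, 15.0, rng)
p_0db, n_0db = add_noise(p_clean, 0.0, rng)
```

At 0 dB the noise carries as much power as the signal, which is the harder of the two cases considered above.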

Accuracy analysis of identification results
To verify the accuracy of the TSWOMP recognition results, 100 arbitrary sound sources were simulated at a frequency of 6500 Hz, an SNR of 30 dB, and a distance of 0.25 m from the array to the measuring surface, and both TSWOMP and SWOMP were applied. The results are shown in Fig. 5.

Conclusions
The TSWOMP algorithm can break through the Nyquist sampling limit and reduces the excessive dependence on prior conditions. Compared with CBF and SWOMP, TSWOMP effectively reduces the main lobe width and suppresses the influence of side lobes, which significantly improves the resolution, and the recovered sound pressure is closer to the sound pressure of the source at the same frequency. In addition, TSWOMP achieves good results at low SNR, although it shows a certain recognition error at low frequency. Compared with SWOMP, TSWOMP effectively reduces the dependence on sparsity information, and its recognition results are more stable and accurate. However, TSWOMP still shows some errors in the identification results at individual source locations, which can be studied further in follow-up work.