A novel wind turbine gearbox fault diagnosis method based on ASO-VMD and NRF

Combining feature extraction with pattern recognition makes it possible to diagnose wind turbine gearbox faults from vibration signals. However, existing methods require their parameters to be re-tuned and their models to be retrained for each new kind of vibration signal, which is time-consuming. To reduce the number of parameters that must be tuned and the training time, this paper proposes a fault diagnosis model based on variational mode decomposition (VMD) optimized by atomic search optimization (ASO) and on a neural random forest (NRF). The VMD parameters are adjusted adaptively by ASO, which itself has few parameters to tune. The signals decomposed by ASO-VMD are used as the input of the NRF. We evaluate the method on a simulated gearbox model built with Solidworks and Adams. Experimental results show that the method trains faster and achieves higher recognition accuracy without requiring many parameters to be set manually.


Introduction
In recent years, resource shortages and environmental degradation have prompted countries to focus on the development of clean energy [1]. With advances in technology, wind power generation has grown rapidly, and the global installed capacity of wind turbines has increased year by year [2]. The speed-increasing gearbox is an important rotating component of the wind turbine drive system. If gear damage is not detected in time, it can cause severe damage to the wind turbine [3,4].
Time-frequency analysis has been applied successfully to fault diagnosis. Feature extraction from the gearbox vibration signal reveals the fault information of the gearbox. Commonly used methods for gearbox vibration signal diagnosis include the continuous wavelet transform (CWT) [5,6], the Hilbert-Huang transform (HHT) [7,8], empirical mode decomposition (EMD) [9] and local mean decomposition (LMD) [10]. However, CWT depends on the choice of wavelet basis function, and different signals require different basis functions. In EMD and LMD, mode mixing and end effects degrade the result [11]. Variational mode decomposition (VMD) [12] is an adaptive signal processing method proposed by Dragomiretskiy et al. Owing to its good noise robustness, VMD has been widely used in fault diagnosis [13][14][15][16]. Although VMD decomposes signals well, it requires several parameters to be set in advance, and poorly chosen parameters lead to a poor decomposition. A common solution is to select the VMD parameters with a parameter optimization method. Lv et al. [17] decomposed the fault signal with VMD and used a support vector machine (SVM) tuned by a genetic algorithm to identify the fault and improve the generalization ability of the model; Yi et al. [18] used particle swarm optimization (PSO) to find the optimal VMD parameters for bearing fault diagnosis; Wang et al. [19] used PSO to optimize the VMD parameters by minimizing the average envelope entropy; Zhu [20] used the artificial fish swarm algorithm (AFSA) to find the optimal VMD parameters for rolling bearing fault diagnosis; Wang et al. [21] used symbolic dynamic entropy and power spectral entropy as fitness functions and multi-objective particle swarm optimization (MOPSO) to find the optimal VMD parameters; Miao et al. [22] used the kurtosis of the indicator set as the objective function and the grasshopper optimization algorithm (GOA) to select the best VMD parameters. However, these optimization methods themselves have several parameters that must be set and that strongly affect the result, so many experiments are needed to determine a suitable parameter range, which makes the procedure harder to apply. The atomic search optimization algorithm (ASO) [23] requires fewer parameters to be set while retaining good optimization performance. At present, ASO is rarely used in mechanical fault diagnosis.
After the signal is decomposed by VMD, it still contains a variety of vibration information from the operation of the device. A suitable pattern recognition method is therefore needed to determine the type of fault. Common pattern recognition methods such as the support vector machine [24,25] and artificial neural networks [26][27][28] are widely used in the fault diagnosis of mechanical equipment and have achieved remarkable results. Li et al. [29] used a deep belief network (DBN) to classify gearbox failures; Chen et al. [30] extracted signal features and used a convolutional neural network (CNN) to determine the state of the gearbox. Verma et al. [31] used extracted time- and frequency-domain features as input to a sparse auto-encoder (SAE). Shao et al. [32] used an optimized DBN and time-domain features to diagnose bearing faults. Janssens et al. [33] explored using a CNN on frequency-domain data to diagnose bearing faults: the vibration data of the bearing housing are pre-processed by the fast Fourier transform (FFT) and fed into the CNN to detect faults. However, the above pattern recognition methods train slowly and are prone to over-fitting. The neural random forest (NRF) [34], proposed by Biau et al. in 2016, is a hybrid method that converts a random forest (RF) into a neural network (NN). NRF requires fewer parameters than standard networks and places fewer restrictions on the geometry of the decision boundaries than RF.
To address the difficulty of VMD parameter optimization and the long training time of fault diagnosis models, this paper proposes a fault diagnosis model based on ASO-VMD and NRF. ASO selects the optimal decomposition parameters of VMD, and the original fault signal is decomposed by VMD with these parameters. Principal component analysis (PCA) then compresses the dimension of the decomposed signal, and finally NRF performs classification to diagnose the gearbox fault. Compared with the methods above, the proposed method needs only two parameters to be set when optimizing VMD. In addition, its recognition accuracy reaches 100 %, which meets practical fault diagnosis requirements.

Model workflow
For gear failures in wind turbine gearboxes, this paper proposes an intelligent fault diagnosis model based on ASO-VMD and NRF. The training and working process of the model is shown in Fig. 1.
In this paper, the fault signals of the wind turbine are obtained by Solidworks and Adams simulation [35]. The simulated model is the gearbox of a 1.5 MW wind turbine, consisting of one planetary gear stage and two parallel gear stages. The collected simulation signals are divided into several segments. Each segment is decomposed by ASO-VMD and the total signal fitness is obtained. In this process ASO optimizes the parameters of VMD according to the fitness function, which is described in Section 2.3. The signal decomposed by ASO-VMD has a high dimension, so PCA is used to reduce the dimension before NRF classifies the signal. Finally, the trained NRF identifies the gearbox fault status, and separate test data are used to verify the model.
In VMD, the center frequency and bandwidth of each intrinsic mode function (IMF) of the signal $f(t)$ are updated iteratively. Assuming that each IMF $u_k(t)$ has a finite bandwidth around a center frequency $\omega_k$, the decomposition can be posed as a constrained variational problem that seeks the IMFs. The constrained variational model is described by Eqs. (2)(3):
$$\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_{k} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\} \tag{2}$$
$$\text{s.t.} \quad \sum_{k} u_k(t) = f(t) \tag{3}$$
To find the optimal solution of this constrained variational problem, the augmented Lagrange function is constructed by introducing the quadratic penalty factor $\alpha$ and the Lagrange multiplier $\lambda(t)$. The Lagrange function is transformed to the frequency domain, and the corresponding solutions give the update expressions for the mode components $u_k$ and the center frequencies $\omega_k$, respectively. The alternating direction method of multipliers is then used to find the optimal solution of the constrained variational model, which decomposes the original signal into multiple IMFs.
JOURNAL OF VIBROENGINEERING. SEPTEMBER 2020, VOLUME 22, ISSUE 6

ASO-based VMD
Inspired by molecular dynamics, ASO solves a parameter optimization problem by mathematically simulating the motion of atoms in nature. ASO first generates the position and velocity of each atom at random and initializes the fitness of each atom. In each iteration the atoms update their positions and velocities, and the best atomic position found when the iterations end is taken as the optimal solution of the objective function. The acceleration of an atom is determined by two factors: the interaction forces between atoms and the constraint force caused by the bond-length potential.
In each iteration, the neighborhood of each atom is determined first. The size of the neighborhood is defined by Eq. (4):
$$K(t) = N - (N - 2)\sqrt{\frac{t}{T}} \tag{4}$$
where $t$ is the current iteration, $N$ is the number of atoms, and $T$ is the total number of iterations. Restricting the interaction to this neighborhood reduces the amount of calculation and speeds up the iteration. Within its neighborhood, the interaction force and the constraint force acting on each atom are calculated. The interaction force is calculated as shown in Eqs. (5)(6):
$$U(r_{ij}) = 4\varepsilon\left[\left(\frac{\sigma}{r_{ij}}\right)^{12} - \left(\frac{\sigma}{r_{ij}}\right)^{6}\right] \tag{5}$$
$$F_{ij} = -\frac{\mathrm{d}U}{\mathrm{d}r_{ij}} = \frac{24\varepsilon}{r_{ij}}\left[2\left(\frac{\sigma}{r_{ij}}\right)^{12} - \left(\frac{\sigma}{r_{ij}}\right)^{6}\right] \tag{6}$$
where $r_{ij}$ is the distance between atoms $i$ and $j$, $\varepsilon$ is the strength of the interaction, and $\sigma$ is the length scale of the collision diameter. The constraint force is calculated as shown in Eqs. (7)(8):
$$\theta_i(t) = \left|x_i(t) - x_{best}(t)\right|^2 - b_{i,best}^2 \tag{7}$$
$$G_i(t) = -\lambda(t)\nabla\theta_i(t) = -2\lambda(t)\left(x_i(t) - x_{best}(t)\right) \tag{8}$$
where $\lambda(t)$ is the Lagrangian multiplier, $x_{best}(t)$ is the best atomic position in each iteration, and $b_{i,best}$ is the fixed bond length between the $i$th atom and the best atom. The total force on each atom gives its acceleration, from which the velocity and position of the atom are updated. After each iteration, the fitness values are recalculated.
When the iterations end, the minimum fitness value is selected as the optimal solution, together with the corresponding atomic position, which gives the values of the optimized variables.
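The iteration described above can be illustrated with a minimal sketch. It is a simplified, assumption-laden variant rather than the implementation used in this paper: the function name `aso_minimize`, the constants `depth=50.0` and `multiplier=0.2`, the exponential decay schedule, and the clipping of the scaled distance to [1.1, 1.24] are all illustrative choices, and the toy `sphere` objective stands in for the VMD fitness.

```python
import math
import random


def aso_minimize(objective, bounds, n_atoms=10, n_iter=50,
                 depth=50.0, multiplier=0.2, seed=0):
    """Simplified atom-search-style minimizer (illustrative sketch only)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_atoms)]
    vel = [[0.0] * dim for _ in range(n_atoms)]
    fit = [objective(p) for p in pos]
    best_fit = min(fit)
    best_pos = list(pos[fit.index(best_fit)])

    for t in range(1, n_iter + 1):
        # Better (lower) fitness gives a heavier atom, which accelerates less.
        f_lo, f_hi = min(fit), max(fit)
        span = (f_hi - f_lo) or 1.0
        raw = [math.exp(-(f - f_lo) / span) for f in fit]
        total = sum(raw)
        mass = [m / total for m in raw]

        # Shrinking neighbourhood of the k best atoms: N - (N - 2) * sqrt(t / T).
        k = min(n_atoms, max(2, int(n_atoms - (n_atoms - 2) * math.sqrt(t / n_iter))))
        neighbours = sorted(range(n_atoms), key=lambda i: fit[i])[:k]
        centroid = [sum(pos[j][d] for j in neighbours) / k for d in range(dim)]

        # Assumed schedules: both force scales decay as the iterations proceed.
        decay = math.exp(-20.0 * t / n_iter)
        eta = depth * (1.0 - (t - 1) / n_iter) ** 3 * decay
        lam = multiplier * decay

        new_pos = []
        for i in range(n_atoms):
            sigma = math.dist(pos[i], centroid) or 1e-12  # length scale
            force = [0.0] * dim
            for j in neighbours:
                r = math.dist(pos[i], pos[j])
                if j == i or r == 0.0:
                    continue
                h = min(max(r / sigma, 1.1), 1.24)  # clipped scaled distance
                # Lennard-Jones-style term: negative repels, positive attracts.
                pull = rng.random() * eta * (h ** -7 - 2.0 * h ** -13)
                for d in range(dim):
                    force[d] += pull * (pos[j][d] - pos[i][d]) / r
            moved = []
            for d, (lo, hi) in enumerate(bounds):
                # Constraint force pulls the atom toward the best position so far.
                constraint = 2.0 * lam * (best_pos[d] - pos[i][d])
                acc = (force[d] + constraint) / mass[i]
                vel[i][d] = rng.random() * vel[i][d] + acc
                moved.append(min(max(pos[i][d] + vel[i][d], lo), hi))
            new_pos.append(moved)

        pos = new_pos
        fit = [objective(p) for p in pos]
        if min(fit) < best_fit:
            best_fit = min(fit)
            best_pos = list(pos[fit.index(best_fit)])

    return best_pos, best_fit


def sphere(x):
    """Toy objective used only to exercise the sketch."""
    return sum(v * v for v in x)


best_position, best_value = aso_minimize(sphere, [(-5.0, 5.0)] * 2)
```

Only the number of atoms and the number of iterations are exposed as settings that a user would normally change, which mirrors the small number of tunable parameters that motivates the use of ASO here.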
VMD parameters are usually set according to prior experience, but gearbox signals are complex, and parameters chosen by experience cannot ensure that VMD extracts the fault features accurately. ASO is therefore used to select the optimal parameters. The VMD parameters to be selected are the penalty factor and the number of narrowband mode components. Each group of signals is decomposed by VMD into IMFs, and the information difference coefficient between the IMFs is calculated from the information entropy of each IMF, as shown in Eqs. (9)(10). The error coefficient $e$ between the initial signal and the signal reconstructed from the decomposition is defined in Eq. (11), and the fitness of the VMD decomposition is given by Eq. (12). The larger the entropy-based terms, the more information each IMF contains; the smaller $e$, the more similar the reconstructed signal is to the original signal. Since the fault information contained in each group of signals in the data set differs, the fitness values of the gearbox in all states are summed over the total number of signals to give the total fitness, as shown in Eq. (13). The total fitness is the objective function of ASO; its minimum value, and the corresponding number of modes and penalty factor, are obtained through multiple iterations.

PCA
Gearbox fault signals still have a high dimension after VMD decomposition, which hinders subsequent analysis of the signal characteristics. PCA is therefore used to compress the high-dimensional data while retaining the features that have a large impact on the results. PCA transforms the original data in the high-dimensional space along the directions of largest variance, which achieves dimensional compression. Before PCA is applied, the time-domain signals in each state are assembled into a matrix with one row per sample and one column per feature. Each feature in the samples is normalized and the correlation coefficient matrix is calculated, as shown in Eqs. (15)(16). Finally, the eigenvectors of the correlation coefficient matrix are calculated, as shown in Eq. (17). After the eigenvectors are obtained, the first 1024 features ranked by cumulative variance contribution rate are selected as the main indicators for subsequent analysis.
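The PCA steps above (normalization, correlation coefficient matrix, eigenvectors, variance contribution rate) can be sketched on a toy matrix. This is a minimal illustration under assumptions, not the paper's code: the helper names are hypothetical, power iteration is swapped in to obtain only the leading eigenvector, every feature is assumed to be non-constant, and the three-sample data are made up.

```python
import math


def standardize(rows):
    """Zero-mean, unit-variance scaling of each feature (column)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
            for j in range(p)]
    return [[(r[j] - means[j]) / stds[j] for j in range(p)] for r in rows]


def correlation_matrix(z):
    """Correlation coefficient matrix of already standardized data."""
    n, p = len(z), len(z[0])
    return [[sum(r[a] * r[b] for r in z) / (n - 1) for b in range(p)]
            for a in range(p)]


def leading_component(matrix, n_iter=200):
    """Leading eigenvalue and eigenvector by power iteration."""
    p = len(matrix)
    v = [1.0 + j for j in range(p)]  # arbitrary non-zero start vector
    for _ in range(n_iter):
        w = [sum(matrix[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[a] * sum(matrix[a][b] * v[b] for b in range(p))
                     for a in range(p))
    return eigenvalue, v


# Made-up data: 3 samples, 2 perfectly correlated features.
samples = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
z = standardize(samples)
r = correlation_matrix(z)
eigenvalue, direction = leading_component(r)

# Variance contribution rate of the leading component.
contribution = eigenvalue / sum(r[j][j] for j in range(len(r)))

# Compressed representation: projection of each sample on the component.
scores = [sum(a * b for a, b in zip(row, direction)) for row in z]
```

Because the two toy features are perfectly correlated, a single component carries all of the variance. In the model described here, the same idea is applied at a much larger scale to keep 1024 features.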

Basic principles of NRF
Based on the decision tree algorithm, a random forest builds multiple decision trees and combines their results by minimizing the empirical error to obtain the final result. Reference [34] pointed out that a decision tree can be seen as a two-layer neural network model. For each layer of the neural network, the activation function is shown in Eq. (18), where $\tau(x) = 2\mathbb{1}_{x \ge 0} - 1$ is the threshold activation function. The weights and biases of each layer are determined by the decision tree model. The final output of the neural network is shown in Eq. (19). The conversion relationship between decision tree and neural network is shown in Fig. 2.
In a random forest, the results of all decision trees will be aggregated to form a forest assessment, as shown in Eq.
After replacing the decision tree with a neural network, all the results are summarized by a random forest method, and finally a neural random forest model is formed.
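The correspondence between a decision tree and a network with threshold activations can be made concrete with a small sketch. It assumes a hypothetical toy tree with two splits and three leaves; the names `SPLITS`, `LEAVES`, `tree_predict` and `network_predict` and all thresholds and leaf values are invented for illustration, and the construction follows the general idea in [34] rather than the trained model of this paper.

```python
def tau(u):
    """Threshold activation: +1 if u >= 0, otherwise -1."""
    return 1.0 if u >= 0 else -1.0


# Hypothetical toy tree on two features:
#   node 0: x[0] >= 0.5 ?  no -> leaf A (1.0), yes -> node 1
#   node 1: x[1] >= 0.3 ?  no -> leaf B (2.0), yes -> leaf C (3.0)
SPLITS = [(0, 0.5), (1, 0.3)]  # (feature index, threshold) per inner node
LEAVES = [                     # (path {node: -1 left / +1 right}, leaf value)
    ({0: -1}, 1.0),
    ({0: +1, 1: -1}, 2.0),
    ({0: +1, 1: +1}, 3.0),
]


def tree_predict(x):
    """Ordinary traversal of the toy tree."""
    if x[0] < 0.5:
        return 1.0
    return 2.0 if x[1] < 0.3 else 3.0


def network_predict(x):
    """The same tree written as two hidden layers of threshold units."""
    # Hidden layer 1: one unit per split, +1 on the right side, -1 on the left.
    u = [tau(x[j] - threshold) for j, threshold in SPLITS]
    # Hidden layer 2: one unit per leaf; it outputs +1 only when every split
    # on the path to that leaf agrees with the recorded direction.
    v = [tau(sum(sign * u[node] for node, sign in path.items())
             - len(path) + 0.5)
         for path, _ in LEAVES]
    # Output layer: exactly one leaf unit is +1, so this affine map
    # returns the value stored in that leaf.
    values = [value for _, value in LEAVES]
    return sum(0.5 * val * act for val, act in zip(values, v)) + 0.5 * sum(values)


points = [(0.2, 0.9), (0.7, 0.1), (0.7, 0.9), (0.5, 0.3)]
agreement = [tree_predict(p) == network_predict(p) for p in points]
```

In this form the weights and biases are fixed by the tree structure. Replacing the hard threshold with a smooth activation is what allows such a network to be trained further by gradient descent.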

Wind turbine gearbox modeling
To verify the effectiveness of the ASO-VMD and NRF fault diagnosis model, we establish a wind turbine gearbox simulation model and collect the vibration signal of the gearbox during operation.

Gearbox modeling
Large wind turbine gearboxes are generally composed of planetary gear trains and parallel gear trains. In 1.5 MW wind turbine gearboxes, the common structure consists of one set of planetary gears and two sets of parallel gears, which can reduce the risk of gearbox failure. According to reference [36], the input speed of the wind turbine gearbox is generally 10-30 r/min, and the output speed is generally 1500 r/min. On this basis, the total gear ratio of our simulated gearbox is 100.8, and the parallel stages use helical gears. The gear parameters of each stage are shown in Table 1. The gearbox model is built in Solidworks. To simplify the simulation, the part between the transmission shaft and the gear transmission is omitted from the model. The resulting gearbox model is shown in Fig. 3.

Fault simulation
The fault types of gears in wind turbine gearboxes generally include broken teeth, pitting, and cracks. Among them, gear cracks and broken teeth cause more damage to the gearbox. If they are not detected in time, they have a severe impact on the normal operation of the wind turbine. To find the vibration characteristics of the gearbox with cracked or broken gear teeth, a dynamic simulation of the gearbox is required. The 3D gearbox model is imported into Adams, and the basic constraints and flexible-body settings are defined for each gear.
A contact force is defined between each pair of meshing gears, the input speed of the gearbox is set to 0.66π rad/s, and a resisting torque of 6000 N⋅m is applied at the output end. In the Adams simulation setup, the simulation time is 5 s, the number of simulation steps is 20,000, and the damping factor of the flexible body is 50. The meshing frequency of each gear stage can be calculated from these simulation conditions; the results are shown in Table 2. First, we verify the gear ratio of the gearbox. The output speed is shown in Fig. 4. The system reaches a steady state after about 0.2 s. The average angular velocity is 12000 °/s and the speed fluctuates by less than 100 °/s, which is consistent with the periodic meshing impact of the gears and with the expected output speed. To reduce interference from other vibration sources, vibration sensors are generally placed at the input and output ends of the gearbox in real working environments [37,38]. Therefore, the vibration signal of the output gear is collected in the Adams simulation, and its acceleration time-domain and frequency-domain signals are shown in Fig. 5. In the frequency-domain plot, the frequencies of the amplitude peaks are very close to the theoretical gear meshing frequencies, which verifies the validity of the model.
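The gear-ratio check can be reproduced with a short calculation. The sketch below assumes that the stated input speed is an angular velocity of 0.66π rad/s and uses the total gear ratio of 100.8 given for the simulated gearbox; the variable names are illustrative.

```python
import math

input_speed_rad_s = 0.66 * math.pi  # assumed unit: rad/s
gear_ratio = 100.8                  # total speed-increasing ratio

# Input speed in r/min, to compare with the typical 10-30 r/min range.
input_speed_rpm = input_speed_rad_s * 60.0 / (2.0 * math.pi)

# Theoretical output speed in degrees per second.
output_speed_deg_s = math.degrees(input_speed_rad_s * gear_ratio)

# Difference between the simulated average (12000 deg/s) and the theory.
deviation_deg_s = abs(12000.0 - output_speed_deg_s)

print(f"input speed : {input_speed_rpm:.1f} r/min")
print(f"output speed: {output_speed_deg_s:.2f} deg/s")
print(f"deviation   : {deviation_deg_s:.2f} deg/s")
```

Under these assumptions the theoretical output speed is about 11975 °/s, so the simulated average of 12000 °/s differs from it by roughly 25 °/s, which is within the reported fluctuation of 100 °/s.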
To verify that the model can simulate the vibration characteristics of gearbox failures, the normal gears in the model are replaced with gears of different fault types. In the test, the sun gear and the large gear of parallel gear train 2 are replaced by gears with broken teeth. The models of the faulty gears are shown in Fig. 6. The crack is set to a width of 2.5 mm and a depth of 3 mm. [...] Hz. This is close to the rotation frequency of the gear in parallel gear train 2, which is 9 Hz. In the frequency-domain signal, the meshing frequency of parallel gear train 2 basically matches the amplitude peak in the frequency-domain plot. Compared with the frequency-domain plot under normal conditions, the amplitude at the meshing frequency has also increased significantly, which further verifies the accuracy of the simulation model. On this basis, the vibration signals of the gearbox body and the output end in the other fault states are collected separately as the input of the fault diagnosis model. The amount of data for each gearbox state is described in Section 4.1.

Fault diagnosis model establishment
The workflow of the wind turbine gearbox fault diagnosis model has been described in detail in the previous section. The detailed parameters of each part in the model are described below.
(1) The simulation time of the gearbox is 5 s, and the number of sampling points is 20,000. Two sets of vibration data are recorded for each state: ring gear vibration data and output vibration data. The data are divided into groups by a sliding window of length 4096, so each group has a size of 2×4096. The data set covers the normal state of the gearbox, single-gear broken-tooth and crack faults, and states in which multiple gears are faulty at the same time. In total, the data are divided into 575 groups. The division of the data set is shown in Fig. 8.
(2) The number of atoms in ASO is set to 10, the maximum number of iterations is set to 50, the search range of the number of modes is [2, 10], and the search range of the penalty factor is [100, 6000].
(3) The samples are decomposed by VMD with the optimal parameters obtained by the optimization algorithm. After each group of data is decomposed, a feature matrix of size ($m$, 2 × 4096) is obtained, where $m$ is the number of samples. The feature matrix is compressed by PCA to extract the effective features, giving a matrix of size ($m$, 1024).
(4) The feature matrix is fed into the neural random forest for training. In each state of the data set, 50 % of the data is used as the training set, 25 % as the validation set, and 25 % as the test set. The number of samples in each state is shown in Table 3. The NRF uses 30 classification decision trees. The neural network has two hidden layers, with 20 nodes in the first and 10 nodes in the second. The maximum number of training iterations is 100.
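The data preparation in steps (1) and (4) can be sketched as follows. The window length of 4096, the two channels, the 20,000 sampling points, and the 50/25/25 split come from the text; the step size of 512, the function names, and the placeholder signals are assumptions made only for illustration, since the step of the sliding window is not stated.

```python
def sliding_windows(channels, length=4096, step=512):
    """Cut equally long channels into overlapping (n_channels x length) groups.

    `step` is a hypothetical value; the text does not give the step size.
    """
    n_points = len(channels[0])
    return [[channel[start:start + length] for channel in channels]
            for start in range(0, n_points - length + 1, step)]


def split_indices(n_samples, train=0.5, validation=0.25):
    """Index lists for a train / validation / test split of one state."""
    n_train = int(n_samples * train)
    n_validation = int(n_samples * validation)
    indices = list(range(n_samples))
    return (indices[:n_train],
            indices[n_train:n_train + n_validation],
            indices[n_train + n_validation:])


# Placeholder signals standing in for ring-gear and output-end vibration data.
ring_gear = [float(i) for i in range(20000)]
output_end = [float(-i) for i in range(20000)]

groups = sliding_windows([ring_gear, output_end])
train_idx, validation_idx, test_idx = split_indices(len(groups))
```

With these assumed settings, one simulated state yields 32 groups of size 2×4096. In practice the samples would usually be shuffled within each state before splitting, which this sketch omits.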

ASO-VMD decomposition result
To check the effectiveness of the ASO-VMD algorithm, the method is compared with the PSO-VMD [18] and AFSA-VMD [20] methods. PSO requires parameters such as the number of particles, the inertia factor, and the maximum velocity to be set. AFSA requires parameters such as the number of artificial fish, the maximum number of trials, the perception distance, and the crowding factor. By comparison, ASO only needs the initial number of atoms and the number of iterations to optimize VMD, which minimizes the influence of parameter settings on the results. Table 4 lists the VMD parameters obtained by the different optimization algorithms. For a fair comparison, the PSO and AFSA parameters are set to be basically the same as those of ASO. The table shows that, compared with PSO-VMD and AFSA-VMD, the signal processed by ASO-VMD correlates better with the original signal, so the fault features are decomposed while the signal information is retained as far as possible, and ASO-VMD also reaches a better fitness value. As shown in Fig. 9, ASO-VMD finds the optimal parameters in about 20 iterations. The lowest fitness value reached by PSO-VMD during the iterations is about 17, and the lowest value reached by AFSA-VMD is only 25.64. ASO-VMD is therefore better at escaping local minima and gives a better optimization result. Fig. 10 shows how the positions of the 10 atoms change over the 50 iterations of ASO. At the start the atoms are widely scattered, which ensures that a large region of the space is searched. As the iterations progress, the atoms gradually reach equilibrium and their distribution concentrates, eventually settling in a stable range.
The time-domain plot of each IMF after decomposition by ASO-VMD is shown in Fig. 11. The characteristics of each IMF time-domain signal are clearly visible. Since each IMF represents part of the vibration information in the original signal, ASO-VMD distributes the vibration information of the original signal across the IMFs. This makes the individual vibration components of the original signal easier to identify, which benefits the subsequent fault identification methods.

Fault identification verification
In terms of decomposition quality, ASO-VMD and PSO-VMD are the closest, so the signals decomposed by ASO-VMD and by PSO-VMD are both fed into NRF for training. At the same time, a neural network with the same configuration is compared with NRF. The comparison indicators are model accuracy and root mean square error. The training results are shown in Table 5. The table shows that the signal decomposed by ASO-VMD gives higher accuracy and smaller root mean square error under each recognition model. Under the same parameter configuration, both NRF and RF achieve a higher recognition rate once the network output is rounded. In terms of root mean square error, however, the NRF output is closer to the actual value, so the model output is more stable and accurate.
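The difference between the two indicators can be illustrated with a small sketch: two models can reach the same accuracy after rounding while differing in root mean square error. The function names and all numbers below are made up for illustration and are not results from Table 5.

```python
import math


def rmse(outputs, targets):
    """Root mean square error between raw outputs and integer class labels."""
    return math.sqrt(sum((o - t) ** 2 for o, t in zip(outputs, targets))
                     / len(targets))


def rounded_accuracy(outputs, targets):
    """Share of samples whose rounded output equals the class label."""
    hits = sum(1 for o, t in zip(outputs, targets) if round(o) == t)
    return hits / len(targets)


# Made-up class labels and raw outputs of two hypothetical models.
targets = [0, 1, 2, 3]
model_a = [0.1, 0.9, 2.2, 2.9]  # outputs close to the labels
model_b = [0.4, 1.4, 1.6, 3.4]  # outputs further away, same rounded labels

accuracy_a = rounded_accuracy(model_a, targets)
accuracy_b = rounded_accuracy(model_b, targets)
rmse_a = rmse(model_a, targets)
rmse_b = rmse(model_b, targets)
```

Both hypothetical models classify every sample correctly after rounding, yet the first has a clearly smaller root mean square error, which is the sense in which a lower error indicates a more stable output.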
The accuracy of the different models during training is compared, and the curves are shown in Fig. 12. During training, the initial error of the neural network is large. After 60 iterations its error is smaller than that of the neural random forest, and it achieves better results on the training set. On the validation set, however, although the error of the neural network decreases in the first iterations, its root mean square error then stays at around 20, which indicates that the neural network over-fits severely. The neural random forest, by contrast, maintains a low error on the validation set. NRF therefore trains faster and gives better test results than the neural network. Our method consists mainly of a decomposition algorithm and a classification algorithm, each of which can be replaced by alternative methods. The classification accuracy of different combinations of methods is shown in Table 6. All models were tuned manually to achieve their best results. Existing fault diagnosis models also include pattern recognition techniques such as CNN that perform end-to-end fault diagnosis [27], so we also compared an end-to-end fault diagnosis model. Table 6 shows that ASO-VMD-NRF achieves the best results. VMD with parameters adjusted by the center frequency method cannot achieve the best decomposition for all signals, which reduces accuracy. In the ASO-VMD-CNN model, some of the decomposed signal features are similar, and because CNN generalizes less well than NRF, misclassifications occur.

Conclusions
This paper presents a gearbox fault diagnosis model and tests it in a simulation environment. The model has the following characteristics:
1) ASO enables VMD to decompose signals adaptively, which reduces the subjective error caused by manual tuning, and ASO itself has few parameters to adjust, which makes parameter tuning easier. The experimental results show that ASO-VMD can effectively remove the noise in the signal while preserving its effective information components to the greatest extent.
2) NRF has faster training speed and lower recognition error than neural network or random forest. Under the same parameter structure, NRF can achieve more stable recognition accuracy while avoiding over-fitting.
3) ASO-VMD and NRF as an adaptive fault diagnosis model can accurately determine the type of gearbox failure. This paper verifies the effectiveness of the model through the gearbox simulation model. It provides a stable and accurate solution for wind turbine gear fault diagnosis.
In future work, we will extend the model's fault identification capability to non-steady-state conditions and study its performance under dynamic load.