Vibration-based gearbox fault diagnosis using deep neural networks

Vibration-based analysis is the most commonly used technique to monitor the condition of gearboxes. Accurate classification of the vibration signals collected from a gearbox is helpful for gearbox fault diagnosis. In recent years, deep neural networks have become a promising tool for fault characteristic mining and intelligent diagnosis of rotating machinery with massive data. This paper presents a study of deep neural networks for fault diagnosis in gearboxes. Four classic deep neural networks (Auto-encoders, Restricted Boltzmann Machines, Deep Boltzmann Machines and Deep Belief Networks) are employed as classifiers to identify the fault conditions of a gearbox. To thoroughly validate that the deep neural network diagnosis systems are effective and reliable, three types of data sets based on the health condition of two rotating mechanical systems are prepared and tested. Each acquired signal includes the information of several basic gear or bearing faults. In total, 62 data sets are used to train and test the proposed gearbox diagnosis systems. For each vibration signal, 256 features from the time and frequency domains are selected as input parameters for the deep neural networks. The accuracy achieved indicates that the presented deep neural networks are highly reliable and effective for fault diagnosis of gearboxes.


Introduction
Industrial environments have constantly increasing requirements for the continuous operation of transmission machines. That is why new proposals for building fault diagnostic systems with low complexity and adequate accuracy are highly valuable [1]. As one of the core components of rotary machinery, the gearbox is widely employed to deliver torque or provide speed conversion from rotating power sources to other devices [2]. Identifying gearbox damage categories, especially early faults and combined faults, is an effective way to avoid fatal breakdowns of machines and to prevent loss of production and human casualties. The vibration signals during the run-up and run-down periods of a gearbox contain a wealth of condition information [3]. Vibration-based analysis is the most commonly used technique to monitor the condition of gearboxes.
In gear fault diagnosis, several analysis techniques have been used, such as the wavelet transform [4, 5], group sparse representation [6], the multiscale clustered grey infogram [3], and the generalized synchrosqueezing transform [7]. The availability of a large number of condition parameters extracted from gearbox signals, such as vibration signals, has motivated the use of machine-learning-based fault diagnosis, where common approaches use support vector machines (SVM) [8, 9], neural networks (NN) [10-13] and their related models, because of the simplicity of developing industrial applications.
The SVM family has achieved good results in comparison with peer classifiers [14]. In [13], a comparison study was conducted on three types of neural networks: the feedforward back-propagation (FFBP) artificial neural network, the functional link network (FLN) and learning vector quantization (LVQ). The study achieved good results with FFBP for the classification of three faults at different rotation frequencies. However, as Y. Bengio reported in [15, 16], the gradient-based training of supervised multi-layer neural networks (starting from random initialization) gets easily stuck in "apparent local minima or plateaus", which restricts its application to more complex gearbox fault diagnosis.
In recent years, deep neural networks have become a promising tool for fault characteristic mining and intelligent diagnosis of rotating machinery with massive data [17]. Since 2006, deep learning networks such as the Restricted Boltzmann Machine (RBM) [18] and Deep Belief Networks (DBN) [19] have been applied with success in classification tasks and in other fields such as regression, dimensionality reduction, and modeling textures [20]. Some reports showed that deep learning techniques have been applied to fault diagnosis, commonly with a single feature modality. Tran et al. [21] suggested a DBN-based application to diagnose reciprocating compressor valves. Tamilselvan and Wang [22] employed deep belief learning for health state classification of the iris, wine, Wisconsin breast cancer diagnosis, and Escherichia coli data sets. Li et al. [23] proposed multimodal deep support vector classification for gearbox fault diagnosis, where Gaussian-Bernoulli deep Boltzmann machines (GDBMs) were used to extract features of the vibration and acoustic signals in the time, frequency and wavelet modalities, respectively; the extracted features were then integrated for fault diagnosis using GDBMs. Li's research [23] indicated that the Gaussian-Bernoulli deep Boltzmann machine is effective for gearbox fault diagnosis. We have presented a multi-layer neural network (MLNN) for gearbox fault diagnosis (MLNN-DBN) [24], where the weights of a deep belief network are used to initialize the weights of the constructed MLNN. Experiment results showed MLNN-DBN to be an effective fault diagnosis approach for gearboxes. However, the data sets were only collected from an experimental rig, which included only 12 kinds of condition patterns.
There are growing demands for condition-based monitoring of gearboxes, and techniques to improve the reliability, effectiveness and accuracy of fault diagnosis are considered valuable contributions [25]. In this work, based on the time-domain and frequency-domain features extracted from vibration signals, we evaluate the performance of four classical deep neural networks (Auto-encoders, Restricted Boltzmann Machines, Deep Boltzmann Machines and Deep Belief Networks) for gearbox fault diagnosis. In existing research on intelligent gearbox fault diagnosis systems, the experimental data sets were usually obtained from a simple experimental rig, where a signal corresponds to only one type of gear or bearing fault and one data set involves the classification of only a few fault condition patterns. As a result, such data are insufficient to validate the generalization of an intelligent diagnosis system. To ensure that the proposed diagnosis systems are highly effective and reliable in fault diagnosis of industrial reciprocating machinery, three types of data sets based on the health condition of two rotating mechanical systems are prepared and tested in our study. Each acquired signal includes the information of several basic gear or bearing faults. In total, 62 data sets are used to train and test the proposed gearbox diagnosis systems.
The rest of this paper is organized as follows. Section 2 introduces the adopted methodologies, including Auto-encoders, Restricted Boltzmann Machines, Deep Boltzmann Machines and Deep Belief Networks; Section 3 covers the feature representation of vibration signals; Section 4 presents the implementation of the classifier based on deep neural networks; Section 5 is an introduction to the experimental setup; results and discussion are presented in Section 6; the conclusions of this work are given at the end.

Deep neural networks
The essence of deep neural networks (DNN) is to build a neural network that imitates the hierarchical structure of the human visual mechanism and brain to analyze and learn. By establishing a machine learning model with multiple hidden layers and using massive training data, a deep neural network learns more useful features so as to improve the accuracy of classification and prediction. Compared with traditional shallow learning, the distinctiveness of a deep neural network lies in two points: (1) it emphasizes the depth of the model structure, which usually has five, six, or even more than ten hidden layers; (2) it explicitly highlights the importance of feature learning, that is, transforming the feature representation of a sample from the original space into a new feature space layer by layer, thereby making classification or prediction easier. Compared with manually engineered features, using big data to learn features can better capture the rich intrinsic information of the data.
The training mechanism of a deep neural network includes two stages. The first stage uses bottom-up unsupervised learning, which can be regarded as a process of feature learning. The second stage uses top-down supervised learning, which usually applies the gradient descent method to fine-tune the whole network's parameters. The fundamental steps are given as follows:
Step 1: Build the network layer by layer. For any two neighboring layers, suppose the input layer is the lower layer and the other is the upper layer. The connection weights between layers include the cognitive weights upward from the lower layer to the upper one and the generative weights from the upper layer to the lower one. The upward cognitive process is the encoding stage (Encoder), which extracts features (codes) from the bottom to the top. The downward reconstruction is the decoding stage (Decoder), which rebuilds the information from the abstract representation using the generative weights.
Step 2: Adjust the parameters layer by layer based on the wake-sleep algorithm. This process performs feature learning, in which the parameters of one layer at a time are adjusted.
Step 3: Apply top-down supervised learning. This step adds a classifier (such as logistic regression, SVM, etc.) on top of the encoding layers, based on the parameters of each layer acquired in the second step. The gradient descent method is then applied to fine-tune the whole network's parameters through supervised learning on labeled data.
In the following subsections, four commonly used deep neural networks, the Restricted Boltzmann Machine (RBM), the Deep Boltzmann Machine (DBM), Deep Belief Networks (DBN) and Stacked Auto-encoders (SAE), are briefly discussed. For more details, please refer to the relevant literature [18, 19, 26, 27].

Restricted Boltzmann machine
The restricted Boltzmann machine is a generative stochastic artificial neural network with two layers, as shown in Fig. 1, which can learn a probability distribution over its set of inputs. The standard RBM has binary-valued hidden and visible units and consists of a matrix of weights w = (w_ij) associated with the connections between the visible units v_i and the hidden units h_j. Given these, an energy function of the configuration (v, h) is defined as follows [18]:

E(v, h) = -Σ_i a_i v_i - Σ_j b_j h_j - Σ_{i,j} v_i w_ij h_j, (1)

where v_i and h_j denote the visible and the hidden neurons, a_i and b_j stand for their offsets, and θ = {w, a, b} are the network parameters. To accommodate real-valued input data, Salakhutdinov et al. [28] proposed the Gaussian-Bernoulli RBM (GRBM), where the binary visible neurons are replaced by Gaussian ones. The energy function is redefined as:

E(v, h) = Σ_i (v_i - a_i)^2 / (2σ_i^2) - Σ_j b_j h_j - Σ_{i,j} (v_i / σ_i) w_ij h_j, (2)

where σ_i is the standard deviation associated with the Gaussian visible neuron v_i. The statistical parameters used for fault diagnosis are real-valued, so Eq. (2) is selected as the energy function in this paper. The probability that the network assigns to every possible pair of a visible and a hidden vector is given via this energy function as:

p(v, h | θ) = e^{-E(v, h)} / Z, (3)

where Z is called the "partition function" and is defined as the sum of e^{-E(v, h)} over all possible configurations.
The probability that the network assigns to a visible vector v is given by summing over all possible hidden vectors:

p(v | θ) = (1 / Z) Σ_h e^{-E(v, h)}. (4)

By adjusting θ = {w, a, b} to lower the energy of a training sample and to raise the energy of other samples, the probability that the network assigns to the training sample can be raised, especially for those samples which have low energies and thus make a big contribution to the partition function.
A standard approach to estimating the parameters of a statistical model is maximum-likelihood estimation, which uses the training data to find the parameters θ = {w, a, b} that maximize the likelihood. The likelihood is defined as:

L(θ) = Π_{v ∈ S} p(v | θ), (5)

where S represents the set of training samples. Maximizing the likelihood is the same as maximizing the log-likelihood given by:

log L(θ) = Σ_{v ∈ S} log p(v | θ). (6)

Since the maximum-likelihood parameters cannot be found analytically, gradient ascent is usually employed. The derivative of the log probability of a training sample with respect to w_ij is given by:

∂ log p(v) / ∂w_ij = <v_i h_j>_data - <v_i h_j>_model, (7)

where the angle brackets denote expectations under the distribution specified by the subscript. Because there are no direct connections between the hidden units in an RBM, it is very easy to calculate the first term of Eq. (7). Given a randomly selected (real-valued) training sample v, the binary state of each hidden unit h_j is set to 1 with probability:

p(h_j = 1 | v) = σ(b_j + Σ_i (v_i / σ_i) w_ij), (8)

where σ(·) is the sigmoid function. Similarly, given a hidden vector h, each visible unit v_i is sampled from the normal distribution:

p(v_i | h) = N(a_i + σ_i Σ_j w_ij h_j, σ_i^2), (9)

where N(·) denotes the normal distribution. However, it is much more difficult to compute the second term of Eq. (7). It can be done by starting at any random state of the visible units and performing alternating Gibbs sampling for a very long time. An iteration of alternating Gibbs sampling consists of updating all of the hidden units in parallel using Eq. (8) followed by updating all of the visible units in parallel using Eq. (9).
In practice, the algorithm performs a single Gibbs sampling step inside a gradient procedure to update the weights as follows [29]:
(1) Take a training sample v, compute the probabilities of the hidden units and sample a hidden activation vector h from this probability distribution.
(2) Compute the outer product of v and h and call this the positive gradient.
(3) From h, sample a reconstruction v' of the visible units, then resample the hidden activations h' from this reconstruction (Gibbs sampling step).
(4) Compute the outer product of v' and h' and call this the negative gradient.
(5) Update the weights by the difference of the positive and the negative gradient, scaled by a learning rate.
The update rule for the biases a and b is defined analogously.
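The steps above can be sketched in Python for the binary-binary case (the Gaussian-visible variant only changes the visible-unit update). The function name, learning rate and the toy 4-bit pattern below are illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, a, b, lr=0.1):
    """One contrastive-divergence (CD-1) step for a binary-binary RBM."""
    # (1) Positive phase: hidden probabilities and a sampled hidden vector.
    ph0 = sigmoid(v0 @ W + b)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # (3) One Gibbs step: reconstruct the visibles, resample the hiddens.
    pv1 = sigmoid(h0 @ W.T + a)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + b)
    # (2)+(4)+(5) Positive outer product minus negative outer product.
    W = W + lr * (np.outer(v0, ph0) - np.outer(v1, ph1))
    a = a + lr * (v0 - v1)    # visible-bias update (analogous rule)
    b = b + lr * (ph0 - ph1)  # hidden-bias update (analogous rule)
    return W, a, b

# Toy usage: repeatedly apply CD-1 on a single 4-bit pattern and check
# that the reconstruction error of that pattern decreases.
W = 0.01 * rng.standard_normal((4, 3))
a = np.zeros(4)
b = np.zeros(3)
pattern = np.array([1.0, 0.0, 1.0, 0.0])
err0 = np.mean((sigmoid(sigmoid(pattern @ W + b) @ W.T + a) - pattern) ** 2)
for _ in range(200):
    W, a, b = cd1_update(pattern, W, a, b)
err1 = np.mean((sigmoid(sigmoid(pattern @ W + b) @ W.T + a) - pattern) ** 2)
```

Sampling the hidden states (rather than using the probabilities directly) in the positive phase keeps the procedure stochastic, while the probabilities ph0 and ph1 are used in the gradient to reduce sampling noise.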

Deep Boltzmann machine
A deep Boltzmann machine (DBM) [28] is an undirected graphical model with bipartite connections between adjacent layers of hidden units; it is a network of symmetrically coupled stochastic units. As with RBMs, the binary-binary DBM can be extended to model dense real-valued data. For the real-valued case, Cho et al. [30] proposed the Gaussian-Bernoulli deep Boltzmann machine (GDBM), which uses Gaussian neurons in the visible layer of the DBM. Fig. 2(b) presents a three-hidden-layer DBM. With L denoting the number of hidden layers, its energy is defined as Eq. (10):

E(v, h | θ) = Σ_i (v_i - a_i)^2 / (2σ_i^2) - Σ_{i,j} (v_i / σ_i) w^1_ij h^1_j - Σ_{l=2}^{L} Σ_{j,k} h^{l-1}_j w^l_jk h^l_k - Σ_{l=1}^{L} Σ_j b^l_j h^l_j. (10)

Salakhutdinov et al. [28] introduced a greedy, layer-by-layer pretraining algorithm that learns a stack of modified RBMs for the DBM model, where contrastive divergence learning works well and the modified RBM is good at reconstructing its training data. In this modified RBM with tied parameters, the conditional distributions over the hidden and visible states are defined with doubled weights, as in Eq. (11) and Eq. (12):

p(h^l_k = 1 | h^{l-1}) = σ(Σ_j 2w^l_jk h^{l-1}_j + b^l_k), (11)
p(h^{l-1}_j = 1 | h^l) = σ(Σ_k 2w^l_jk h^l_k + b^{l-1}_j), (12)

where σ(·) is the sigmoid function. When a stack of more than two RBMs is being greedily trained, this modification only needs to be used for the first and the last RBM in the stack; for all the intermediate RBMs, their weights are simply halved in both directions when composing them to form the deep Boltzmann machine. It should be noted that the last hidden layer (l = L) and the first hidden layer (l = 1) are special cases of the above equations, whose parameters are adjusted accordingly (see [28]).
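The weight-halving rule for composing the pretrained stack can be sketched as follows; the function name and the plain NumPy representation of the stack are illustrative:

```python
import numpy as np

def compose_dbm(rbm_weights):
    """Assemble greedily pretrained RBM weight matrices into DBM weights.

    The first and last RBMs in the stack keep their (modified-RBM) weights,
    while every intermediate RBM has its weights halved in both directions
    when the stack is assembled into the deep Boltzmann machine.
    """
    n = len(rbm_weights)
    return [0.5 * W if 0 < i < n - 1 else W.copy()
            for i, W in enumerate(rbm_weights)]

# Usage: a three-RBM stack; only the middle matrix is halved.
stack = [np.ones((4, 3)), np.ones((3, 3)), np.ones((3, 2))]
dbm_weights = compose_dbm(stack)
```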

Deep belief networks
Deep belief networks (DBNs) [19] can be viewed as another greedy, layer-by-layer unsupervised learning algorithm that learns a stack of RBMs one layer at a time. The top two layers form a restricted Boltzmann machine, which is an undirected graphical model, while the lower layers form a directed generative model (see Fig. 2(a)). The training algorithm for DBNs proceeds as follows. Let X be a matrix of inputs, regarded as a set of feature vectors.
(1) Train a restricted Boltzmann machine on X to obtain its weight matrix W, and use this as the weight matrix between the lower two layers of the network.
(2) Transform X by the RBM to produce new data X'.
(3) Repeat this procedure with X ← X' for the next pair of layers, until the top two layers of the network are reached.
(4) Fine-tune all the parameters of this deep architecture with respect to the supervised criterion.
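Steps (1)-(3) can be sketched as follows. Here `train_rbm_stub` is only a placeholder standing in for a real RBM trainer (e.g. CD-1), used purely to show the layer-by-layer data flow:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_rbm_stub(X, n_hidden):
    """Placeholder: a real implementation would run CD-1 on X."""
    W = 0.1 * rng.standard_normal((X.shape[1], n_hidden))
    b = np.zeros(n_hidden)
    return W, b

def greedy_pretrain(X, layer_sizes, train_rbm=train_rbm_stub):
    """Train one RBM per layer, then transform X -> X' for the next layer."""
    layers = []
    for n_hidden in layer_sizes:
        W, b = train_rbm(X, n_hidden)  # step (1): train an RBM on X
        layers.append((W, b))
        X = sigmoid(X @ W + b)         # step (2): X <- X' for step (3)
    return layers                      # step (4) fine-tunes these weights

# Usage: pretrain a 16-8-4 stack on random data.
layers = greedy_pretrain(rng.random((20, 16)), [8, 4])
```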

Stacked Auto-encoders
An auto-encoder is trained to encode the input x into some representation c(x) so that the input can be reconstructed from that representation [29]. Hence the target output of the auto-encoder is the auto-encoder input itself. If there is one linear hidden layer and the mean squared error criterion is used to train the network, then the hidden units learn to project the input onto the span of the first principal components of the data. Auto-encoders have been used as building blocks to build and initialize a deep multi-layer neural network [15, 30, 31]. The training procedure is similar to the one for deep belief networks: the principle is exactly the same as the one previously described for training DBNs, but auto-encoders are used instead of RBMs, as follows [20]:
(1) Train the first layer as an auto-encoder to minimize some form of reconstruction error of the raw input.
(2) The outputs of the hidden units of this auto-encoder are used as input for another layer, which is also trained as an auto-encoder.
(3) Iterate step (2) to initialize the desired number of additional layers.
(4) Take the output of the last hidden layer as input to a supervised layer and initialize its parameters (either randomly or by supervised training, keeping the rest of the network fixed).
(5) Fine-tune all the parameters of this deep architecture with respect to the supervised criterion. Alternatively, unfold all the auto-encoders into a very deep auto-encoder and fine-tune the global reconstruction error, as in [32].
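A minimal NumPy sketch of steps (1)-(3), assuming a sigmoid encoder, a linear decoder and plain gradient descent on the squared reconstruction error; all sizes and hyper-parameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(X, n_hidden, lr=0.1, epochs=300):
    """One greedy layer: minimise the MSE between X and its reconstruction.

    Returns the encoder parameters and the hidden codes, which serve as the
    training data of the next layer.
    """
    n, d = X.shape
    W1 = 0.1 * rng.standard_normal((d, n_hidden))  # encoder weights
    b1 = np.zeros(n_hidden)
    W2 = 0.1 * rng.standard_normal((n_hidden, d))  # decoder weights
    b2 = np.zeros(d)
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)          # encode
        R = H @ W2 + b2                   # decode (linear output)
        dR = 2.0 * (R - X) / n            # d(MSE)/dR
        dH = (dR @ W2.T) * H * (1.0 - H)  # back-prop through the sigmoid
        W2 -= lr * H.T @ dR
        b2 -= lr * dR.sum(axis=0)
        W1 -= lr * X.T @ dH
        b1 -= lr * dH.sum(axis=0)
    return (W1, b1), sigmoid(X @ W1 + b1)

# Greedy stacking: the codes of one layer are the inputs of the next.
X = rng.random((50, 8))
enc1, H1 = train_autoencoder(X, 5)   # step (1)
enc2, H2 = train_autoencoder(H1, 3)  # steps (2)-(3)
```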

Feature representations of vibration signals
In this section, the feature extraction of vibration signals is introduced. The gearbox condition is reflected in the information contained in the time, frequency and time-frequency domains. The features in the frequency and time domains are extracted from the set of signals obtained from measurements of the vibrations at different speeds and loads, and are used as input parameters for the deep neural networks.

Frequency-domain feature extraction
For a vibration signal of the gearbox x(t), its spectral representation x̂(f) can be calculated by Eq. (14):

x̂(f) = ∫ x(t) e^{-j2πft} dt, (14)

where "^" stands for the Fourier transform, t is the time and f is the frequency. The time-domain signal is multiplied by a Hanning window before computing the FFT spectrum. The spectrum can be divided into multiple bands, and the root mean square (RMS) value of each band keeps track of the energy in the spectrum peaks. The RMS value is evaluated with Eq. (15), where n is the number of samples of each frequency band:

RMS = sqrt( (1/n) Σ_{i=1}^{n} x̂_i^2 ). (15)

Fig. 3 and Fig. 4 present the FFT spectrum and its RMS representation of a vibration signal, respectively. It is obvious that the RMS values keep track of the energy in the spectrum peaks. To reduce the number of input data, the spectrum is split into multiple bands and the RMS value of each band is used as the feature representation in the spectral domain.
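The band-splitting scheme can be sketched as follows. The equal-width split via `np.array_split` is an assumption, the band count of 251 follows the paper's feature dimension, and the single-tone test signal is purely illustrative:

```python
import numpy as np

def band_rms_features(x, n_bands=251):
    """Hanning-windowed magnitude spectrum split into n_bands bands;
    the RMS of each band (Eq. (15)) is one spectral feature."""
    spectrum = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    bands = np.array_split(spectrum, n_bands)
    return np.array([np.sqrt(np.mean(b ** 2)) for b in bands])

# Illustrative signal: a 1010 Hz tone sampled at 50 kHz for 0.4096 s.
fs, T = 50000, 0.4096
t = np.arange(int(fs * T)) / fs
features = band_rms_features(np.sin(2 * np.pi * 1010 * t))
```

With 20480 samples the one-sided spectrum has 10241 bins, so each of the 251 bands covers roughly 40 bins; the tone's energy concentrates in a single band, while broadband faults spread energy across many bands.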

Time-domain feature extraction
The time-domain signal collected from a gearbox usually changes when damage occurs in a gear or bearing: both its amplitude and its distribution may differ from those of the time-domain signal of a normal gear or bearing. The root mean square value reflects the vibration amplitude and energy in the time domain. The standard deviation, skewness and kurtosis may be used to represent the distribution of the signal in the time domain. For a signal segment x_1, ..., x_N:
(1) Mean value: x̄ = (1/N) Σ_{i=1}^{N} x_i, (16)
(2) Standard deviation: σ = sqrt( (1/N) Σ_{i=1}^{N} (x_i - x̄)^2 ), (17)
(3) Skewness: S = (1/N) Σ_{i=1}^{N} ((x_i - x̄)/σ)^3, (18)
(4) Kurtosis: K = (1/N) Σ_{i=1}^{N} ((x_i - x̄)/σ)^4. (19)
To sum up, the feature vector of the preprocessed signal is formed from: n RMS values, the standard deviation, skewness, kurtosis, the rotation frequency and the applied load measurements, which are used as input parameters for the deep neural networks. In this paper, n is set to 251.
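The four time-domain statistics can be computed directly; this sketch assumes population (biased) moments:

```python
import numpy as np

def time_features(x):
    """Mean, standard deviation, skewness and kurtosis of a signal segment."""
    mu = x.mean()
    sd = x.std()                          # population standard deviation
    skew = np.mean(((x - mu) / sd) ** 3)  # asymmetry of the distribution
    kurt = np.mean(((x - mu) / sd) ** 4)  # "peakedness" (3 for a Gaussian)
    return mu, sd, skew, kurt

# Usage on a small symmetric segment: skewness is zero by symmetry.
mu, sd, skew, kurt = time_features(np.array([1.0, 2.0, 3.0, 4.0, 5.0]))
```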

DNN-based classifier
In this section, the implementation of the classifier based on DNN is introduced. In the pre-training stage, RBM, DBM, DBN or SAE (see their detailed implementations and parameter settings in [18, 19, 26, 27]) is employed as the pre-training strategy of the DNN for gearbox fault diagnosis. In the second stage, the parameters of the whole network are fine-tuned by supervised training. The training procedure is shown in Fig. 6, which presents the pseudo-code of the DNN-based classifier followed in the processing of the signal. A batch training strategy is used to train the neural network, where the weight updates are computed over mini-batches of training samples.
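The supervised fine-tuning stage with mini-batch updates can be sketched as follows. A random initialization stands in for the pretrained layer here, and all layer sizes, learning rates and the toy two-class data are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def finetune(X, y, W1, b1, n_classes, lr=0.5, epochs=200, batch=16):
    """Fine-tuning: the (pre)trained hidden layer (W1, b1) is topped with a
    softmax classifier and the whole network is trained by mini-batch
    gradient descent on the cross-entropy loss."""
    W2 = 0.01 * rng.standard_normal((W1.shape[1], n_classes))
    b2 = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]                   # one-hot targets
    for _ in range(epochs):
        order = rng.permutation(len(X))
        for s in range(0, len(X), batch):      # batch training strategy
            i = order[s:s + batch]
            H = sigmoid(X[i] @ W1 + b1)
            P = softmax(H @ W2 + b2)
            dZ = (P - Y[i]) / len(i)           # softmax + cross-entropy grad
            dH = (dZ @ W2.T) * H * (1.0 - H)
            W2 -= lr * H.T @ dZ
            b2 -= lr * dZ.sum(axis=0)
            W1 -= lr * X[i].T @ dH
            b1 -= lr * dH.sum(axis=0)
    return W1, b1, W2, b2

# Toy data: two well-separated clusters; random init replaces pretraining.
X = np.vstack([rng.normal(-2.0, 0.5, (30, 2)), rng.normal(2.0, 0.5, (30, 2))])
y = np.array([0] * 30 + [1] * 30)
W1, b1, W2, b2 = finetune(X, y, 0.1 * rng.standard_normal((2, 10)),
                          np.zeros(10), n_classes=2)
pred = np.argmax(softmax(sigmoid(X @ W1 + b1) @ W2 + b2), axis=1)
accuracy = np.mean(pred == y)
```

In the full diagnosis system the pretrained weights from RBM, DBM, DBN or SAE would be passed in place of the random `W1`, `b1`.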

Experimental setup
To validate the effectiveness of the proposed method for fault diagnosis, we constructed three kinds of vibration signal data sets based on the health condition of two rotating mechanical systems.The experimental set-ups and the procedures are detailed in the following subsections.

Data set I
Data set I of the vibration signals includes the basic fault patterns defined in Table 1 for the gearbox diagnosis experiments. 11 patterns were tested under 3 different input speeds (300, 600, and 900 rpm) and 3 different load conditions (zero, small, and great). For each combination of pattern, load and speed, the tests were repeated 5 times. Each time, the vibration signals were collected in 24 segments, each covering 0.4096 s. The sampling frequency for the vibration signals was set to 50 kHz and 10 kHz, respectively.

Data set II
In data set I, each vibration signal includes information of only one faulty component, which has only one kind of fault. However, there are usually two or more faulty components in a real-world rotating mechanical system. In order to evaluate whether the proposed approach is applicable to fault diagnosis of industrial reciprocating machinery, data set II is constructed, where each fault pattern includes two or more basic faults. First, some basic faults are defined in Table 2 and Table 3, which include 11 kinds of basic gear faults and 8 kinds of bearing faults, respectively. 12 combined fault patterns are then defined in Table 4, involving the gears and bearings labeled in Fig. 8(a). The conditions of the test are described in Table 5, where 4 different load conditions and 5 different input speeds were applied for each fault pattern during the experiments. For each pattern with a given load and speed condition, the tests were repeated 5 times. Each time, the vibration signals were collected in 10 segments, each covering 0.4096 s.

Data set III
One or two test cases cannot fully reflect the reliability and robustness of an algorithm. Although some classifiers are effective on particular data sets, they easily get stuck in "apparent local minima or plateaus" in other cases, resulting in a failure to classify fault patterns effectively. To further validate the reliability and robustness of the DNNs, a fault condition pattern library has been constructed, which has 55 kinds of condition patterns based on the fundamental patterns described in Table 2 and Table 3. Each condition pattern holds more than one basic gearbox fault.
To challenge the proposed approaches, we have generated a large number of data sets. Each data set includes CP kinds of condition patterns, where CP denotes the number of condition patterns in a data set. Three values of CP are considered: 12, 20 and 30. Obviously, a bigger value of CP makes the classification and identification of faults more difficult. For each value of CP, 20 different data sets were generated, where each one involves a unique combination of condition patterns randomly selected from the above-mentioned pattern library.
Each data set is collected from the measurements of a vertically mounted accelerometer on the gearbox fault diagnosis experimental platform shown in Fig. 8, and corresponds to a combination of condition patterns randomly selected from the pattern library. 60 different data sets are generated in total to further evaluate the performance of the proposed approaches.

Experiment and discussion
In this section, we evaluate the performance of DBN, DBM, RBM and SAE on the data sets defined in Section 5. Based on the feature extraction method of Section 3, the feature representation of each vibration signal is formulated as a vector with 256 dimensions, which includes 251 RMS values, the standard deviation, skewness, kurtosis, rotation frequency and applied load measurements. These features are regarded as the input of the neural network.

Parameter tuning
As mentioned above, the training of a DNN includes two stages: pre-training and fine-tuning. In the fine-tuning stage, the DNN is treated as a feed-forward neural network (FFNN) trained in a supervised manner. The FFNN is also typically used in supervised learning to make a prediction or classification. To evaluate the performance of the DNNs, a comparison study between FFNN and DNN is presented for gearbox fault diagnosis. The net parameters are set as follows.

Number of layers
The number of layers (nn.n) decides the depth of the net architecture. Experimental evidence suggests that training deep architectures is more difficult than training shallow ones. To determine the optimal number of layers of the DNN for gearbox fault diagnosis, we first discuss the effect of different nn.n based on data set I and data set II.
Five schemes of FFNN described in Table 8 are considered to investigate the effect of different settings. Table 9 presents the parameter tuning of nn.n, where nn.unit = 30 and nn.epoch2 = 100. As shown in Table 9, the experimental results suggest that as the architecture gets deeper for each scheme, it becomes more difficult to obtain good results. When nn.n is set to 6 or 8 for the FFNN, only FFNN Scheme2 and FFNN Scheme4 can achieve good classification accuracy, and all others obviously deteriorate.
To investigate the effect of nn.n for DBN, DBM, RBM and SAE, the number of pre-training epochs (nn.epoch1) is set to 1 and FFNN Scheme4 is selected as the training scheme in the fine-tuning stage.
As shown in Table 9, when nn.n is set to 6 or 8, DBN, DBM and RBM obviously deteriorate, and only SAE still achieves good classification accuracy.
From the experiment results presented in Table 9, we draw the following conclusion: for DBN, DBM, RBM and FFNN, the deeper the architecture, the more difficult it becomes to obtain good classification accuracy for gearbox fault diagnosis; performance is best for both DNN and FFNN when nn.n is 3 or 4, i.e., when the net architecture has one or two hidden layers. We therefore set nn.n to 3 for all the following experiments.

Number of neurons in the hidden layer
The number of neurons in the hidden layer (nn.unit) is another important parameter of the net architecture. The experiment results using different sizes of nn.unit for the five FFNN-based classifiers and the four DNN-based classifiers are presented in Table 10. We can draw the conclusion that performance is not sensitive to the size of nn.unit for data set I and data set II. So, the number of neurons in the hidden layer is set to 30 for all the following experiments. As shown in Table 11, when nn.epoch1 is equal to 1, good classification accuracy is already obtained.
Longer pre-training epochs do not yield better results. Fig. 9 and Fig. 10 present the convergence of the error rate for data set I and data set II, respectively. For data set I, after only 20 epochs of fine-tuning, the error becomes very small; for data set II, Fig. 10 shows that the error rate falls below 0.1 after 50 epochs of fine-tuning.
One or two test cases cannot fully reflect the reliability and robustness of an algorithm. To further evaluate the performance of the DNNs, we constructed data set III. First, we consider the data sets (#1-#20) in which each one has 12 kinds of different condition patterns (CP = 12, where CP expresses the number of condition patterns included in a data set). Table 13 presents the experiment results of 8 different classifiers on these data sets. As shown in Table 13, the lowest classification accuracy among the four DNNs is 92.8 %, by SAE on the 15th data set; each of the mean classification accuracies is larger than 98.0 %.
To further challenge the proposed classifiers, we increase the number of fault condition patterns included in a data set. Table 14 and Table 15 present the experiment results of 20 data sets each, where each data set has 20 and 30 kinds of condition patterns, respectively (CP = 20 or 30). More condition patterns make it more difficult to obtain good results. As shown in Tables 14 and 15, DBN, DBM, RBM and SAE still perform well in these cases.
Among the test cases of the 21st-40th data sets, DBN, DBM, RBM and SAE achieve mean classification accuracies larger than 90 %; the lowest single result is 77.6 %, by DBN on the 35th data set. Among the test cases of the 41st-60th data sets, DBN, DBM, RBM and SAE achieve mean classification accuracies larger than 84 %; the lowest single result is 54.6 %, by DBN on the 48th data set.

Comparison and analysis
To verify Y. Bengio's observation [15, 16] that the gradient-based training of supervised multi-layer neural networks (starting from random initialization) gets easily stuck in "apparent local minima or plateaus", three multi-layer neural networks (FFNN Scheme1, FFNN Scheme2, FFNN Scheme4) are used to classify the same data sets for gearbox fault diagnosis. Their classification results are also given in Tables 13, 14 and 15, respectively. In addition, an SVM is employed for comparison with the proposed approaches, implemented using LibSVM [33].
The parameters for the SVM are chosen as C = 1 with a radial basis function kernel whose width parameter is set to 0.5. These parameters were found through a cross-validated search, aiming at the best model for the SVM.
As shown in Table 13, among the 20 test cases with CP = 12, the three FFNN-based classifiers (FFNN Scheme1, FFNN Scheme2, and FFNN Scheme4) have 5 test cases with bad classification accuracy, even below 70 % (#1, #4, #7, #13 and #20), although they are effective for the other test cases, whose classification accuracies are larger than 90 %. As for the comparison between SAE, RBM, DBM and DBN, Fig. 11 shows the mean classification accuracy over the data sets with different numbers of condition patterns (CP = 12, 20 and 30) for SAE, RBM, DBM and DBN, respectively. As shown in Fig. 11, the four deep neural networks have almost equal classification accuracy for the data sets with CP = 12; for CP = 20 and 30, RBM and DBM are slightly better than SAE and DBN.

Conclusions
In this work, four classic deep neural networks are extensively evaluated for vibration-based gearbox fault diagnosis. Some interesting findings from this study are given below:
1) Multi-layer feed-forward neural networks with one or two hidden layers perform better than deeper net architectures for gearbox fault diagnosis, but, starting from random initialization, they are prone to getting stuck in "apparent local minima or plateaus" in the test cases. As a result, they do not show good robustness for gearbox fault diagnosis.
2) The testing results demonstrate that the deep learning algorithms, RBM, DBM, DBN and SAE, are efficient, reliable and robust in gearbox fault diagnosis.These classifiers have a good potential to provide helpful maintenance guidelines for industrial systems.With these methods, different types of component faults at different severity levels (e.g., initial stage or advanced stage) could be well classified.Furthermore, it is also shown that vibration signals usually carry rich information in fault detection, control and maintenance planning of rotating machines.

Fig. 5. Gearbox fault diagnosis based on deep neural networks

Fig. 5 presents the outline of DNN-based gearbox fault diagnosis.

Fig. 6. Pseudo-code of the DNN-based classifier

Fig. 8 indicates the internal configuration of the gearbox and the positions of the accelerometers; the gearbox is a two-stage transmission with 3 shafts and 4 gears. The parameters of its components are as follows: input gear: Z1 = 27, modulus = 2, and pressure angle Φ = 20°; two intermediate gears: Z2 = Z3 = 53; and the output gear: Z4 = 80. The faulty components used in the experiments include several of these gears and bearings, as labeled in Fig. 8(a). Based on this experimental platform for gearbox fault diagnosis, data set II records 12000 vibration signals corresponding to its 12 combined condition patterns.

Fig. 8. a) Internal configuration of the gearbox; b) Positions of the accelerometers

The data sets of data set III are generated under the same test conditions and with the same method as data set II; each data set has 12000 vibration signals corresponding to its combination of condition patterns.

6.1.3. Epochs of training

The number of training epochs also influences the performance of the FFNN-based and DNN-based classifiers. Too many training epochs can lead to "overfitting"; too few can result in insufficient training. nn.epoch1 and nn.epoch2 represent the epochs of training in the pre-training and fine-tuning stages of the DNN, respectively. Table 11 presents the experimental results of varying the pre-training epochs (nn.epoch1 = 1 to 10), with nn.epoch2 = 100.
Table 11 and Fig. 10 also show that the DNNs (DBN, DBM, RBM and SAE) noticeably reduce the "overfitting" phenomenon in gearbox fault diagnosis. nn.epoch1 and nn.epoch2 are set to 1 and 100, respectively, for the following experimental evaluations.

Fig. 9. The error rate on data set I for different classifiers

Zhiqiang Chen received a B.S. degree from WuHan University of Water-Conservancy and Electric Power, WuHan, China, and an M.S. degree from ChongQing University, Chongqing, China, in 2001 and 2004, respectively. He received a Ph.D. degree from Fukui University, Japan, in 2011. He is currently an Associate Professor with the School of Computer Science and Information Engineering, Chongqing Technology and Business University. His main research interests are computational intelligence and signal processing.

Xudong Chen received his B.S. degree and M.S. degree from ChongQing University, ChongQing, China, in 1999 and 2002, respectively. He received his Ph.D. degree from the University of Electronic Science and Technology of China, Chengdu, China, in 2007. His main research interests are real-time computing, image processing, and optimization problems.

Chuan Li received his Ph.D. degree from Chongqing University, China, in 2007. He has been successively a postdoctoral fellow with the University of Ottawa, Canada, and a senior research associate with the City University of Hong Kong, China. He is currently a Professor with Chongqing Technology and Business University, China, and a Prometeo Researcher with the Universidad Politécnica Salesiana, Ecuador. His research interests include machinery health maintenance and intelligent systems.

René-Vinicio Sanchez received his Master in management audit quality in 2008 at the UTPL, Ecuador, and the Master degree in industrial technologies research in 2012 at the UNED, Spain. Currently, he is a Professor in the Department of Mechanical Engineering at the UPS. His research interests are in machinery health maintenance, pneumatic and hydraulic systems, artificial intelligence and engineering education.

Huafeng Qin received a B.S. from the School of Mathematics and Physics and an M.Eng. from the College of Electronic and Automation at Chongqing University of Technology, and a Ph.D. degree from the College of Opto-Electronic Engineering at Chongqing University. He is currently a postdoctoral researcher with the Department of Electronics and Physics at Telecom-SudParis, France. His research interests include pattern recognition and machine learning.

Table 1 .
Condition patterns of the gearbox configuration

Table 2 .
Nomenclature of gear faults

Data set II was obtained from the measurements of a vertically mounted accelerometer on another gearbox fault diagnosis experimental platform.

Table 4 .
Condition patterns of the experiment

Table 5 .
The conditions of the test

Tables 6-7.

Based on different training parameters, five typical FFNNs are defined in Table 8. Four parameters (nn.n, nn.unit, nn.epoch1 and nn.epoch2) are fine-tuned based on data set I and data set II as follows.

Table 7 .
Setting of training parameters at the pre-training stage

Table 12 .
Classification accuracy of Data Set I and II

Table 13 .
Classification accuracy of data set with 12 kinds of condition patterns (CP = 12)

Table 14 indicates that FFNN Scheme1, FFNN Scheme2, and FFNN Scheme4 have 4 test cases with bad classification accuracy (#29, #33, #34 and #35). Table 15 indicates that FFNN Scheme1, FFNN Scheme2, and FFNN Scheme4 perform badly on 5 test cases (#48, #53, #54, #55 and #57). This verifies the observation that gradient-based training of multi-layer neural networks (starting from random initialization) easily gets stuck in "apparent local minima or plateaus" in some cases; these FFNNs are not robust for gearbox fault diagnosis. In contrast, the four DNN-based classifiers obtain good classification accuracy on all 62 data sets. We can therefore conclude that the DNN-based classifiers avoid falling into "apparent local minima or plateaus" and are reliable and robust for gearbox fault diagnosis. Compared with the FFNN-based classifiers and SVM, DBN, DBM, RBM and SAE show clear superiority in reliability and robustness for gearbox fault diagnosis.

Table 14 .
Classification accuracy of data set with 20 kinds of condition patterns (CP = 20)