Crack classification in rotor-bearing system by means of wavelet transform and deep learning methods: an experimental investigation

In parallel with the significant growth of industry, especially in fields related to energy engineering, condition monitoring of rotating systems has expanded noticeably. One of the prevalent faults in these systems is the fatigue crack, so finding reliable procedures for identifying cracks in rotating shafts has become a pressing problem for engineers in recent decades. Although the vast majority of cracked rotors can operate for a certain period of time, crack detection and measurement of crack characteristics (i.e. size and location) are essential to prevent catastrophic failures. In the present study, a hybrid procedure combining deep learning and the discrete wavelet transform (DWT) is applied to detect a breathing transverse crack and its depth in a rotor-bearing-disk system. The DWT with Daubechies 32 (db32) as the mother wavelet is applied for signal noise reduction up to level 6, and the relative wavelet energy (RWE) and wavelet entropy (WE) are extracted. A characteristic vector combining RWE and WE is used as input to a multi-layer artificial neural network (ANN). In this supervised learning classifier, a multi-layer Perceptron neural network is used, with the Rectified Linear Unit (ReLU) function as the activation function in both hidden and output layers. The results show that the applied procedure has a strong capacity for identifying a crack and its size in the rotor system.


Introduction
Because of rapid growth in industry and technology, the vast majority of machines operate at high speed, so potential faults in these devices can cause considerable damage. Rotating systems are among the most widely used devices in both modern and classic industries, and for this reason their analysis has attracted much attention from engineers. A common rotor system consists of a disk, bearings and a shaft, the last of which is the heart of the assembly. Numerous faults can occur in this system, but bearing and shaft faults are the most common. Faults such as misalignment, cracks and rotor-to-stator rub can occur simultaneously in rotor-bearing systems. One of the prevalent faults in the rotor system is the crack, especially the fatigue crack that results from bending loads [1]. Since the 1970s, many scholars have worked on crack identification methods for rotating systems. During the last two decades, crack identification procedures have not only developed noticeably but have also concentrated on vibration analysis [2].

Previous works
In recent years, scholars have used numerous methods to find crack symptoms in rotating systems; however, the inverse problem of crack identification has not been commonly addressed [3]. The methods used can be classified into two main categories: local and global. In local procedures, non-destructive methods such as X-ray, ultrasonic, liquid penetrant and eddy current testing are normally used. On the other hand, for studying the features of cracked structural systems, vibration-based crack detection methods are more powerful than the other non-destructive crack identification methods because of their capacity for accurate and online crack detection [4].
Among all vibration-based methods, signal processing techniques are the most useful for online crack detection. In general, different procedures are employed for modeling faulted dynamic systems and processing their vibrations: the finite element method, models that consider the breathing behavior of the crack, time-frequency transforms such as the Hilbert-Huang transform (HHT) and the continuous and discrete wavelet transforms, artificial intelligence techniques such as artificial neural networks, and similar approaches. In addition, some signal characteristics are influenced by a crack; for example, in some articles certain harmonic and subharmonic components are introduced as indicators of a crack's presence in rotating systems [5].
Sekhar compared numerous time-frequency methods and the features of several faults in a rotating shaft. In that article, the vibrations of the rotating system during start-up were processed by means of time-frequency methods. To compare various types of faults in the rotor system, such as rotor-to-stator rub, misalignment and crack, Sekhar employed a finite element model and solved the equation of motion of the system numerically. Moreover, the effectiveness of three different time-frequency transforms, i.e. Hilbert-Huang, wavelet and short-time Fourier, in crack identification was compared. Sekhar concluded that the HHT was less time-consuming; for noisy data, however, the CWT was preferred over the HHT [1]. A thorough literature review on applying wavelet transforms to crack detection can be found in Gómez's paper [6].
In reality, a crack in a dynamic system such as a rotor can breathe, i.e. open and close cyclically, because of the rotor's weight. One major effect of crack breathing is that it changes the system's flexibility (or stiffness), so this effect should be considered when analyzing a cracked rotor.
Until now, various methods have been applied to model breathing behavior: 1) Breathing mechanism model without weight consideration. 2) Breathing mechanism model with weight consideration. 3) Switching model (assuming the crack is either fully open or fully closed). 4) Response-dependent breathing crack model. When there is no transient excitation and the system operates under stable conditions (i.e. at steady-state speed), the crack's breathing behavior can be modeled by a sinusoidal stiffness variation or by a stepwise stiffness change, which corresponds to methods 1-3 above. In [7], a response-dependent breathing model was used to take into account the gradual opening and closing of the crack, using the stress intensity factor at the crack front at each instant to find the amount of crack opening and hence the stiffness.
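The difference between a sinusoidal and a stepwise (switching) stiffness variation can be illustrated with a minimal sketch. The function names, the parameters `k_intact` and `dk_max`, and the convention that the crack is closed at rotation angle 0 and fully open at π are assumptions made for illustration only; they are not taken from the cited models.

```python
import math

def stiffness_sinusoidal(theta, k_intact, dk_max):
    """Smooth breathing: stiffness varies between k_intact (crack closed)
    and k_intact - dk_max (crack fully open) over one revolution."""
    return k_intact - 0.5 * dk_max * (1.0 - math.cos(theta))

def stiffness_stepwise(theta, k_intact, dk_max):
    """Switching model: the crack is treated as either fully closed or fully open."""
    is_open = math.cos(theta) < 0.0  # assumed convention: open on the lower half-turn
    return k_intact - dk_max if is_open else k_intact

# Hypothetical values: intact stiffness 100, maximum stiffness loss 20
for angle in (0.0, math.pi / 2, math.pi):
    print(angle, stiffness_sinusoidal(angle, 100.0, 20.0),
          stiffness_stepwise(angle, 100.0, 20.0))
```

Both functions agree when the crack is fully closed or fully open; they differ only in how the transition between the two states is described.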
Mobarak and Wu studied the dependence of the breathing mechanism on the crack location. In that work, a rotor system with two discs, a flat crack and three degrees of freedom was modeled using the finite element method, with the numerical solution carried out in Abaqus. The study confirmed that crack breathing behavior depends on the crack position and on the properties of the unbalance force. Two crack breathing regions were distinguished along the length of the rotor, in which the stiffness of the unbalanced shaft may be the same as or different from that of the balanced shaft, depending on the orientation and magnitude of the unbalance force and on the crack location. In addition, four specific crack locations were identified along the shaft at which the crack remains fully closed, remains fully open, or behaves just as in a balanced shaft [8].
Artificial intelligence (AI)-based techniques have great potential for accurately identifying crack location and crack depth in a rotating shaft. These methods consist of several steps: signal generation, signal processing, pattern classification and crack diagnosis. First, the non-stationary and non-linear vibration signals generated by a cracked rotor are taken as input. Next, multiresolution analysis (MRA) and discrete wavelet transform (DWT) techniques are applied to process the vibration signals and extract characteristic patterns. Finally, AI-based techniques (ANN, GAs, fuzzy inference, hybrid techniques, etc.) are applied for pattern classification and selection. After these steps, the crack's depth and location can be identified. Gupta et al. applied an ANN to detect faults such as cracks and unbalance in a rotor-bearing system; the crack and unbalance classes were then decided with the help of a confusion matrix [9]. An extensive literature review on the use of artificial intelligence for fault diagnosis of rotating machinery has been presented by Ruonan and colleagues [10].
Zhao et al. introduced a procedure capable of identifying cracks and misalignment in a rotor system. It combines variational mode decomposition (VMD) and probabilistic principal component analysis (PPCA) to reduce environmental noise in the vibration signals captured from an experimental rig, and then performs feature extraction and fault classification using a CNN [11].
In [12], two approaches for crack detection in rotating machinery, model-based and signal-based, were compared. The strengths and weaknesses of the two approaches were discussed using two representative methods, in order to give a comparative overview of the available techniques. Söffker et al. employed the Proportional-Integral-Observer approach (a novel model-based procedure) to demonstrate the capacities and limitations of model-based methods. In addition, they presented a modern signal-based technique that combines a support vector machine with the wavelet transform. An extensive review of almost all procedures applied in the field of crack detection was carried out by Sabnavis [13].
Relative wavelet energy was applied as a feature vector for the first time in 2009 by Ling Gue to classify EEG signals. In that work, a feature vector consisting of relative wavelet energy components was used to distinguish normal EEG signals from epileptic EEG signals [14]. Wavelet entropy is a measure of the degree of order/disorder of a signal and indicates the latent dynamical properties of non-linear signals [15]. Moreover, in 2012, Kumar used wavelet entropy and relative wavelet energy as inputs to artificial neural networks to classify normal and faulted EEG signals.
In the present work, a hybrid procedure is applied to classify cracked and intact shafts in the tested rotating system. This hybrid method is based on supervised deep learning algorithms. First, noise is removed from the signal by means of the discrete wavelet transform (DWT). Next, the signal is decomposed to level 6 with 'db32' as the mother wavelet. Relative wavelet energy (RWE) and wavelet entropy (WE) are employed to construct the feature vectors. The feature vector (which has four members) is used as input to the ANN. In the next step, a multi-layer Perceptron algorithm is employed as the supervised learning classifier. In addition, the Rectified Linear Unit (ReLU) is applied in both hidden and output layers to mitigate the vanishing gradient problem.

Discrete wavelet transform (DWT)
The continuous wavelet transform (CWT) of a signal $x(t)$ is the integral of the signal multiplied by scaled and shifted versions of a mother wavelet $\Psi$, and can be defined by [16]:

$$CWT(a, b) = \frac{1}{\sqrt{|a|}} \int_{-\infty}^{+\infty} x(t)\, \Psi^{*}\!\left(\frac{t - b}{a}\right) dt$$

Here $a$ and $b$ are called the scaling and shifting parameters, respectively. Calculating wavelet coefficients at every possible scale is very time-consuming. If instead the scales and shifts are selected as powers of two, the so-called dyadic scales and positions, the wavelet analysis becomes much more efficient. This type of analysis is achieved by the discrete wavelet transform:

$$DWT(j, k) = \frac{1}{\sqrt{2^{j}}} \int_{-\infty}^{+\infty} x(t)\, \Psi^{*}\!\left(\frac{t - 2^{j} k}{2^{j}}\right) dt$$

Here $2^{j}$ and $2^{j} k$ replace $a$ and $b$, respectively. In the DWT, the signal is passed simultaneously through a high-pass filter (HP) and a low-pass filter (LP), each characterized by its impulse response. The HP output gives the detail coefficients (D) and the LP output gives the approximation coefficients (A) [17]. According to the Nyquist principle, each filtered signal occupies half the frequency bandwidth of the original signal and can therefore be downsampled by a factor of two. At each step of this decomposition, the frequency resolution is doubled through filtering and the time resolution is halved through downsampling.
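The filter-and-downsample scheme described above can be sketched in a few lines. For brevity this sketch uses the Haar wavelet, which is an assumption made only to keep the filters to two taps; the study itself uses db32, and the function names are hypothetical.

```python
import math

def haar_dwt_step(signal):
    """One DWT level with the Haar wavelet: low-pass and high-pass
    filtering followed by downsampling by two."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = 1.0 / math.sqrt(2.0)
    approx = [s * (signal[i] + signal[i + 1]) for i in range(0, len(signal), 2)]
    detail = [s * (signal[i] - signal[i + 1]) for i in range(0, len(signal), 2)]
    return approx, detail

def haar_wavedec(signal, levels):
    """Multi-level decomposition; returns [A_N, D_N, ..., D_1]."""
    details = []
    approx = list(signal)
    for _ in range(levels):
        # Only the approximation is decomposed further at each level
        approx, detail = haar_dwt_step(approx)
        details.append(detail)
    return [approx] + details[::-1]

signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
coeffs = haar_wavedec(signal, levels=2)
print([len(c) for c in coeffs])  # coefficient count halves at each level
```

Because the Haar filters are orthonormal, the sum of squared coefficients equals the sum of squared samples, which is the property that the energy-based features rely on.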

Relative wavelet energy (RWE)
Since the family $\{\psi_{j,k}(t)\}$ is an orthonormal basis for $L^{2}(\mathbb{R})$, the concept of energy is linked with the usual notions derived from Fourier theory [18]. First, the mother wavelet $\Psi(t)$ and the number of decomposition levels $N$ are selected. The energy at each decomposition level (from 1 to $N$) is the energy of the wavelet coefficients $C_{j,k}$ and, to simplify the description, the energy of the scaling coefficients is defined as the energy at decomposition level $N+1$. Thus, the energy at each decomposition level is defined as [19]:

$$E_{j} = \sum_{k} \left|C_{j,k}\right|^{2}, \quad j = 1, \ldots, N+1$$

Then, the total energy of the signal after wavelet decomposition is obtained as:

$$E_{tot} = \sum_{j=1}^{N+1} E_{j}$$

Therefore, the relative wavelet energy (RWE) is defined as:

$$p_{j} = \frac{E_{j}}{E_{tot}}$$

Clearly, $\sum_{j} p_{j} = 1$, and the distribution $\{p_{j}\}$ can therefore be considered a time-scale density. Relative wavelet energy provides important information about the relative energy in the associated frequency bands and can detect the degree of similarity between segments of a signal. In this study, the relative energies are determined for each band before and after thresholding.
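As a minimal sketch of the relative wavelet energy computation, the function below takes one list of coefficients per band (the detail bands plus the final approximation band) and returns the normalized energies. The function name and the sample coefficients are hypothetical.

```python
def relative_wavelet_energy(bands):
    """bands: list of coefficient lists, one per decomposition band.
    Returns the fraction of the total energy carried by each band."""
    energies = [sum(c * c for c in band) for band in bands]
    total = sum(energies)
    if total == 0:
        raise ValueError("signal has zero energy")
    return [e / total for e in energies]

# Hypothetical coefficients for three bands (band energies 10, 2 and 1)
bands = [[3.0, 1.0], [1.0, -1.0], [0.5, 0.5, -0.5, 0.5]]
rwe = relative_wavelet_energy(bands)
print(rwe, sum(rwe))
```

The returned values always sum to one, so they can be treated as a discrete distribution over bands.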

Wavelet entropy (WE)
The Shannon entropy gives a useful criterion for analyzing and comparing probability distributions; it provides a measure of the information of any distribution. With $p_{j}$ denoting the relative wavelet energy of band $j$, the total WE can be defined as [15]:

$$WE = -\sum_{j} p_{j} \ln p_{j}$$

The WE is a measure of the degree of order/disorder of the signal, so it can provide useful information about the underlying dynamical process related to the signal. In fact, a very ordered process may be thought of as a periodic mono-frequency signal (a signal with a narrow-band spectrum).
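The contrast between an ordered and a disordered signal can be checked with a short sketch of the Shannon entropy of a relative-energy distribution. The function name and the two example distributions are hypothetical, and the natural logarithm is an assumed choice of base.

```python
import math

def wavelet_entropy(p):
    """Shannon entropy of a relative-energy distribution p (entries sum to 1).
    Bands with zero energy contribute nothing to the sum."""
    return -sum(pj * math.log(pj) for pj in p if pj > 0)

ordered = wavelet_entropy([1.0, 0.0, 0.0, 0.0])  # all energy in a single band
disordered = wavelet_entropy([0.25] * 4)         # energy spread evenly over bands
print(ordered, disordered)
```

The entropy is zero when all energy sits in one band and reaches its maximum, the logarithm of the number of bands, when the energy is spread evenly.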

Deep learning
Deep learning is a subset of machine learning, which is itself a subset of AI and statistics. Briefly, deep learning is a machine learning procedure that employs a deep neural network, i.e. a multi-layer neural network that contains two or more hidden layers [20]. Fig. 1 illustrates the concept of deep learning and its relationship to machine learning. The deep neural network takes the place of the final product (the model) of machine learning, and the learning rule becomes the algorithm that generates the model (the deep neural network) from the training data. Early neural networks had the problem that deeper networks (with more hidden layers) were harder to train and performed worse. The poor performance of the deep neural network is due to the failure of proper training. Three difficulties arise in this process: the vanishing gradient, the computational load and, last but not least, overfitting. The vanishing gradient problem is greatly improved by employing the Rectified Linear Unit (ReLU) activation function and the cross-entropy-driven learning rule. Improved gradient descent methods also bring some benefit. The ReLU function is defined as:

$$\varphi(x) = \max(0, x) = \begin{cases} x, & x > 0 \\ 0, & x \le 0 \end{cases}$$

To overcome overfitting in deep learning, dropout or regularization should be applied. The computational load, i.e. the large amount of calculation time required for training, is relieved to a large extent by GPUs and various algorithms [21]. Fig. 2 presents a schematic of a multi-layer neural network and the connections between its neurons. There are various algorithms for supervised machine learning that can be employed as binary classifiers; a binary classifier is a function that decides whether or not an input, represented by a vector of numbers, belongs to a specific class [22]. Here, the specific classes are healthy shafts and cracked shafts with various crack depths.
The Perceptron is one of the most well-known algorithms in this area, so it is used in the current work.
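A forward pass through a small multi-layer Perceptron with ReLU activations can be sketched as follows. The layer sizes, weights, biases and input features are hypothetical values chosen for illustration; they are not the trained network of this study, which was built with the MATLAB toolbox.

```python
def relu(x):
    """Rectified Linear Unit: passes positive values, clips negatives to zero."""
    return max(0.0, x)

def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W x + b), one row of W per neuron."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(features, layers):
    """Propagate a feature vector through all layers, using ReLU throughout."""
    out = features
    for weights, biases in layers:
        out = dense(out, weights, biases, relu)
    return out

# Hypothetical network: 4 inputs -> 3 hidden -> 3 hidden -> 4 class scores
layers = [
    ([[0.5, -0.2, 0.1, 0.3], [-0.4, 0.6, 0.2, -0.1], [0.3, 0.3, -0.5, 0.2]],
     [0.0, 0.1, -0.1]),
    ([[0.7, -0.3, 0.2], [0.1, 0.5, -0.4], [-0.2, 0.4, 0.6]],
     [0.05, 0.0, 0.0]),
    ([[0.6, -0.1, 0.2], [-0.3, 0.8, 0.1], [0.2, 0.2, -0.5], [0.1, -0.4, 0.9]],
     [0.0, 0.0, 0.1, 0.0]),
]
features = [0.4, 0.3, 0.2, 0.1]  # hypothetical four-member feature vector
scores = forward(features, layers)
predicted_class = scores.index(max(scores)) + 1  # classes numbered from 1
print(scores, predicted_class)
```

With two hidden layers this network meets the definition of a deep neural network given above; training (adjusting the weights from labeled samples) is outside the scope of the sketch.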

Feature vector
A feature is a property of a sample, expressed in symbolic, numerical or even string form. If multiple features of an object are put together, a feature vector is obtained. Moreover, collecting the feature vectors of different objects creates a feature space. Feature vectors are used widely in machine learning because they represent objects numerically in an effective way, which supports a wide range of analyses. The Euclidean distance is one of the simplest ways to compare the feature vectors of two different objects. Feature vectors are used in classification problems, artificial neural networks and k-nearest neighbors algorithms in machine learning [23]. In the present work, because the signal is decomposed to level 6, the feature vector consists of four features that are defined as:

Test rig
In the current work, a CTC piezoelectric accelerometer is used to measure vertical vibration signals, and a Baumuller B MaXX 3000 analyzer is used to collect them. To analyze intact and faulted signals, the collected signals are first transferred to a PC by means of the VIBROEXPERT CM400 software; MATLAB® is then used for coding and analysis. For classification, the Pattern Recognition Toolbox in MATLAB® is used.
Fig. 3 shows the test rig. The experimental rig consists of a DC motor, a shaft, two journal bearings, two disks (used to add unbalance force to the system) and, as mentioned, a piezoelectric accelerometer, an analyzer and a personal computer. There are several procedures for creating a crack close to real operating conditions, from a coping saw to a crack propagated from a notch in a 3-point bending fixture; in this work, however, wire cutting is employed to simulate the breathing behavior of a crack. Because of the limitations of wire cutting, i.e. the drawbacks of using a wire with too small a diameter [24], grooves with a thickness of 0.3 mm and depths close to 20 %, 30 % and 40 % of the shaft diameter (i.e. the three classes of cracked shaft) are made in three shafts. Further information on the EDM wire-cut procedure can be found in [25]. Fig. 4 displays the three cracked shafts.
Class 1 = Healthy shaft.
Class 2 = Cracked shaft with relative depth equal to 20 % of the shaft diameter d.
Class 3 = Cracked shaft with relative depth equal to 30 % of the shaft diameter d.
Class 4 = Cracked shaft with relative depth equal to 40 % of the shaft diameter d.

Fig. 4. Cracked shafts in various depths
The mechanical characteristics of the shaft and disks of the tested rotor-bearing-disk system are given in Table 1 and Table 2, respectively.

Results
Transient signals of the rotating system are captured during start-up (i.e. the first seven seconds); the initial acceleration is 30 or 60 rad/s² and the sampling frequency is 2 kHz (time interval = 0.005 second). The signals acquired by means of the CM 400 software are transferred to a PC. In Fig. 5, the signals of the intact shaft and of the cracked shafts with various crack depths are presented in the time domain. This set of graphs also shows the noise-reduced signals obtained from the DWT noise reduction procedure to level 6 with db32. The vibration signals after noise reduction are compatible with the theoretical signals introduced in [26].
The graphs show that near t = 1 s the amplitude of the vibration signal grows rapidly because of the crack, and that this jump increases with crack depth. This change is marked in the last graph by a red circle. In the current work, relative wavelet energy and wavelet entropy are used to form the feature vector for classifying the shafts. Fig. 6 shows the wavelet coefficients (i.e. detail and approximation) of a cracked rotor in class 4 and an intact rotor in class 1, up to level 6, with db32 as the mother wavelet.
As stated in the previous section, a feature vector combining RWE and WE is applied in the current investigation. Table 3 lists the average values of the feature coefficients of the shafts belonging to the various classes. Fig. 7 shows that X1 and X2 increase gradually, whereas X3 and X4 decline slightly, as crack depth increases. The feature coefficients are evidently well chosen, because they separate the different classes. To create a feature space, the system is operated at two different initial accelerations (i.e. 30 and 60 rad/s²) and the unbalance masses are placed at 50 different locations on the two disks, so by varying the initial acceleration and the eccentricity 100 samples are generated for each class. In Fig. 8 the features X3 and X4 are compared for class 1 and class 4 as an example; the graph is drawn for a subset of samples. Although the features of the two classes overlap at some points, at almost all points the two classes take different values, which can be used for effective classification.
Of the 100 samples generated for each of the four classes, 28 are used for training, 7 for validation and 65 for testing. For the training, validation and testing data, the mean squared error is 1.4003e-22, 1.05886e-22 and 0.0037, respectively. The ANN algorithm classified the different classes with an accuracy of 99.62 %. Fig. 9 shows the confusion matrix for the training data that results from the ANN (here, deep learning) operation. It can be seen that, among all samples, the ANN makes just one mistake, which is related to class 2 (i.e. the cracked shaft with relative depth equal to 20 % of the shaft diameter).
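The reported accuracy is consistent with a single misclassification among the test samples, as the following sketch shows. The confusion matrix below is hypothetical: it assumes 65 test samples per class, rows as true classes, and one class-2 sample assigned to class 3 (the class it was confused with is not stated in the text).

```python
def accuracy(confusion):
    """Overall accuracy: correctly classified samples (diagonal) over all samples."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total

# Hypothetical 4-class confusion matrix, rows = true class, columns = predicted class
confusion = [
    [65, 0, 0, 0],
    [0, 64, 1, 0],   # one class-2 sample misclassified (target class assumed)
    [0, 0, 65, 0],
    [0, 0, 0, 65],
]
acc = accuracy(confusion)
print(round(acc * 100, 2))  # 259 correct out of 260
```

Under these assumptions, 259 of 260 test samples are classified correctly, which rounds to the 99.62 % quoted above.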

Conclusions
In this research, a hybrid procedure consisting of the discrete wavelet transform and deep learning is employed to classify cracked shafts with various crack depths in a rotating system. In the initial step of signal processing, the collected signals are denoised with the discrete wavelet transform to level 6. Next, the relative wavelet energy and wavelet entropy of the vibration signals are calculated, and feature vectors are extracted based on RWE and WE. These features are then used to classify the different classes of shaft (i.e. healthy rotors and cracked rotors with three crack depths). For classification, a Perceptron algorithm with multiple hidden layers and the Rectified Linear Unit (ReLU) as activation function is introduced. The results shown in Fig. 9 indicate that this hybrid method has an accuracy above 99.5 percent. This level of accuracy confirms that the introduced method is reasonably successful in classifying cracked rotors according to crack size.

Nima Rezazadeh received his Master's degree in mechanical engineering (applied design) from the Semnan branch of Islamic Azad University, Semnan, Iran. He has published several articles in international conferences held in Iran, as well as two papers in international journals. His research interests include signal processing and machine learning, especially for rotor systems. Nima Rezazadeh has worked as a lecturer at several universities and applied science centers. He works as head of quality control at Fouladin Zob Amol (FZA).

Shila Fallahy graduated as an architect from Tabari University, Babol, Iran. She worked as an engineer on several building projects, where she learned the fundamentals of optimization algorithms for optimizing building energy consumption. At present, she is a student of building engineering at Politecnico di Milano, Italy. In the current work, she helped write the deep learning algorithm code.