Review on engine vibration fault analysis based on data mining

Equipment monitoring improves machine uptime in industrial applications, and proper monitoring minimizes the risk of unpredicted failures. Failure modes increase machine vibration, so vibration data must be analyzed effectively to assess the condition of the equipment accurately. Fault feature selection and fault detection in rotating equipment currently require empirical knowledge. Low efficiency and motor speed control are the main drawbacks of the existing techniques. The aim of this paper is therefore to detect rotating equipment faults by means of vibration analysis. The motor vibration is monitored and analyzed using spectrum analysis. The spectral content is extracted and fed into classifiers, namely k-nearest neighbors (KNN), back-propagation neural network (BPNN), sparse representation classifier (SRC), support vector machine (SVM) and random forest (RF), to predict the type of failure and to analyze unbalance condition (UNB), bearing fault (BDF) and broken rotor bar (BRB) faults. The RF classifier outperforms the other classifiers in accuracy, precision and recall by up to approximately 10.92 %, 11.03 % and 20.13 %, respectively.


Introduction
In modern industry, rotating machinery such as compressors, industrial fans and aircraft engines is the most widely used class of machine [1]. Faults may develop in rotating machinery because of the high service load, and the whole system may shut down if a fault is not diagnosed in time. Faults should therefore be detected as early as possible to ensure safe operation. The rotating parts of a machine are highly prone to defects, and rotating machines commonly behave in a non-linear and non-stationary manner [2,3]. Defects arise from surface roughness, dents, pits, etc., and a failure may then be imminent.
Typical rotating machinery systems, such as water turbines, are critical equipment that supports the national economy. Because the performance of rotating machinery is the main concern, rotating machinery fault diagnosis is of great significance [4]. The relationships between sensor readings are nondeterministic; for example, a single sensor cannot collect enough data for accurate condition monitoring, so data from multiple sensors are needed. The machinery fault simulator (MFS) is shown in Fig. 1. It can simulate most of the faults that commonly occur in rotating machinery, such as gearbox defects.
Transformation, segregation and classification algorithms are used to assess the health of the machine. Transforms are used to extract features from the vibration signals obtained from the rotating machine [5]. After an analytic comparison, the authors conclude that only decision trees fit the entire data set and predict the health of the machine [6,7]. However, because of its high variance, the decision tree algorithm is prone to over-fitting the data.
The diversity of electrical equipment, such as motors, electric vehicles and energy transmission systems, has been increasing over the past few years. The reason behind this exponential growth is the need to perform various activities, such as charging a cell phone battery or starting a car [8,9].
JOURNAL OF VIBROENGINEERING. SEPTEMBER 2021, VOLUME 23, ISSUE 6
The failure of components can cause:
1) High economic losses.
2) Degradation of performance.
3) Outages in the production process.
Fig. 1. Machinery fault simulator (MFS): a) the front view; b) the side view
The condition monitoring strategy consists of the following steps:
1) Collection of data through different sensors.
2) Feature extraction and data processing.
3) Data analysis for condition evaluation.
Vibration originates from different sources, such as fluid flow and rotating elements. The structural properties and boundary conditions may change with time, and the vibration response shows large statistical deviations [10-12].
Data mining is the process of extracting patterns from data, as shown in Fig. 2. It generally searches for hidden patterns that may exist in large databases: by scanning through large datasets, data mining discovers patterns and the correlations between them. Its tools include statistical models and machine learning methods [13,14]. Data mining covers data analysis and prediction as well as the collection and management of data, and it can be performed on data in quantitative, textual or multimedia form. Data mining applications examine the data using a variety of parameters [15]. The basic steps are defining the problem, data preparation and exploration, building models, and model exploration and validation. The first step covers the business problem, the business requirements, the definition of the problem scope and the evaluation of the model. The second step covers the consolidation and cleaning of the data. The third step includes the calculation of means and standard deviations. Before the model is built, the prepared data is randomly separated into training and testing datasets [16]. The fifth step is model exploration, in which the effectiveness of the model is tested.
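The data preparation steps above (computing means and standard deviations, then randomly separating the data into training and testing sets) can be illustrated with a minimal sketch. The feature values, the 30 % test fraction and the function names `summarize` and `train_test_split` are assumptions chosen for illustration and are not taken from the reviewed works.

```python
import random
import statistics

def summarize(values):
    """Return the mean and population standard deviation of one feature column."""
    return statistics.fmean(values), statistics.pstdev(values)

def train_test_split(samples, test_fraction=0.3, seed=0):
    """Randomly separate samples into (training, testing) subsets."""
    rng = random.Random(seed)          # fixed seed so the split is repeatable
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical feature values (e.g. RMS vibration amplitudes), illustration only.
rms_values = [0.8, 1.1, 0.9, 2.4, 2.2, 1.0, 2.5, 0.7, 2.3, 1.2]

mean, std = summarize(rms_values)
train, test = train_test_split(rms_values, test_fraction=0.3, seed=42)
print(f"mean={mean:.2f}, std={std:.2f}, train={len(train)}, test={len(test)}")
```

Fixing the seed keeps the split reproducible, which makes it easier to compare classifiers on identical training and testing sets.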
Data is the foundation of all ML methods, and effective ML algorithms require data collection. In recent research papers, most researchers collect data by conducting experiments with induced faults or with accelerated life testing techniques. A total of 15 audio recordings are collected as input: 6 of motors with bearing faults, 5 of motors in the unbalanced condition and 4 of motors with broken rotor bars [17]. The Case Western Reserve University (CWRU) dataset serves as a fundamental dataset for validating various algorithms. The Paderborn University dataset, which includes measurements of motor vibration signals, is also used. The vibration signals are measured at a high sampling rate.
All failure modes increase machine vibration, and vibration analysis is the most widely used technique for determining equipment condition. In the motor condition monitoring experiment, an AC motor drive is used to control the operating speed of the motor. The motor's vibration is measured and monitored, and the measured vibration data is then analyzed using spectrum analysis [18]. After the overall vibration level is monitored, the vibration severity level is compared with the standard severity table to determine the vibration condition of the motor. The kind of fault or failure mode is identified from its specific natural frequency. Low efficiency and motor speed control are the main drawbacks of the existing methods.
The rest of the paper is organized as follows. Section 2 provides an exhaustive literature survey, followed by details of the different methods for engine vibration analysis in Section 3. The performance analysis and parameters are detailed in Section 4. The comparative analysis of the different methods is presented in Section 5. Finally, concluding remarks are provided in Section 6.
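As a rough illustration of spectrum analysis, the sketch below computes a single-sided amplitude spectrum with a naive discrete Fourier transform and picks the dominant frequency. The synthetic signal (a 25 Hz component plus a weaker 60 Hz component), the 256 Hz sampling rate and the function names are assumptions made for this example; real measurements would normally be processed with an FFT library.

```python
import cmath
import math

def amplitude_spectrum(signal):
    """Single-sided amplitude spectrum via a naive O(N^2) DFT."""
    n = len(signal)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal))
        # DC and (for even n) Nyquist bins are not doubled.
        is_edge_bin = k == 0 or (n % 2 == 0 and k == n // 2)
        spectrum.append(abs(coeff) / n * (1.0 if is_edge_bin else 2.0))
    return spectrum

def dominant_frequency(signal, sample_rate):
    """Frequency (Hz) of the largest non-DC spectral peak."""
    spectrum = amplitude_spectrum(signal)
    peak_bin = max(range(1, len(spectrum)), key=lambda k: spectrum[k])
    return peak_bin * sample_rate / len(signal)

# Synthetic vibration signal, illustration only: 1 s sampled at 256 Hz.
SAMPLE_RATE = 256
signal = [math.sin(2 * math.pi * 25 * t / SAMPLE_RATE)
          + 0.3 * math.sin(2 * math.pi * 60 * t / SAMPLE_RATE)
          for t in range(SAMPLE_RATE)]

spectrum = amplitude_spectrum(signal)
peak_hz = dominant_frequency(signal, SAMPLE_RATE)
print(f"dominant component at {peak_hz} Hz")
```

With one second of data the bin spacing is 1 Hz, so the amplitudes at bins 25 and 60 can be read directly as candidate features for a classifier.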

Related work
The diagnosis of machine faults and the prognosis of remaining service life provide the basis for condition-based maintenance and operational reliability [19]. Accurate assessment of machine health requires effective analysis of vibration data by examining the changes in frequency components. The limitation of such methods is their reliance on empirical knowledge of faults; in the developed method, "visual word" features are extracted to recognize fault-related patterns and are then fed into a sparse representation-based classifier. The method is evaluated on experimental bearing data and achieves 99.7 % accuracy. The work on bearing fault diagnostics with deep learning (DL) algorithms is summarized in [20]. Conventional ML methods for bearing fault applications are briefly reviewed before the existing DL algorithms are discussed. The fault feature extraction and classification performance of the DL-based methods is analyzed, and the classification accuracy of different algorithms is compared on the open-source Case Western Reserve University (CWRU) bearing dataset. Detailed recommendations and suggestions are provided for applying the various algorithms to fault diagnostics under specific application conditions. In industrial environments, condition monitoring (CM) and fault detection and diagnosis (FDD) of rotating machinery (RM) prevent severe damage to infrastructure [21]. The induction motor (IM) is used in various industries because it is cheap and robust, and the failure of this basic component leads to a serious breakdown. Artificial intelligence (AI) techniques are of great significance in every field of digital technology. The authors present an extensive literature review of CM and FDD of the IM, highlight the merits and demerits of each method, and finally detail the challenges.
The vibration method provides useful and reliable information for machine condition monitoring, which brings significant cost benefits to industry. Defective journal bearings can be detected by comparing the signals of a running machine in the normal and faulty conditions [22]. Using statistical and vibration parameters, 30 features were extracted and input to the classifiers. The results demonstrate that the SVM performs better than FLD and KNN. The authors of [23] discuss data mining, which has been gaining importance in many areas and attracting researchers all over the world. Its benefits include the exploitation of large databases and the use of the information extracted through analysis. That paper reviews data mining techniques for the detection and diagnosis of faults in electric equipment, covering the classification and regression methods used for this purpose from the year 2000 to the present. Deep learning technology is growing rapidly in vibration-based structural health monitoring [24]. For system diagnosis, features are extracted from the vibration, and the measured signals are correlated with the current status of the structure. The measured vibration shows large deviations and reflects the transient characteristics of the monitored system. Removing the influence of the surrounding environment from the vibration requires a complete understanding of the extracted features. Because of their flexibility and robustness, deep-learning-based algorithms are finding increasing application in these complex problems. The review summarizes the machine learning algorithms applied to fault monitoring. The authors of [25] describe data mining methods for steam turbine fault diagnostics based on continuous data measurements.
Classification rules based on standardized vibration frequency data are used for steam turbines, together with field experts' analyses of turbine vibration problems. The expert knowledge makes the steam turbine fault diagnosis system more accurate and powerful. The system identifies twenty types of standard steam turbine faults. It uses 2000 simulated data sets, from which the data mining methods identify 20 explicit rules for the turbine faults. The results indicate that data mining can be applied effectively to the diagnosis of rotating machinery by providing useful rules for data interpretation. Rotating machinery is applied in various industrial applications, and its diagnosis is a promising field [26]. Early fault diagnosis (EFD) techniques have attracted increasing attention from industry, because they provide the information needed to take maintenance actions that prevent severe failures and reduce financial losses. The authors review and summarize EFD for gears and bearings, covering two aspects of EFD applications: fault frequency-based methods and artificial intelligence-based methods. Different classification algorithms for classifying a vibration signal as healthy or faulty are detailed in [27]. Once the problem statement has been identified, the task is to fit a suitable classification algorithm. The machine learning techniques used as classification algorithms are the probabilistic neural network (PNN), decision tree (DT) and radial basis network (RBN). Bootstrap aggregation, in which decision trees are assembled in parallel, is used to improve the prediction of the decision tree algorithm [28].
Various methods for vibration fault analysis with their merits and demerits are shown in Table 1.

Fault detection method
The method detects faults in the steady-state operation of motors on the basis of vibration signal analysis. The flowchart is shown in Fig. 3. The vibration signal is decomposed into several intrinsic mode functions (IMFs). Subsequently, the spectral content of the IMFs is obtained in the frequency domain by calculating the frequency marginal of the Gabor representation. The type of failure that has occurred in the motor is predicted by SVM or RF. The classifier is trained with a number of samples before prediction. The advantages of the proposed method are:
1) Maintenance cost is reduced.
2) Fault detection ability is increased.
3) Noise and interference are reduced.
The audio recordings are first passed through a filter. A total of 15 audio recordings are collected as input: 6 of motors with bearing faults, 5 of motors in the unbalanced condition and 4 of motors with broken rotor bars.
Table 1 (excerpt). Method: the frequency spectrum of the rotor system's vibration signal. Merits: high accuracy, better classification. Demerit: fails to distinguish sample classes [38].
First, the spectral content is extracted from the vibration dataset; a classifier then classifies the test data acquired by the vibration sensor. The classification of the test data yields the prediction of machine faults.

Support vector machine (SVM)
The SVM classifier uses a linear separating hyperplane to divide the training samples into classes. Two approaches are suitable for this task: the first finds the optimal decision hyperplane [39,40]; the second finds the optimal decision hyperplane that forms the boundary between two parallel planes. Both approaches produce the optimal hyperplane and the support points. Computing the SVM classifier amounts to minimizing an expression of the form given in Eq. (1). The SVM solves various real-world problems:
1) It is helpful in the categorization of text and hypertext.
2) It can perform image classification with high accuracy.
3) Satellite data can be classified using supervised SVM.
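The expression referred to as Eq. (1) is not reproduced in this text. As a minimal sketch, and assuming the commonly used soft-margin form (mean hinge loss plus a penalty `lam` times the squared norm of the weight vector), the objective can be evaluated as follows. The sample points, the candidate hyperplanes and the value of `lam` are hypothetical.

```python
def svm_objective(w, b, samples, labels, lam):
    """Assumed soft-margin objective: mean hinge loss + lam * ||w||^2.

    The decision function is w . x - b and the labels are +1 or -1.
    """
    hinge_total = 0.0
    for x, y in zip(samples, labels):
        margin = y * (sum(wi * xi for wi, xi in zip(w, x)) - b)
        hinge_total += max(0.0, 1.0 - margin)   # zero once the margin is >= 1
    penalty = lam * sum(wi * wi for wi in w)
    return hinge_total / len(samples) + penalty

# Hypothetical two-feature samples, illustration only.
samples = [(2.0, 2.0), (3.0, 3.0), (-2.0, -2.0), (-3.0, -1.0)]
labels = [1, 1, -1, -1]

separating = svm_objective((0.5, 0.5), 0.0, samples, labels, lam=0.01)
reversed_plane = svm_objective((-0.5, -0.5), 0.0, samples, labels, lam=0.01)
print(separating, reversed_plane)
```

A hyperplane that separates the two classes with sufficient margin incurs only the penalty term, whereas the reversed hyperplane is penalized on every sample, so a minimizer of this objective favors the former.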

Random forest (RF)
The RF is used for classification and regression and is based on a group of trees. It is a tree-based classifier in which each tree acts as a component predictor. The RF is built from a large number of trees and is widely used as a classifier in data mining because of its accuracy [41,42]. It is well suited to building ensembles from random vectors, which are generated by a random selection procedure from the integrated training set.
The prediction for an unseen sample $x'$ is obtained by averaging the predictions of all the individual regression trees on $x'$, as shown in Eq. (2):
$\hat{f} = \frac{1}{B} \sum_{b=1}^{B} f_b(x')$, (2)
where $B$ is the number of trees and $f_b$ is the $b$-th regression tree.
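The averaging of individual tree predictions can be sketched with a deliberately simplified ensemble: one-split regression trees (stumps) fitted on bootstrap samples of one-dimensional data. This is bagging without the random feature selection of a full random forest, and the data, the tree count and all function names are assumptions made for illustration.

```python
import random

def fit_stump(xs, ys):
    """Fit a one-split regression tree on 1-D data by minimizing squared error."""
    best = None
    for threshold in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        if not left or not right:
            continue
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((y - left_mean) ** 2 for y in left)
                 + sum((y - right_mean) ** 2 for y in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:                      # all x values identical: constant tree
        constant = sum(ys) / len(ys)
        return lambda x: constant
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def fit_forest(xs, ys, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return trees

def forest_predict(trees, x_new):
    """Average the predictions of all trees for the unseen sample x_new."""
    return sum(tree(x_new) for tree in trees) / len(trees)

# Hypothetical data: feature value -> 0 (healthy) or 1 (faulty).
xs = [1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 9.0]
ys = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
trees = fit_forest(xs, ys)
print(forest_predict(trees, 1.0), forest_predict(trees, 9.0))
```

Averaging over many trees trained on different bootstrap samples reduces the variance of any single tree, which is the motivation for the ensemble.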

Sparse representation classifier (SRC)
The SRC is a pattern classification technique [43]. Its setup is simple and its classification is highly accurate. The SRC classification criterion is derived from the residuals in a truncated manner. A demerit of SRC is that the decision depends only on the smallest residual: if the residual of a given class is the smallest, the SRC judges that the test sample belongs to that class. There are variations of the sparse approximation problem. 1) Structured sparsity.
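The residual-based decision rule can be illustrated in isolation. The sketch below assumes that a sparse coefficient vector has already been computed by some sparse solver (not shown) and only evaluates the class-wise reconstruction residuals; the dictionary atoms, coefficients and the function name `src_decide` are hypothetical.

```python
import math

def src_decide(atoms, atom_labels, coeffs, y):
    """Pick the class whose atoms reconstruct y with the smallest residual.

    atoms       : training samples (dictionary columns), one tuple per atom
    atom_labels : class label of each atom
    coeffs      : sparse coefficients for y, assumed to be given
    """
    residuals = {}
    for label in set(atom_labels):
        reconstruction = [0.0] * len(y)
        for atom, atom_label, c in zip(atoms, atom_labels, coeffs):
            if atom_label == label:       # keep only this class's coefficients
                for k in range(len(y)):
                    reconstruction[k] += c * atom[k]
        residuals[label] = math.sqrt(
            sum((yk - rk) ** 2 for yk, rk in zip(y, reconstruction)))
    return min(residuals, key=residuals.get), residuals

# Hypothetical dictionary with one atom per fault class, illustration only.
atoms = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
atom_labels = ["UNB", "BDF", "BRB"]
coeffs = [0.1, 0.9, 0.0]
test_sample = (0.1, 0.9, 0.0)

label, residuals = src_decide(atoms, atom_labels, coeffs, test_sample)
print(label, residuals)
```

Because the decision uses only the minimum residual, two classes with nearly equal residuals are separated by a very small margin, which is one way to read the limitation mentioned above.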

k-Nearest neighbors (KNN)
KNN is the simplest data mining classification method. The testing sample is represented in the same space as the training samples, and its nearest neighbors are obtained [44,45]. The classes of the neighbors are counted, and the testing sample is assigned to the class with the most "votes". The nearest neighbors are found by computing the Euclidean distance between the testing sample and each of the training samples [46,47]. The Euclidean distance between the $i$-th training sample $x_i$ and the testing sample $t$ is defined in Eq. (3):
$d_i = \sqrt{\sum_{j=1}^{m} (x_{ij} - t_j)^2}$, (3)
where $i = 1, 2, \ldots, n$, $j = 1, 2, \ldots, m$, $m$ is the number of features and $n$ is the number of training samples. The KNN method makes the fault classification of rotating machinery straightforward.
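The distance computation and voting described above can be sketched in a few lines. The two-feature training samples, the class labels and the choice of three neighbors are assumptions made for illustration.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_predict(train_x, train_y, test_x, k=3):
    """Assign test_x to the majority class among its k nearest training samples."""
    distances = sorted((euclidean(x, test_x), label)
                       for x, label in zip(train_x, train_y))
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical spectral features for two conditions, illustration only.
train_x = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1),
           (3.0, 3.2), (3.1, 2.9), (2.9, 3.0)]
train_y = ["healthy", "healthy", "healthy", "BDF", "BDF", "BDF"]

print(knn_predict(train_x, train_y, (3.0, 3.0)))
print(knn_predict(train_x, train_y, (1.0, 1.0)))
```

An odd number of neighbors avoids tied votes when there are only two classes; with more classes a tie-breaking rule would be needed.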

Back propagation neural network (BPNN)
The BPNN classifier generates complex decision boundaries in the feature space. A BPNN can approximate Bayesian posterior probabilities at its outputs, and for given feature data the Bayesian classifier provides the best performance [48]. As with other non-parametric methods for pattern classification, the performance of a BPNN cannot be predicted a priori [49,50]. Several parameters must be chosen for a BPNN, including the training samples, the number of hidden nodes and the learning rate. Back-propagation generalizes the gradient computation of the delta rule, which is the single-layer version of back-propagation.
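As a small illustration of the single-layer case only, the sketch below applies the delta rule to a linear unit without bias: each weight is moved in proportion to the output error and the corresponding input. It is not a full multi-layer BPNN, and the training data, learning rate and epoch count are hypothetical.

```python
def train_delta_rule(samples, targets, learning_rate=0.1, epochs=200):
    """Train a single linear unit with the delta rule (online updates)."""
    weights = [0.0] * len(samples[0])
    for _ in range(epochs):
        for x, target in zip(samples, targets):
            output = sum(w * xi for w, xi in zip(weights, x))
            error = target - output
            # Delta rule: w_j <- w_j + learning_rate * error * x_j
            weights = [w + learning_rate * error * xi
                       for w, xi in zip(weights, x)]
    return weights

# Hypothetical data generated by the linear target 2*x1 - x2.
samples = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
targets = [2.0, -1.0, 1.0]

weights = train_delta_rule(samples, targets)
print(weights)
```

Back-propagation applies the same error-driven update to every layer by propagating the error gradient backwards through the hidden nodes.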

Performance analysis and parameters
Fault diagnosis methods based on SVM treat diagnosis as a classification problem. The vibration dataset is used to train the classifiers and to predict run-time faults.

Accuracy
The accuracy of the system, which relates the predicted class outcomes to the actual classes, is defined in Eq. (4):
$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$, (4)
where TP = True Positive, TN = True Negative, FP = False Positive, FN = False Negative.

Precision
Precision is the fraction of positive predictions that are correct, as shown in Eq. (5):
$\text{Precision} = \frac{TP}{TP + FP}$. (5)

Recall
Recall is the percentage of positive cases that are identified correctly. It is also called the true positive rate (TPR). It is calculated using Eq. (6):
$\text{Recall} = \frac{TP}{TP + FN}$. (6)
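The three measures can be computed directly from the confusion counts. The sketch below assumes the usual definitions (accuracy as the share of correct predictions, precision as TP / (TP + FP), recall as TP / (TP + FN)); the counts are hypothetical and are not results from the reviewed works.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (accuracy, precision, recall) from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall

# Hypothetical counts for one fault class, illustration only.
accuracy, precision, recall = classification_metrics(tp=45, tn=40, fp=5, fn=10)
print(f"accuracy={accuracy:.2%}, precision={precision:.2%}, recall={recall:.2%}")
```

For multi-class problems such as UNB, BDF and BRB, the counts are usually taken per class (one class against the rest) and the resulting values are then averaged.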

Comparative performance analysis study
This section demonstrates the performance of the different classifiers. The results obtained by SVM, RF, SRC, k-nearest neighbors (KNN) and BPNN are compared to assess the effectiveness of the proposed technique. Table 2 lists the results of the RF classifier under a variable parameter. The performance of the RF classifier is calculated, analyzed and then compared with that of the other classifiers. Table 2 shows that the best accuracy of 96.01 % is observed for parameter values of 6 and 8. Because the computation time for the value 8 is less than that for the value 6, the value 8 is taken as giving the best result. Fig. 4 illustrates the difference in accuracy rate for the different parameter values. A proper feature selection method removes redundant features from the original feature set, and reducing the features leads to a decrease in the accuracy rate. The results for the different classifiers are given in Table 3. For better visualization, the different parametric values are also represented graphically in Figs. 5-7. The accuracy values in Fig. 5 show that the RF classifier obtains the highest accuracy. The recall values of the different classifiers are also analyzed and presented graphically in Fig. 7. The percentage improvement is visualized in Fig. 8 for easier analysis.
The accuracy improvement of the RF classifier over KNN, BPNN, SRC and SVM is 10.92 %, 7.62 %, 0.13 % and 1.77 %, respectively. In terms of precision, the RF classifier is better than KNN, BPNN, SRC and SVM by 11.03 %, 7.65 %, 7.12 % and 5.13 %, respectively. In terms of recall, the improvement of the RF classifier over KNN, BPNN, SRC and SVM is 20.13 %, 14.26 %, 7.93 % and 2.66 %, respectively.

Conclusions
All failure modes increase machine vibration, and vibration analysis is the most widely used technique for determining equipment condition. In the motor condition monitoring experiment, an AC motor drive is used to control the operating speed of the motor. The motor's vibration is measured and monitored, and the measured vibration data is then analyzed using spectrum analysis. The presented technique demonstrates the detection of faults in induction motors, such as broken rotor bars and bearing defects, by analyzing acoustic sound signals. The signal is first preprocessed, which improves the spectrum estimation. The spectral content is extracted and fed into classifiers such as KNN, BPNN, SRC, SVM and RF to predict the type of failure. The classifiers are trained with various numbers of samples before prediction. In this paper, UNB, BDF and BRB faults are analyzed. The classifiers are compared in terms of accuracy, precision and recall, and the RF classifier achieves the best performance. The accuracy improvement of the RF classifier over KNN, BPNN, SRC and SVM is 10.92 %, 7.62 %, 0.13 % and 1.77 %, respectively. In terms of recall, the improvement of the RF classifier over KNN, BPNN, SRC and SVM is 20.13 %, 14.26 %, 7.93 % and 2.66 %, respectively. The application of the method to other machines and the vibration signatures of different bearing positions are topics for future work. A further direction is the implementation of novel and hybrid classifiers for better results.