Bearing fault diagnosis via kernel matrix construction based support vector machine

A novel approach to kernel matrix construction for the support vector machine (SVM) is proposed to detect rolling element bearing faults efficiently. First, a multi-scale coefficient matrix is obtained by processing each vibration sample signal with the continuous wavelet transform (CWT). Next, singular value decomposition (SVD) is applied to the wavelet coefficient matrix, and the resulting vector of singular values serves as the feature vector of the sample signal. Two kernel matrices, i.e. the training kernel and the predicting kernel, are then constructed in a novel way that reveals the intrinsic similarity among samples and makes it feasible to solve nonlinear classification problems in a high dimensional feature space. To validate its diagnostic performance, the kernel matrix construction based SVM (KMCSVM) classifier is compared with three SVM classifiers, i.e. classification tree kernel based SVM (CTKSVM), linear kernel based SVM (L-SVM) and radial basis function based SVM (RBFSVM), in identifying different locations and severities of bearing faults. The experimental results indicate that KMCSVM has better classification capability than the other methods.


Introduction
Rolling element bearing (REB) is a critical unit in rotating machinery and its health condition is often monitored to identify incipient faults. When a defect such as a bump, dent or crack in the outer race, inner race, roller or cage of an REB repeatedly contacts another part of the bearing during operation, a sequence of impulsive responses can be acquired in the form of vibration [1][2][3], acoustic emission [4], temperature, motor current, ultrasound [5], etc. However, the measured signals contain both the fault-induced component and noise from structural vibration, environmental interference, etc. Furthermore, the fault-induced signal is often masked by noise due to its relatively low energy. Many signal processing techniques, including time domain analysis, frequency analysis and time-frequency analysis, have therefore been explored to extract fault signatures effectively. For example, statistical parameters in the time domain such as RMS, variance, skewness and kurtosis are used as defect features [6,7]. Features are also derived from time series models such as the autoregressive model [8,9]. Frequency analysis aims to find whether a characteristic defect frequency (CDF) exists in the spectrum [10][11][12][13][14]. Because bearing fault signals are non-stationary, they are extensively processed with time-frequency analysis to obtain local characteristic information in both the time and frequency domains [15][16][17]. Two or more kinds of signal processing techniques are also combined for feature extraction [18][19][20]. Some signal analysis methods are optimized before feature extraction [21,22], like the flexible analytic wavelet transform [23], which employs fractional and arbitrary scaling and translation factors to match the fault component. High-dimension features can be compressed into low-dimension features by optimal algorithms [24][25][26] such as manifold learning [27][28][29][30] for efficient diagnosis.
Due to the complexity of the bearing, it is almost impossible even for domain experts to judge the bearing condition just by inspecting the characteristic indices. In order to automate diagnosis procedures and decision-making on REB health state, a variety of automatic diagnosis methods have been put forward, such as artificial neural network (ANN), support vector machine (SVM), fuzzy logic, hidden Markov model (HMM) and other novel approaches [31]. In [32], the anomaly detection (AD) learning technique achieved higher accuracy than an SVM classifier for bearing fault diagnosis. The trifold hybrid classification (THC) approach can isolate unexampled health states from exampled health states and discriminate them exactly [33]. A simplified fuzzy adaptive resonance theory map (SFAM) neural network has been investigated and is able to predict REB remaining life [34]. A poly-coherent composite spectrum (PCCS), retaining amplitude and phase information, is observed to give a better diagnosis than methods without phase information [35]. The HOS-SVM model, which integrates high order spectra (HOS) features and an SVM classifier, shows the capability of diagnosing REB failures [36].
As mentioned above, great progress has been made in detecting bearing conditions. Meanwhile, these methods also face some challenges. For instance, owing to fluctuations in speed or load, a measured CDF is probably inconsistent with the theoretical calculation. The selection of the base wavelet and scale levels mostly relies on researchers' experience and preference rather than objective criteria. The discrete wavelet transform (DWT) still suffers from the limitation of fixed scale resolution regardless of signal characteristics. The structure of an ANN, particularly its initial weights, which are randomly determined by trial and experience, may weaken generalization capability and training speed. For an SVM classifier, a kernel function is required to map samples from the input space to a higher dimensional feature space where the samples can be linearly separated. However, the kernel function is confined to typical formulas such as the linear, polynomial, radial basis, multilayer perceptron and sigmoid functions, which are not guaranteed to capture the intrinsic correlation among the samples. Consequently, it may lead to poor classification.
Therefore, a novel method of kernel matrix construction for SVM (KMCSVM) is proposed to identify REB faults more precisely. Two kernel matrices, i.e. the training kernel matrix $K$ and the prediction kernel matrix $K^{p}$, are constructed in this way. The matrix $K$ exposes the similarity of intrinsic characteristics among training samples, while the matrix $K^{p}$ specifies the similarity between training samples and test samples. The results show that KMCSVM has better ability for REB fault diagnosis. To the best of our knowledge, KMCSVM has not previously been applied in the field of rotating machinery fault diagnosis.
The rest of this paper is organized as follows: Section 2 reviews the background knowledge about CWT and singular value decomposition (SVD) for feature extraction. The procedure based on KMCSVM is presented in Section 3. The proposed method is validated by identifying bearing fault locations and severities in Section 4. Finally, conclusions are drawn in Section 5.

Methods review
Because signals from defective bearings are non-stationary, nonlinear, local and transient, CWT is chosen to process the signals, and SVD is used to calculate the vector of singular values from the coefficient matrix as the signal signature.

Continuous wavelet transform
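CWT measures the local similarity between a wavelet $\psi(t)$, at scale $a$ and position $b$, and a signal $x(t)$. The wavelet coefficient $W(a, b)$ can be defined by Eq. (1):
$$W(a, b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{+\infty} x(t)\, \psi^{*}\!\left(\frac{t - b}{a}\right) dt \quad (1)$$
where $\psi^{*}$ denotes the complex conjugate of $\psi$. By shifting $\psi(t)$ in time and scaling $\psi(t)$, a wavelet coefficient matrix $W$ can be created, which is viewed as a time-frequency space as in Eq. (2) and represents the dynamic characteristics of the signal $x(t)$:
$$W = \begin{bmatrix} w_{1,1} & w_{1,2} & \cdots & w_{1,n} \\ w_{2,1} & w_{2,2} & \cdots & w_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ w_{m,1} & w_{m,2} & \cdots & w_{m,n} \end{bmatrix} \quad (2)$$
where $w_{i,j}$ is the coefficient at the $i$th scale and at the $j$th data point of a sample signal.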
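As an illustration of Eqs. (1) and (2), the following sketch builds a coefficient matrix with NumPy. It is a minimal example under stated assumptions, not the implementation used in this study: the real-valued Morlet wavelet, the names `morlet` and `cwt_matrix`, the parameter `w0` and the replacement of the integral by a discrete sum are all choices made here for illustration.

```python
import numpy as np

def morlet(t, w0=5.0):
    """Real-valued Morlet wavelet (illustrative choice of mother wavelet)."""
    return np.cos(w0 * t) * np.exp(-0.5 * t ** 2)

def cwt_matrix(x, scales, fs, wavelet=morlet):
    """Return the m x n coefficient matrix: row i is scale i, column j is data point j."""
    x = np.asarray(x, dtype=float)
    n = x.size
    t = (np.arange(n) - n // 2) / fs          # time axis centred on zero
    W = np.empty((len(scales), n))
    for i, a in enumerate(scales):
        psi = wavelet(t / a) / np.sqrt(a)     # scaled wavelet with 1/sqrt(a) factor
        # Convolving with the reversed wavelet correlates x with psi at every
        # shift b; dividing by fs approximates the integral by a Riemann sum.
        W[i] = np.convolve(x, psi[::-1], mode="same") / fs
    return W
```

Each row of the returned matrix corresponds to one scale, so a sample of 2048 points analysed at `m` scales gives an `m` by 2048 matrix.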

Singular value decomposition
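SVD is used to decompose the wavelet coefficient matrix $W$. Assuming matrix $W$ has size $m \times n$, the SVD result can be expressed by Eq. (3):
$$W = U \Lambda V^{T} \quad (3)$$
where $U$ and $V$ are orthogonal matrices of $m \times m$ and $n \times n$, respectively. $\Lambda$ is an $m \times n$ diagonal nonnegative matrix. The diagonal elements in $\Lambda$ are called singular values (SVs) of $W$, which are determined only by matrix $W$ itself and denote the nature of matrix $W$, namely, the characteristics of a sample signal. Given $m < n$, Eq. (3) can be written in detail as Eq. (4):
$$W = \begin{bmatrix} u_{1} & u_{2} & \cdots & u_{m} \end{bmatrix} \begin{bmatrix} \mathrm{diag}(\sigma_{1}, \sigma_{2}, \ldots, \sigma_{m}) & 0 \end{bmatrix} \begin{bmatrix} v_{1} & v_{2} & \cdots & v_{n} \end{bmatrix}^{T} \quad (4)$$
The SVs constitute the vector $\sigma$ described as Eq. (5). $\sigma$ also denotes the feature vector extracted from a sample signal:
$$\sigma = (\sigma_{1}, \sigma_{2}, \ldots, \sigma_{m}) \quad (5)$$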
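The feature extraction step can be sketched with NumPy as follows. This is a minimal example: the function name `sv_feature_vector` is hypothetical, and only the singular values are computed because the orthogonal factors are not needed for the feature vector.

```python
import numpy as np

def sv_feature_vector(W):
    """Return the singular values of W in descending order as a 1-D array."""
    return np.linalg.svd(np.asarray(W, dtype=float), compute_uv=False)
```

For an $m \times n$ coefficient matrix the result has $\min(m, n)$ entries, so every sample signal analysed with the same scales yields a feature vector of the same length.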

Proposed method
SVM is well suited for linear pattern recognition. However, the original feature vectors extracted from REB signals are not linearly separable. Suppose there exists a high dimensional space into which the original feature vectors are mapped such that their images can be linearly separated by SVM. Linear pattern recognition based on SVM then reduces to finding kernel matrices that hold the inner products between the mapped high dimensional feature vectors. Fig. 1 shows the stages of kernel pattern analysis. The sample feature vectors are used to create the training and predicting kernel matrices. The pattern function then uses the matrices to recognize unseen samples. For kernel pattern analysis, the key is how to construct the kernel matrices.
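
Kernel matrix pattern based SVM
A training set $T$ and a test set $S$ are given as below:
$$T = \{(x_{i}, y_{i})\}_{i=1}^{l}, \quad y_{i} \in \{+1, -1\} \quad (6)$$
$$S = \{x_{j}^{s}\}_{j=1}^{q} \quad (7)$$
Assume $\phi(x)$ is the image of point $x$ mapped into a high dimensional feature space $F$ and all the sample images can be separated by a hyper-plane as Eq. (8):
$$\langle w, \phi(x) \rangle + b = 0 \quad (8)$$
The hyper-plane is determined by solving the following optimization problem:
$$\min_{w, b, \xi} \; \frac{1}{2}\|w\|^{2} + C \sum_{i=1}^{l} \xi_{i} \quad (9)$$
subject to:
$$y_{i}\left(\langle w, \phi(x_{i}) \rangle + b\right) \ge 1 - \xi_{i}, \quad \xi_{i} \ge 0, \quad i = 1, \ldots, l \quad (10)$$
It is equivalent to solving a constrained convex quadratic programming optimization problem:
$$\max_{\alpha} \; L(\alpha) = e^{T}\alpha - \frac{1}{2}\alpha^{T} \Lambda K \Lambda \alpha \quad (11)$$
$$\text{subject to } \alpha^{T} y = 0, \quad 0 \le \alpha_{i} \le C, \quad i = 1, \ldots, l \quad (12)$$
and:
$$K_{i,j} = \langle \phi(x_{i}), \phi(x_{j}) \rangle \quad (13)$$
$K$ is named the training kernel matrix, which is an $l \times l$ symmetric matrix with $K_{i,j}$ the inner product between the images of two training samples in space $F$. $l$ is the number of training samples, $e$ is a column vector with $e_{i} = 1$, $\alpha = (\alpha_{1}, \alpha_{2}, \ldots, \alpha_{l})$ is the Lagrange multiplier vector, $\Lambda$ is here an $l \times l$ diagonal matrix with $\Lambda_{ii} = y_{i}$ (not the singular value matrix of Eq. (3)), $C$ is the error penalty constant, and $y_{i}$ is the class label of the $i$th sample.
By maximizing $L(\alpha)$, the optimized $\alpha^{*}$ can be obtained. Thus, the optimized $b^{*}$ can be computed using the following equation:
$$b^{*} = y_{i} - (\alpha^{*})^{T} \Lambda k_{i}, \quad \text{for any } i \text{ with } 0 < \alpha_{i}^{*} < C \quad (14)$$
where $k_{i}$ is the $i$th column vector of $K$. Hence, the pattern function of SVM to predict the class of an unseen sample can be written as:
$$f(x_{j}^{s}) = \mathrm{sgn}\left((\alpha^{*})^{T} \Lambda k_{j}^{p} + b^{*}\right) \quad (15)$$
and:
$$K_{i,j}^{p} = \langle \phi(x_{i}), \phi(x_{j}^{s}) \rangle \quad (16)$$
$K^{p}$ is named the prediction kernel matrix of $l \times q$ with $K_{i,j}^{p}$ the inner product between the images of a training sample and a test sample in space $F$. $q$ is the number of the test samples, and $k_{j}^{p}$ is the $j$th column vector of $K^{p}$. According to Eq. (15), the result of pattern analysis depends only on the kernel matrices, so it is feasible for SVM to solve nonlinear classification problems by developing appropriate kernel matrices.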
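The way the pattern function uses only the two kernel matrices can be sketched in plain Python. The sketch assumes the standard soft-margin SVM dual solution (multipliers `alpha`, labels `y` in {+1, -1}) is already available from a quadratic programming solver; the function names and the convention that a zero score maps to +1 are assumptions made for illustration.

```python
def bias_from_support_vector(alpha, y, K, s):
    """Bias b = y[s] - sum_i alpha[i] * y[i] * K[i][s].

    The index s should refer to a training sample whose multiplier lies
    strictly between 0 and the penalty constant C.
    """
    return y[s] - sum(a * yi * K[i][s] for i, (a, yi) in enumerate(zip(alpha, y)))

def predict_from_kernel(alpha, y, b, K_pred):
    """Predict +1 or -1 for each test sample (one column of K_pred per sample)."""
    n_test = len(K_pred[0])
    labels = []
    for j in range(n_test):
        score = sum(a * yi * K_pred[i][j]
                    for i, (a, yi) in enumerate(zip(alpha, y))) + b
        labels.append(1 if score >= 0 else -1)
    return labels
```

No feature vector appears in either function: the training kernel matrix enters through the bias and the prediction kernel matrix through the score, which is why the classifier can be changed by constructing the matrices directly.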

Kernel matrix construction
A novel method on kernel matrix construction (KMC) is presented to solve nonlinear classification problems using pattern analysis based SVM.
To the best of our knowledge, this KMC based method has not been studied in the field of machinery fault diagnosis. The specific procedure of KMC is stated below and illustrated in Fig. 2.
Step 1: Provide the training set $T$ and the test set $S$. Suppose there exist $c$ classes of samples in $T$. $T$ is used to construct the training kernel matrix $K$ of $l \times l$. $T$ and $S$ are used for the predicting matrix $K^{p}$ of $l \times q$. Let $N$ be a matrix of $l \times l$ and $N^{p}$ a matrix of $l \times q$. Initialize $N(k) = 0$ and $N^{p}(k) = 0$.
Step 2: Produce the distance matrices by computing the pairwise distance of samples using Eq. (17). Thus, $D$ about the pairwise distance of training samples and $D^{p}$ about the pairwise distance between training and test samples are shown as Eq. (18):
$$D = [d_{i,j}]_{l \times l}, \quad D^{p} = [d_{i,j}]_{l \times q} \quad (18)$$
where $d_{i,j}$ denotes the distance between the $i$th training sample and the $j$th training sample in $D$, and the distance between the $i$th training sample and the $j$th test sample in $D^{p}$.
Step 3: Find the distribution of the $k$ closest neighbors of each sample. The $k$ closest neighbors of each sample are the $k$ least numbers in each column of $D$ ($D^{p}$). Set to 1 the elements in $N(k)$ ($N^{p}(k)$) that have the same locations as the $k$ least numbers in $D$ ($D^{p}$). The rows of $N(k)$ ($N^{p}(k)$) are divided into $c$ blocks; the blocks and columns stand for classes and samples, respectively. Eq. (19) shows the distribution of the $k$ closest neighbors in different classes by setting 1.
Step 4: Classify using a majority vote among the neighbors. If a sample has the majority of its neighbors within one block, the sample belongs to the class related to that block. Set 1 to the column within the block, and 0 to the rest of that column. For example, if a sample belongs to the 1st class, $N(k)$ ($N^{p}(k)$) is revised as Eq. (20).
Step 5: Compress the multiple classes of $N(k)$ ($N^{p}(k)$) into two classes. The 1st class remains unchanged and the other classes merge into the 2nd class. A sample that is 0 (1) in the 1st class must be 1 (0) in the 2nd class. The updated $N(k)$ and $N^{p}(k)$ are shown as Eq. (21).
Step 6: Select the 1st row of $N(k)$ ($N^{p}(k)$) as a row matrix $v$ ($v^{p}$). $v$ reveals the classes of the training samples, and $v^{p}$ describes the classes of the test samples. The training kernel matrix $K(k)$ can be constructed based on $v$; it is an $l \times l$ symmetric matrix with diagonal elements 1 as Eq. (23). $K(k)$ reflects the similarity among training samples. The prediction matrix $K^{p}(k)$ of $l \times q$ can be likewise established according to $v$ and $v^{p}$. $K^{p}(k)$ exhibits the similarity between training and test samples. In $K(k)$ and $K^{p}(k)$, "1" means the maximum similarity between the corresponding samples and "0" means no similarity.
Step 7: Increase $k = k + 1$ and repeat from Step 3 to Step 6 until $k$ exceeds the upper limit. The upper limit should be set to a moderate value to save computing time.
Step 8: Take the average of the matrices $K(k)$ and $K^{p}(k)$. A number of matrices are produced as the number of closest neighbors $k$ changes from the lower limit to the upper limit. Average these matrices to capture the intrinsic relations among samples better. The averaged $K$ and $K^{p}$ are applied to the pattern function for classification.
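The steps above can be sketched in plain Python for one binary (one class against the others) problem. This is a hedged reading of the procedure, not the authors' implementation, and several details are assumptions made here because the text does not fix them: the Euclidean distance is used as the pairwise distance, a training sample counts itself among its own neighbors (its zero self-distance is among the least numbers of its column), voting ties go to the lowest class label, and two samples get kernel entry 1 when their voted binary classes agree and 0 otherwise. All function and parameter names are hypothetical.

```python
import math
from collections import Counter

def _vote(dists, labels, k):
    """Majority class among the k smallest distances (ties: lowest class label)."""
    nearest = sorted(range(len(dists)), key=lambda i: (dists[i], i))[:k]
    counts = Counter(labels[i] for i in nearest)
    best = max(counts.values())
    return min(c for c, n in counts.items() if n == best)

def kmc_kernels(X_tr, y_tr, X_te, positive, k_lo, k_hi):
    """Return the averaged training kernel (l x l) and prediction kernel (l x q)."""
    l, q = len(X_tr), len(X_te)
    # Step 2: pairwise distances, rows indexed by training samples.
    D = [[math.dist(a, b) for b in X_tr] for a in X_tr]
    Dp = [[math.dist(a, b) for b in X_te] for a in X_tr]
    K = [[0.0] * l for _ in range(l)]
    Kp = [[0.0] * q for _ in range(l)]
    n_k = k_hi - k_lo + 1
    for k in range(k_lo, k_hi + 1):                      # Step 7: sweep k
        # Steps 3-5: k nearest training neighbors, majority vote, then
        # compress to two classes (1 = `positive` class, 0 = all others).
        v = [int(_vote([D[i][j] for i in range(l)], y_tr, k) == positive)
             for j in range(l)]
        vp = [int(_vote([Dp[i][j] for i in range(l)], y_tr, k) == positive)
              for j in range(q)]
        # Step 6 (assumed form) and Step 8: accumulate the average over k.
        for i in range(l):
            for j in range(l):
                K[i][j] += (v[i] == v[j]) / n_k
            for j in range(q):
                Kp[i][j] += (v[i] == vp[j]) / n_k
    return K, Kp
```

Under these assumptions each per-k training matrix is symmetric with unit diagonal, as the text requires, and the entries of the averaged matrices lie between 0 and 1.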

Case studies
REB fault diagnosis is investigated to validate the effectiveness of KMCSVM. Fig. 3 shows the scheme of REB fault diagnosis.

Experimental setup and vibration data
The experimental data on faulty bearings are taken from the Case Western Reserve University Bearing Data Center. The vibration data have been widely used as a standard dataset for REB diagnosis. As shown in Fig. 4, the test stand consists of a 2 hp motor (left), a torque transducer/encoder (center), a dynamometer (right), and control electronics. The test bearings support the motor shaft. Motor bearings were seeded with faults using electro-discharge machining. Faults ranging from 0.007 inches to 0.021 inches in diameter were introduced separately at the inner raceway, rolling element and outer raceway. Faulted bearings were reinstalled into the test motor and vibration data were recorded for motor loads of 0 to 3 horsepower (motor speeds of 1797 to 1720 RPM). Bearing information is shown in Table 1 and Table 2. The vibration signal was collected using accelerometers, which were attached to the drive end of the motor housing with magnetic bases. The vibration signal was then digitized through a 16 channel DAT recorder. Digital data were collected at 48,000 samples per second for drive end bearing faults and post-processed in a MATLAB environment. Speed and horsepower data were collected using the torque transducer/encoder and were recorded. In this experiment, the vibration data of the drive end bearing are chosen to perform location and severity identification of bearing faults. The sampling frequency is 48 kHz and each sample contains 2048 data points. Four different bearing conditions, i.e. healthy state, outer race fault, inner race fault and ball fault, are observed for fault location recognition using KMCSVM. In addition, four types of fault severity (healthy, 0.007 inch, 0.014 inch and 0.021 inch) are also considered to assess KMCSVM classification performance.
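A short calculation relates the sample length to the shaft rotation. It uses only the figures quoted above (48 kHz sampling, 2048 points per sample, speeds of 1797 and 1720 RPM); the constant and function names are illustrative.

```python
FS_HZ = 48_000      # sampling frequency stated in the text
N_POINTS = 2048     # data points per sample stated in the text

def sample_duration_s(n_points=N_POINTS, fs_hz=FS_HZ):
    """Duration of one sample in seconds."""
    return n_points / fs_hz

def revolutions_per_sample(rpm, n_points=N_POINTS, fs_hz=FS_HZ):
    """Number of shaft revolutions covered by one sample at the given speed."""
    return rpm / 60.0 * sample_duration_s(n_points, fs_hz)
```

One sample lasts about 42.7 ms, which corresponds to roughly 1.28 shaft revolutions at 1797 RPM and 1.22 at 1720 RPM, so every sample spans slightly more than one full revolution of the shaft.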

Feature extraction
Referring to the wavelet selection criterion in subsection "Wavelet selection" of [37], the energy to entropy ratios of six different wavelets, namely the Shannon, Gaussian, Complex Morlet, Daubechies, Meyer and Morlet wavelets, are plotted in Fig. 5. Owing to its maximum energy to entropy ratio, the Shannon wavelet is selected as the best mother wavelet to perform the continuous wavelet transform. The feature vectors are calculated from the coefficient matrices using SVD.
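The selection criterion can be illustrated with a small sketch. The exact definition is given in [37] and is not reproduced in this paper, so the form below is an assumption: the energy is the sum of squared coefficient magnitudes, the entropy is the Shannon entropy (base 2) of the normalized energy distribution, and the criterion is their ratio. The function name is hypothetical.

```python
import math

def energy_to_entropy_ratio(W):
    """Energy / Shannon entropy of the coefficients in W (a list of rows).

    Assumed definitions: energy = sum |w|^2, p = |w|^2 / energy,
    entropy = -sum p * log2(p). Coefficients may be real or complex.
    """
    power = [abs(w) ** 2 for row in W for w in row]
    energy = sum(power)
    if energy == 0:
        raise ValueError("all coefficients are zero")
    entropy = -sum((p / energy) * math.log2(p / energy) for p in power if p > 0)
    # A single nonzero coefficient has zero entropy: the ratio is unbounded.
    return math.inf if entropy == 0 else energy / entropy
```

Under this definition, a wavelet that concentrates the signal energy in a few large coefficients gives a low entropy and hence a high ratio, which is the property the criterion rewards.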

Classification of bearing conditions
The performance of KMCSVM is evaluated by identifying bearing fault location and fault severity, and compared with other kernel pattern recognition methods, namely CTKSVM, L-SVM and RBFSVM, that were studied in the previous work [37]. CTKSVM is an SVM based on the classification tree kernel, which is constructed using a fuzzy pruning strategy and a tree ensemble learning algorithm to improve the diagnostic capability for REB faults. L-SVM makes use of the classical linear kernel, and RBFSVM uses the radial basis function kernel, to diagnose REB faults. Both five-fold cross validation and an independent test are conducted to obtain the classification accuracy of these SVM classifiers. To discover the true fault among the possible multiple faults, the SVM classifiers are trained in a tournament of one against others by setting one class as +1 and the others as -1, and continue to detect unknown samples in the same manner.
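The one-against-others labelling and a five-fold split can be sketched as follows. The function names are hypothetical, and the interleaved assignment of samples to folds is an assumption for illustration; the allocation actually used is the one given in the sample set tables.

```python
def one_vs_rest_labels(classes, positive):
    """Relabel a multi-class list: +1 for the `positive` class, -1 for the rest."""
    return [1 if c == positive else -1 for c in classes]

def five_fold_indices(n_samples, folds=5):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    for f in range(folds):
        test = [i for i in range(n_samples) if i % folds == f]
        train = [i for i in range(n_samples) if i % folds != f]
        yield train, test
```

For a dataset of 192 samples this gives test folds of 38 or 39 samples, each paired with the remaining samples for training.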

Identification of fault location
Fault location recognition strives to distinguish four different bearing conditions, i.e. healthy state, outer race fault, inner race fault and ball fault. Table 3 lists 12 datasets with various loads, fault sizes and shaft speeds for analysis. There are 48 samples for each state, thus 192 samples in total for all states in each dataset, as shown in Table 4.
The groups of sample sets are allocated in a way that satisfies the tournament of training and test using five-fold cross validation and independent test, as described in Table 5. Fig. 6 illustrates the accuracy of the four classifiers on the 12 datasets in Table 3 using five-fold cross validation. The classification accuracy of RBFSVM is clearly the lowest among all the methods. In eight cases (Fig. 6(b)-(h), Fig. 6(k)), KMCSVM achieves a higher classification accuracy than the other methods. In three cases (Fig. 6(a), Fig. 6(i), Fig. 6(l)), the classification rates of KMCSVM, CTKSVM and L-SVM are almost the same. Only in one case (Fig. 6(j)) is the classification accuracy of KMCSVM slightly lower than those of CTKSVM and L-SVM. As a whole, the classification ability increases in the order of RBFSVM, L-SVM, CTKSVM and KMCSVM. Additionally, the classification accuracy of KMCSVM shows the least fluctuation, which indicates that KMCSVM is insensitive to changes of sample sets.
The classification accuracy of KMCSVM is observed as the fault size changes under specific loads (0 HP, 1 HP, 2 HP, 3 HP). It can be inferred from Fig. 7 that the accuracy of KMCSVM descends in the sequence of fault sizes from 0.007 to 0.021 and then to 0.014 inch, except that, under 1 HP load, the accuracies for 0.014 and 0.021 inch alternate in rank, as shown in Fig. 7(b). In the early stage of bearing fault (0.007 inch), the accuracy reaches 100 %. The accuracy then falls with the growth of the bearing fault (0.014 inch). When the fault size enlarges further (0.021 inch), the classification accuracy rises again. It can also be seen from Table 6 that the average accuracy of KMCSVM, whether under five-fold cross validation or independent test, is the highest (all more than 95.60 %). The corresponding training and test times are summarized in Table 7. For five-fold cross validation, the computational cost of training KMCSVM is higher than that of the other three methods. The reason is that the construction of the training kernel matrix needs more computational time. Once KMCSVM is trained, it diagnoses efficiently, taking no more than 8.3 s. For independent validation, it takes less time to train (less than 9.97 s) and test (less than 3.02 s) KMCSVM, which is very close to the other methods. Thereby, KMCSVM displays its outstanding fault diagnosis performance.
CHENXI WU, TEFANG CHEN, RONG JIANG
... and its lifetime. In Table 8, four types of fault severity conditions are considered to assess KMCSVM classification performance using the datasets in Table 9.
The groups of sample sets are provided by means of tournament to identify different fault sizes, as described in Table 10. According to the results of the above experiments, KMCSVM achieves higher accuracy in the diagnosis of fault locations and severities than the other three methods. The success of KMCSVM is owed to the strategy for constructing the kernel matrices $K$ and $K^{p}$. This strategy can effectively suppress irrelevant features and mine the degree of similarity among samples. So $K$ and $K^{p}$ can express the intra-class compactness and inter-class separation more objectively than CTKSVM. RBFSVM and L-SVM employ fixed kernels that are independent of the analyzed samples, and thus fall behind KMCSVM and CTKSVM. Hence, KMCSVM is a competitive method for REB fault diagnosis.

Conclusions
In this study, KMCSVM based on kernel matrix construction is proposed to carry out nonlinear classification of REB defects. The results of fault location and severity identification verify that KMCSVM can achieve higher accuracy for bearing fault diagnosis than the other SVM classifiers. KMCSVM also remains robust against load interference and detects defects at an early stage, which is significant for REB condition monitoring. In addition, the effectiveness of KMCSVM can help to predict the deterioration degree and remaining lifetime of a bearing. In summary, KMCSVM demonstrates great advantages and potential in rotating machinery fault diagnosis.

Fig. 1. Stages in the implementation of kernel pattern analysis

Fig. 2. Flow chart of training kernel matrix construction

Fig. 6. Accuracy of the four classifiers corresponding to 12 datasets
Fig. 8 describes the classification accuracy of KMCSVM with load variation while the fault size is fixed. In Fig. 8(a), the accuracy for the 0.007 inch fault always stays at 100 %. So KMCSVM is robust against load interference and shows excellent fault classification performance. Fig. 8(b) and Fig. 8(c) demonstrate that loading disturbances cause irregular accuracy fluctuations.

... (Fig. 11(c)-(d)), the accuracy of KMCSVM is second only to that of L-SVM. Fig. 11(b) indicates that the accuracy of KMCSVM is slightly lower than those of CTKSVM and L-SVM. Consequently, KMCSVM is highly suitable for fault severity recognition of the bearing outer race and inner race. Moreover, the accuracy curves of KMCSVM show little fluctuation. This exhibits the good stability of KMCSVM under changes of sample sets and load interference.

Fig. 9. Accuracy of the four classifiers for fault severity in bearing outer race
Fig. 10.

Fig. 11. Accuracy of the four classifiers for fault severity in bearing ball

Table 3. Description of 12 datasets on fault locations

Table 4. Composition of dataset on fault locations

Table 5. Sample set with different fault locations for training and test

Table 6. Average accuracy of 4 classifiers using 12 datasets on fault locations

Table 7. Average training time and test time of 4 classifiers using 12 datasets on fault locations

Table 8. Composition of dataset on fault severity

Table 9. Description of 12 datasets on fault severity

Table 11 gives the average accuracy of the 4 classifiers for REB fault severity recognition. For five-fold cross validation, the classification performance of KMCSVM is slightly lower than that of L-SVM because KMCSVM does not perform as well as L-SVM in fault severity recognition of the bearing ball. However, KMCSVM is the best of the 4 classifiers for the independent test, where it achieves the highest accuracy. The corresponding training and test times are shown in Table 12. The computational cost of training and testing KMCSVM is similar to that for the fault location diagnosis mentioned above.

Table 10. Sample set with different severities for training and test

Table 11. Average accuracy of 4 classifiers using 12 datasets on fault severity

Table 12. Average training time and test time of 4 classifiers using 12 datasets on fault severity