Bearing remaining life prediction using Gaussian process regression with composite kernel functions

Sheng Hong1 , Zheng Zhou2 , Chen Lu3 , Baoqing Wang4 , Tingdi Zhao5

1, 3, 4, 5Science and Technology on Reliability and Environmental Engineering Laboratory, School of Reliability and System Engineering, Beihang University, Beihang, China

2Systems Engineering Research Institute, China State Shipbuilding Corporation (CSSC), Beijing, China

1, 3Corresponding authors

Journal of Vibroengineering, Vol. 17, Issue 2, 2015, p. 695-704.
Received 20 March 2014; received in revised form 30 January 2015; accepted 20 February 2015; published 31 March 2015

Copyright © 2015 JVE International Ltd.

Table of Contents Abstract Full-text  Download PDF References
Views 6
Reads 2
Downloads 27
since May, 2018

There is an urgent demand for life prediction of bearing in industry. Effective bearing degradation assessment technique is beneficial to condition based maintenance (CBM). In this paper, Gaussian Process Regression (GPR) is used for remaining bearing life prediction. Three main steps of prediction schedule are presented in details. RMS, Kurtosis and Crest factor are used for feature fusion by self-organizing map (SOM). Minimum Quantization Error (MQE) value derived from SOM is applied to represent the condition of bearing. GPR models with both single and composite covariance functions are presented. After training, new MQE value can be predicted by the GPR model according to previous data points. Experimental results show that composite kernels improve the accuracy and reduce the variance of prediction results. Compared with particle filter (PF), GPR model can predict the remaining life of bearings more accurately.

Keywords: Gaussian process regression, uncertainty distribution, bearing life prediction.

1. Introduction

Bearings are the most widely used mechanical and rotational parts in industrial equipments [1]. In order to avoid the fatal breakdowns of machines caused by unexpected failures, defects of bearings should be detected as early as possible [2]. Condition based maintenance (CBM) which can reduce the cost of maintenance, becomes an efficient strategy for modern industry these years [3]. Effective prediction of bearing remaining life is important to identify the future conditions for maintenance plans, thus can reduce the downtime, maintenance cost and safety hazards [4].

Bearing performance prognostics means forecasting the remaining useful life (RUL) or future condition of equipment based on the acquired condition monitoring data. Although studies on prognostics of bearing RUL have received much attention recent years, the accurate estimation of the RUL of bearing is still challenging. Data-driven methods predict the results according to the historical records and the condition monitoring (CM) data. Vibration signal is effective to reflect the running condition of bearings [5]. Three major steps are crucial to achieve failure prediction result. Firstly, degradation indicator should be selected properly. Secondly, the degradation of machine should be tracked and assessed continuously. Finally, RUL of machine should be effectively predicted according to the monitoring data.

Prognostics algorithm is crucial to the effectiveness of failure prediction. Many popular prediction models have already been applied in the literatures. Artificial Neural Network (ANN) which has good self-learning and adaptive ability, is widely used in these prognostics models. In [6], Feed Forward Neural Network (FFNN) was used to achieve more accurate estimation of the RUL of bearings. Reference [7] proposed a recurrent wavelet neural network (RWNN) to predict the crack propagation of rolling bearings. Meanwhile, some new non-linear prognostics methods have also been employed to forecast RUL. For example, in [8], SVM-based model was developed to compute the ratio of bearing running time to failure time. In [9], a combination of logistic regression (LR) and relevance vector machine (RVM) was used to assess the health state and failure time.

These models work perfectly for some particular scenes. However, fewer machine learning methods can provide the probability density function (PDF) of RUL, which is important to handle the uncertainty problem. The probability indication of the predicted results is important to make maintenance decisions and risk analysis [10]. Sankalita Saha [11] proposed a distributed GPR-based prognostics algorithm to predict battery RUL with an uncertainty management. The results showed that GPR is superior to particle filters (PF) for battery prognostics.

GPR is a new promising technique which has received much attention in the machine-learning field over the past years. Due to the advantages in uncertainty predictions, GPR provides a good adaptability to handle nonlinear, complex classification and regression problems. Compared with artificial neural network, GPR is simpler to understand and implement in practice. Kernel functions in GPR models are similar with support vector regression (SVR) and relevance vector machines (RVM). However, GPR provides probabilistic outputs, by an uncertainty interval for the input data. Thus, GPR is being applied to various engineering problems with uncertainty.

Aiming to achieve an accurate failure prediction of bearing, a data-driven model based on GPR is proposed in this paper. A failure prediction model is constructed based on GPR with different covariance functions after feature extraction and fusion. The training input of this model is from bearings’ run-to-failure experiment.

The rest of the paper is organized as follows. In Section 2, GPR method is introduced detailedly. The bearing failure prediction model based on GPR is described in Section 3. In Section 4, applications of the proposed method are presented with experiment data. Finally, the conclusions are made in Section 5.

2. Principle of GPR

GPR is one of the most important Bayesian machine learning approaches [12]. As a probabilistic technique for nonlinear regression, it computes the posterior degradation estimates by constraining the priori distribution to fit the available training data.

Gaussian Process (GP) is a random process. It is defined as a series of random variables with any finite number of which has a joint Gaussian distribution. A GP is completely specified by its mean function μ(x) and covariance function C(x, x'), which can be defined as:

f x ~ G P μ x , C x , x ' ,
μ x = E f x ,
C x , x ' = E f x - μ x f x ' - μ x ' ,

where x, x'X are random variables.

Given priori information about the GP and a set of training points D=xi,fxii=1, 2,,n, xiRd (n is dataset number), the posterior distribution is derived by imposing a restriction on joint priori distribution [13]. A GP model with noise is defined as:

y i = f ( x i ) + ε ,

where ε is additive Gaussian noise N(0,σ2).

For a new test input x*, we can establish a joint Gaussian priori distribution of the training output y and the test output y* as follows:

y y * ~ N 0 , C X , X + σ n 2 I C X , x * C X , x * C x * , x * ,

where CX,X is n-order symmetric positive definite covariance matrix, CX,x* is the n×1 covariance matrix of the test input x* and the training input X, and Cx*,x* is the covariance matrix of the test input x*. Under the conditions of a given input x* and the training set D, the GP can calculate the test output y* according to the posterior probability formula as follows:

y * | x * , D ~ N μ y * , σ y * 2 ,
μ y * = C x * , X C X , X + σ n 2 I - 1 y = i = 1 n α i C x i , x * ,

where μy*, σy* are expectation and variance of y* and α=C+σn2I-1y, where I is unit matrix.

The predicted mean value is a linear combination of covariance functions according to Eq. (7). A crucial ingredient in a Gaussian process predictor is the covariance function that encodes the assumptions about the functions to be learnt by defining the relationship between data points. Once a posterior distribution is obtained, it can be used to predict values for the test data points.

Three single covariance functions are used in this study, namely squared exponential (SE) covariance function, rational quadratic (RQ) covariance function and Matern class of covariance functions [12, 13]. These covariance functions are described as follows.

1) The squared exponential (SE) covariance function:

C S E x i , x j = σ f 2 e x p - ( x i - x j ) 2 2 l 2 .

2) The rational quadratic (RQ) covariance function:

C R Q x i , x j = σ f 2 1 + ( x i - x j ) 2 2 α l 2 - α   .

3) The Matern class of covariance functions:

C M x i , x j = σ f 2 1 + 3 M ( x i - x j ) e 3 M ( x i - x j ) ,

where α, l, σf2, M are hyper-parameters, and the i and j represent the i-th and the j-th vector in the input matrix X.

In order to predict the remaining life of bearing, GPR requires a priori knowledge about the form of covariance function. The choice of covariance function must be specified by users, and the corresponding hyper-parameters can be learnt from the training data. The parameters are optimized by maximizing the marginal likelihood algorithm [12], as shown in the following:

L = lg p y X , θ = - 1 2 y T Q - 1 y - 1 2 lg Q - n 2 lg 2 π ,
θ i lg p y X , θ = 1 2 t r ( α α T - Q - 1 ) Q θ i ,

where θ=(α, l, σf2, M) is a vector containing all hyper-parameters, and Q=C+σn2I is the covariance matrix for the noisy target y.

3. Remaining life prediction for bearings using GPR

In this section, the remaining life prediction model is presented. The schematic diagram of the proposed model is shown in Fig. 1.

3.1. Feature extraction

Most mechanical failures can be represented by the feature of vibration signal. After feature extraction, several time-domain features can be obtained to assess the health state of the bearing and predict the RUL [14]. Three features are used in the proposed model, i.e. RMS, Kurtosis and Crest factor [15]. These features can be calculated as follows:

K u r t o s i s = k = 1 K x k - x m 4 K x s t d 4 ,
R M S = X R M S = k = 1 K ( x ( k ) ) 2 K ,
C r e s t = X P E A K X R M S ,

where x(k) is a series of vibration signal for k= 1, 2,…, K, and K is the number of data points, and XPEAK is the maximum value of signal series.These features are usually stable under the normal condition, whereas they obviously change once bearings degrade. The three features are sensitive to different types of failures and the combination of them are more effective in performance assessment.

Fig. 1. Schematic diagram of the proposed prediction method

3.2. Feature fusion by SOM

SOM is a kind of neural network [16]. It can form two-dimensional indicator from multi-dimensional data. Each neuron of the SOM is represented by an n-dimensional weight vector mi=(mi1,mi2,,min)T . The neurons of the map are connected to adjacent neurons by a neighborhood relation. RMS, kurtosis and crest are combined as the input X. Then the SOM network is trained iteratively by the input under normal condition. For a new vector input T, the neuron whose weight vector is the closest to the input vector is called the best matching unit (BMU), which is defined as follows:

x - m B M U = m i n i x - m i .

During the training procedure, the weight vectors of BMU are updated as well as its topological neighbors by the following learning rule:

m i t + 1 = m i t + a t h n B M U , n i , t X - m i t ,

where (nBMU,ni,t) is the neighborhood function, α(t) is the learning rate. The distance between the BMU and the test input data T is called the minimum quantization error (MQE) [17]. Thus, the degradation condition can be represented by the MQE as follows:

M Q E = T - m B M U ,

where T is the test input data vector and mBMU stands for the weight vector of BMU.

3.3. Failure threshold

The bearing health state during the whole bearing life is divided into three stages as normal, degradation and failure. After sampling from bearing vibration data, we calculate the three features into MQE value and then compare the incipient failure thresholds. The normal value of MQE is in range of 0-0.3. Therefore, the incipient threshold is set as 0.3 and the failure threshold is set as 1.0. If the features exceed the incipient threshold and enter the degradation area, an incipient fault will occur. And then the prediction model will be activated to continually track and record the vibration data. Once the failure threshold is exceeded, bearing suffers from a severe degradation and must be replaced.

A multistep prediction method is used in this paper. The initial time step is a very important element in bearing remaining life prediction. In order to realize the bearing remaining life prediction, the initial time step should be a point whose MQE is below the failure threshold. Generally, we choose the initial time step with a MQE value between the incipient failure threshold (the value is 0.3 in thise paper) and the final failure threshold (the value is 1 in thise paper). However, different initial time step result in different prediction accuracy.

3.4. GPR prediction

GPR is used to predict the failure time of bearings by assessing the features of bearings. The hyper-parameters are obtained by maximizing the marginal likelihood algorithm [13]. The model outputs the predicted MQE value and confidence interval for the next steps. The latest k (in this paper, k= 10) MQEs are took as input vector of GPR to predict the next several periods. The RUL of the bearing can be obtained based on the predicted MQE and failure threshold.

Covariance function is a key factor in GPR. Three standard covariance functions are used in this paper (i.e. squared exponential (SE) covariance function, rational quadratic (RQ) covariance function and Matern class of covariance functions). If f1x, f2x,…, fnx are independent random GP, fx=i=1nfix is also a GP. Therefore, a composite covariance function can be described as:

C C K x i , x j = i = 1 n C i x i , x j .

Fig. 2. Bearing test rig sensor placement illustration



4. Case studies

In this section, we used the test data of bearing to validate the effectiveness of the proposed bearing failure prediction model. The bearing test dataset was generated from the run-to-failure test performed under a constant load condition by the national science foundation industry and university corporative research program (NSF I/UCR) center on Intelligent Maintenance Systems (IMS). The test rig is shown in Fig. 2.

Four bearings were installed on one shaft for the run-to-failure test [18, 19]. The rotation speed was maintained at 2000 rpm. A radial load of 2730 kg was placed onto the shaft and the bearings by a spring mechanism. Vibration data were collected every 20 minutes, with a sampling rate of 20 kHz. Three tests were carried out and all tests were terminated when some failure occurred.

In order to give a brief demonstration of the proposed method, a part of MQEs were shown in Fig. 3. Three time domain features are applied to construct the input vector of SOM. After training SOM, the MQE of each test dataset can be calculated by Eq. (13)-(15).

Fig. 3. MQE of test bearing

4.1. Prediction based on GPR

Several degradation datasets are extracted from Bearing 3 of Test 1 and used to make predictions. Performance degradation causes fluctuation of bearing vibration and the increase of MQE. When MQE exceeds the incipient threshold, the prediction model are activated. If MQE exceeds 1.0, bearing enters the failure stage and should be replaced.

The evolution of MQE of the vibration signal of bearing 3 with threshold setting and prediction is shown in Fig. 4. The prediction is trigged at the 40th time step, where the MQE is about 0.7. The dotted line is the actual MQE and the solid line indicates the prediction result of GPR with composite covariance function. The actual failure time is 55.3 and the predicted one is 56.6. The prediction is overestimated with an accuracy of 97.7 %, which is acceptable.

Fig. 4. Remaining life prediction for bearing MQE

The prediction model based on GPR (CK) with 95 % confidence interval is used, as depicted in Fig. 5. Three prediction processes were triggered at the 30th, 40th and 52th time step, respectively. The corresponding MQEs of the bearing are approximately equal to 0.5, 0.7 and 0.9, respectively. The lines represent the predicted MQEs and the colored areas are the uncertainty areas of the predicted results. Among these lines, the black dot one indicates the actual MQE. It can be found that the predictions at 30 failed to follow the actual trend. Thus, prediction at the early stage faces a larger deviation. While, with more new data being added, the predictions become better as illustrated by the predictions at 40 and 52.

Fig. 5. Prediction of MQE based on GPR (CK) with 95 % confidence interval at 30, 40 and 52

4.2. Comparisons between different kernels

Covariance function is crucial for GPR, and can determine the efficiency of prediction. In this case, 60 points were used to make a comparison between different kernels. The length of the training data are 10. Six covariance functions are applied to the prediction model for training and validation. As shown in Fig. 6, most of kernels can suit the actual data well for one-step prediction. The RMS error (RMSE) and output variance are used to estimate the accuracy. Although, composite covariance functions consumed more time, they exhibited better performance (smaller RMSE and variance) than the single kernels. As listed in Table 1, the RMSE of the composite kernel of SE and Matern is smaller than others. Variance also reduces significantly. The composite kernel spent additional 60 % computing time in exchange for 30 % performance enhancement, which is valuable.

Fig. 6. Predictions with different covariance functions

For an n-steps prediction, the dynamic of PDF predictions with single and composite covariance functions is shown in Fig. 7. The RUL distributions of both kernels cover the real RULs well. However, the distributions of RUL obtained from the single kernel have a larger PDF at early time, which means the variance of prediction is higher than the composite covariance functions. More accurate prediction with higher confidence can be obtained at the end life. This probabilistic implement help us make a correct judgment and provide an uncertainty management.

Fig. 7. PDF of prediction with different covariance functions

Table 1. Comparison of different kernels

Computing time
SE+ Matern

4.3. Comparison with particle filter

PF is a kind of Sequential Monte Carlo (SMC) methods that are coupled with the recursive Bayesian estimation [20]. The principle of PF is to utilize a set of weighted particles to represent the probability densities. 120 points degradation data are used to make a comparison. The failure threshold of the MQE is set as 1.0. The predictions are started at 100th time step. Both PF and GPR prediction model are run ten times and the average results are calculated. Fig. 8(a) is the prediction result using PF, and Fig. 8(b) is the prediction using GPR with the composite kernel.

Fig. 8. Comparison of MQE prediction between composite kernel GPR and PF method



The actual failure time as shown in the Fig. 8 is 116.2. While the mean value of the predicted failure time by GPR and PF are 117.8 and 119.1 respectively. Both results are acceptable. Compared with PF, GPR has a better accuracy and the predicted curve of GPR is more stable. That is to say GPR is superior to PF in maintenance decision.

5. Conclusions

This paper proposes a bearing remaining life prediction model based on GPR with composite kernel. Dynamic of PDF is illustrated in the experiment. The PDF of model outputs is essential for risk analysis and maintenance decision. From this paper, following conclusions can be drawn:

1) This paper proposes a creative application of GPR in bearing prediction field. The GPR model can simplify the prediction process with a satisfactory performance.

2) GPR with composite kernels can improve the prediction accuracy and reduce the variance of the RUL in comparison with single kernel.

3) Compared with PF algorithm, GPR model is better in terms of prediction accuracy.


  1. Heng A., Zhang S., Tan A. C. C. Rotating machinery prognostics: State of the art, challenges and opportunities. Mechanical Systems and Signal Processing, Vol. 23, Issue 3, 2009, p. 724-739.
  2. Qiu H., Lee J., Lin J. Robust performance degradation assessment methods for enhanced rolling element bearing prognostics. Advanced Engineering Informatics, Vol. 17, Issue 3, 2003, p. 127-140.
  3. Pan Y., Chen J., Guo L. Robust bearing performance degradation assessment method based on improved wavelet packet-support vector data description. Mechanical Systems and Signal Processing Vol. 23, Issue 3, 2009, p. 669-681.
  4. Huang R., Xi L., Li X., Richard C., Qiu H., Lee J. Residual life predictions for ball bearings based on self-organizing map and back propagation neural network methods. Mechanical Systems and Signal Processing, Vol. 21, Issue 1, 2007, p. 193-207.
  5. Gebraeel N., Lawley M., Liu R. Residual life predictions from vibration-based degradation signals: a neural network approach. IEEE Transactions on Industrial Electronics, Vol. 51, Issue 3, 2004, p. 694-700.
  6. Mahamad A. K., Saon S., Hiyama T. Predicting remaining useful life of rotating machinery based artificial neural network. Applied Mathematics and Computation, Vol. 60, Issue 4, 2010, p. 1078-1087.
  7. Wang P., Vachtsevanos G. Fault prognostics using dynamic wavelet neural networks. Ai Edam-Artificial Intelligence for Engineering Design Analysis and Manufacturing, Vol. 15, Issue 4, 2001, p. 349-365.
  8. Sun C., Zhang Z., He Z. Research on bearing life prediction based on support vector machine and its application. Journal of Physics: Conference Series, Vol. 305, Issue 1, 2011, p. 12-28.
  9. Caesarendra W., Widodo A., Yang B. Application of relevance vector machine and logistic regression for machine degradation assessment. Mechanical Systems and Signal Processing, Vol. 24, Issue 4, 2010, p. 1161-1171.
  10. Saha S., Saha B., Goebel K. A distributed prognostic health management architecture. Proceedings of the Conference of the Society for Machinery Failure, 2009.
  11. Saha S., Saha B., Saxena A., Goebel K. Distributed prognostic health management with gaussian process regression. IEEE Aerospace Conference, 2010, p. 1-8.
  12. Liu K., Liu B., Xu C. Intelligent analysis model of slope nonlinear displacement time series based on genetic-Gaussian process regression algorithm of combined kernel function. Chinese Journal of Rock Mechanics and Engineering, Vol. 10, 2009, p. 2128-2134.
  13. Rasmussen C. E., Williams C. K. I. Gaussian Processes for Machine Learning. MIT Press Cambridge, MA, 2006.
  14. Mortada M., Yacout S., Lakis A. Diagnosis of rotor bearings using logical analysis of data. Journal of Quality in Maintenance Engineering, Vol. 17, Issue 4, 2011, p. 371-397.
  15. Hong S., Zhou Z., Zio E., Hong K. Condition assessment for the performance degradation of bearing based on a combinatorial feature extraction method. Digital Signal Processing, Vol. 27, 2014, p, 159-166.
  16. Kohonen T. Self-Organizing Maps. Springer, 1995.
  17. Jamsa-Jounela S., Vermasvuori M., Enden P., Haavisto S. A process monitoring system based on the Kohonen self-organizing maps. Control Engineering Practice, Vol. 11, Issue 1, 2003, p. 83-92.
  18. Lee J., Qiu H., Yu G., Lin J. Bearing Data Set. IMS, University of Cincinnati. NASA Ames Prognostics Data Repository, NASA Ames, Moffett Field, CA,
  19. Hong S., Wang B., Li G., Hong Q. Performance degradation assessment for bearing based on ensemble empirical mode decomposition and gaussian mixture model. Journal of Vibration and Acoustics, Vol. 136, Issue 6, 2014, p. 061006.
  20. Doucet A., Godsill S., Andrieu C. On sequential Monte Carlo sampling methods for Bayesian filtering. Statistics and Computing, Vol. 10, Issue 3, 2000, p. 197-208.