Application of orthogonal neighborhood preserving projections and two dimensional hidden Markov model for the degradation evaluation of rolling elements bearings
Yongxiang Zhang^{1} , Yujie Xiao^{2} , Shuai Zhang^{3} , Shengjie Wang^{4}
^{1, 3, 4}Naval University of Engineering Power Engineering Marine Engineering, Wuhan, 430033, China
^{2}Naval Institute of Equipment, Beijing, 100000, China
^{1}Corresponding author
Journal of Vibroengineering, Vol. 19, Issue 4, 2017, p. 2427-2438.
https://doi.org/10.21595/jve.2016.17084
Received 16 April 2016; received in revised form 16 August 2016; accepted 24 September 2016; published 30 June 2017
Building an effective degradation indicator from general features remains an active topic in bearing condition monitoring. To overcome the shortcomings of indicators based on a single general feature, several indicators have been built from multiple general features extracted from the original vibration signal, but they do not consider the internal relevancy among the features. To address this problem, a new indicator is proposed using Orthogonal Neighborhood Preserving Projections (ONPP) and a 2-Dimensional Hidden Markov Model (2D HMM). Because it preserves the local structure of the data set, ONPP is used to obtain low-dimensional features that retain the main information. Unlike 1-dimensional data-processing algorithms, which commonly convert the multiple features into a vector and consider only their integral property, the 2D HMM both takes the relevance between the individual fault features into consideration and captures the global characteristics of the multiple features. A likelihood-probability-based health assessment indicator is then constructed by combining the 2D HMM with the data preprocessed by ONPP. The experimental results indicate that the proposed indicator tracks the degradation performance of the bearing well and is sensitive to incipient defects.
Keywords: data mining, diagnostics, Markov modeling, reliability engineering, process monitoring, rolling elements bearings, incipient defects.
1. Introduction
Machine health assessment has become very important in many mechanical industries, as the health condition has a great effect on industrial costs and production safety. The rolling element bearing, one of the most important components in rotating machinery, is closely tied to the health degradation performance of the machine. Since the bearing is commonly mounted internally and cannot easily be taken out, bearing maintenance is generally carried out by regular replacement at a fixed period determined by experience. Although many diagnostic methods can be used for bearing fault detection, they are effective only on significant faults, which may bring extra damage to the mechanism connected to the bearing. Therefore, it is very important to detect the bearing health condition in real time and to work out a maintenance plan during the incipient fault period. Moreover, an effective health indicator that is sensitive to the initial fault and shows a significant trend from normal to failure is the key element of health assessment.
Initially, several kinds of statistics extracted from the vibration signal, one of the most practical measurements, were used for health assessment, such as kurtosis, RMS and peak-to-peak value [1, 2]. However, these general features contain a lot of background noise, and no single feature can effectively describe the variation tendency of the bearing health condition. Moreover, degradation indicators made of these highly stochastic features have low sensitivity to incipient faults. To overcome this problem, indicators based on multiple features have been developed for degradation performance assessment. Qiu et al. [3] introduced a new health index based on the self-organizing map (SOM), which used the minimum error of the space distance as the index of health condition. Huang et al. then used this index and a back propagation neural network to predict the residual life of rolling element bearings [4]. Liao used genetic programming to improve the degradation trend [5]. Compared with the general fault features, these indicators are more sensitive to incipient faults. However, the general fault features used to construct them are not preprocessed to eliminate the impact of background noise, which can affect the accuracy of the degradation assessment. Thus, extracting the most effective information from the general features becomes a big challenge. Yu proposed several novel indices based on generative topographic mapping (GTM), the Gaussian mixture model (GMM) and Bayesian methods [6, 7]. The general features used to construct these indicators are preprocessed using locality preserving projections and dynamic principal component analysis, which reduces the negative effects on the effectiveness of the health indicator.
The above methods use the integral characteristics of multiple general features to build the indicator but do not consider the internal relevancy among these features, which can have a great effect on the randomness and sensitivity of the indicator. The 2-Dimensional Hidden Markov Model (2D HMM), which has spread into a wide range of fields, is capable of handling the stochastic problem of matrix data and performs better than the above approaches [8-10]. Moreover, a linear dimensionality reduction algorithm called Orthogonal Neighborhood Preserving Projections (ONPP) has emerged in many fields of data processing; it can reveal the natural difference between different faults by analyzing and extracting the inherent structure hidden in the raw feature data [11].
Therefore, in order to enhance the indicator's sensitivity to hidden faults and to obtain a clear degradation tendency, a new approach is proposed based on ONPP and the 2D HMM. The main contribution of this paper is a new indicator for bearings mounted in complicated structures, whose vibration signals are complex and from which effective general features for health assessment cannot easily be extracted. The proposed indicator can reveal initial defects and shows a significant upward trend with little fluctuation; it is smooth and therefore better suited to real-time health monitoring and residual life prediction. As the general features extracted from the original signal contain a lot of background noise, ONPP reduces the dimensionality of the general feature set to eliminate some of the influence of the noise while retaining the main information. After the new feature set is obtained, the 2D HMM, with its ability to deal with the stochastic property of the data matrix, is used to build the likelihood-probability-based indicator for degradation performance. Finally, several experimental data sets are used for validation, and the results illustrate the effectiveness of the proposed method.
2. The principle of the 2D HMM based degradation indicator
In this work, a new health indicator is proposed based on the 2D HMM and ONPP. In order to extract sensitive information from the general features, ONPP is used to preprocess the data. The processed data are then used with the 2D HMM to construct a likelihood-probability-based indicator for bearings.
2.1. The principle of ONPP based features extraction
ONPP is a novel dimensionality reduction technique. It shares some properties with Locally Linear Embedding (LLE), which implements nonlinear dimensionality reduction by manifold learning. ONPP aims to preserve the intrinsic geometry of the local neighborhoods through an orthogonal projection.
The main idea of this algorithm is to seek an orthogonal mapping of a given data set so that a graph describing the local geometry is best preserved. Consider a high dimensional data set $\{{X}_{i}{\}}_{i=1}^{N}\in {R}^{D}$, which will be mapped to the low dimensional data set $\{{Y}_{i}{\}}_{i=1}^{N}\in {R}^{d}$. Since this algorithm seeks the intrinsic geometric properties of the local neighborhoods, each data point ${X}_{i}$ in the high dimensional space is approximated by a linear combination of its neighbour points ${X}_{ij}$. Thus, we obtain an objective function:
where ${X}_{ij}$ is the $j$th neighbour point of ${X}_{i}$, ${W}_{ij}$ is the weight of the neighbour point ${X}_{ij}$, and $\epsilon \left(W\right)$ is the linear reconstruction error. Several conditions should be satisfied in the selection of ${W}_{ij}$. First, ${W}_{ij}$ should be chosen to minimize $\epsilon \left(W\right)$. Second, if ${X}_{j}$ is not in the neighbourhood of ${X}_{i}$, then ${W}_{ij}=0$. Third, ${\sum}_{j}{W}_{ij}=1$. Solving for $W$ is therefore a constrained least-squares problem. Eq. (1) can be rewritten as:
where ${Q}^{\left(i\right)}\in {R}^{K\times K}$ is the local covariance matrix and $K$ is the number of neighbours. ${Q}^{\left(i\right)}$ can be computed by:
Then, under the constraint ${\sum}_{j}{W}_{ij}=1$, the reconstruction weight matrix $W$ can be obtained as:
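The weight-solving step above follows the locally linear embedding construction that ONPP builds on [11]. As a hedged reconstruction in the notation of this section (the neighbour index ranges $j,k,p,q=1,\dots ,K$ and the equation numbering are assumptions), Eqs. (1)-(4) can be written as:

```latex
\begin{align}
\epsilon(W) &= \sum_{i=1}^{N}\Big\| X_i-\sum_{j=1}^{K}W_{ij}X_{ij}\Big\|^{2}, \tag{1}\\
\epsilon(W) &= \sum_{i=1}^{N}\sum_{j=1}^{K}\sum_{k=1}^{K}W_{ij}W_{ik}Q^{(i)}_{jk}, \tag{2}\\
Q^{(i)}_{jk} &= (X_i-X_{ij})^{T}(X_i-X_{ik}), \tag{3}\\
W_{ij} &= \frac{\sum_{k=1}^{K}\big(Q^{(i)}\big)^{-1}_{jk}}{\sum_{p=1}^{K}\sum_{q=1}^{K}\big(Q^{(i)}\big)^{-1}_{pq}}. \tag{4}
\end{align}
```

Eq. (2) follows from Eq. (1) because ${\sum}_{j}{W}_{ij}=1$ allows ${X}_{i}$ to be written as ${\sum}_{j}{W}_{ij}{X}_{i}$, and Eq. (4) is the Lagrange-multiplier solution of the resulting constrained problem.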
Since the reconstruction weight matrix $W$ captures local geometry that should be invariant under the dimensionality reduction, the low dimensional points $\{{Y}_{i}{\}}_{i=1}^{N}\in {R}^{d}$ should be reconstructed by the same weights. Therefore, $Y$ can be obtained by minimizing a reconstruction error function in the same way as for $W$, now with $W$ fixed. A new error function is constructed as follows:
In this case ${W}_{ij}$ can be expanded to a sparse matrix $W$ of size $N\times N$, with ${W}_{i,N\left(j\right)}={W}_{ij}$, where $N\left(j\right)$ denotes the data index of the $j$th neighbour of ${X}_{i}$. Thus Eq. (5) can be rewritten as:
where $M\in {R}^{N\times N}$ and $M=(I-W{)}^{T}(I-W)$. An explicit linear mapping from $X$ to $Y$ is then imposed, so that ${Y}_{i}={V}^{T}{X}_{i}$ for a matrix $V\in {R}^{D\times d}$ to be determined. The objective in Eq. (6) becomes:
Then note that:
Thus, solving for $V\in {R}^{D\times d}$ reduces to an eigenvalue problem for the matrix $\stackrel{~}{M}$: the columns of $V$ are the eigenvectors of $\stackrel{~}{M}$ corresponding to its $d$ smallest eigenvalues. The low dimensional data can then be obtained as follows:
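The projection step can be summarized in the same way. As a sketch, assuming that $X=[{X}_{1},\dots ,{X}_{N}]\in {R}^{D\times N}$ and $Y=[{Y}_{1},\dots ,{Y}_{N}]\in {R}^{d\times N}$ collect the samples as columns (this matrix convention and the equation numbering are assumptions; the orthogonality constraint ${V}^{T}V=I$ is the one that gives ONPP its name [11]), Eqs. (5)-(9) take the form:

```latex
\begin{align}
\epsilon(Y) &= \sum_{i=1}^{N}\Big\| Y_i-\sum_{j=1}^{K}W_{ij}Y_{ij}\Big\|^{2}, \tag{5}\\
\epsilon(Y) &= \operatorname{tr}\big(YMY^{T}\big), \qquad M=(I-W)^{T}(I-W), \tag{6}\\
\min_{V^{T}V=I}\ &\operatorname{tr}\big(V^{T}XMX^{T}V\big), \tag{7}\\
\tilde{M} &= XMX^{T}, \tag{8}\\
Y &= V^{T}X. \tag{9}
\end{align}
```

Eq. (6) is Eq. (5) written in matrix form, and substituting $Y={V}^{T}X$ into it gives Eq. (7), whose minimizer under the orthogonality constraint is spanned by the eigenvectors of $\stackrel{~}{M}$ with the smallest eigenvalues.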
The main steps of the algorithm are as follows:
1) Compute the $K$ nearest neighbors of each data point.
2) Compute the weights ${W}_{ij}$ that give the best linear reconstruction of each data point ${X}_{i}$ from its neighbors.
3) Compute $V$, the matrix whose column vectors are the $d$ eigenvectors of $\stackrel{~}{M}=XM{X}^{T}$ associated with the smallest eigenvalues.
4) Compute the projected vectors ${Y}_{i}={V}^{T}{X}_{i}$.
After the general feature set is preprocessed by ONPP, the new low dimensional set, which retains the local and global geometry of the high dimensional data samples, can be used effectively for health degradation assessment.
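The ONPP steps described in this section can be sketched in NumPy as follows. This is a minimal illustration rather than the authors' implementation: the function name `onpp`, the column-wise data layout, and the small regularisation term `reg` added to each local Gram matrix (needed when the number of neighbours exceeds the input dimension and the matrix becomes singular) are assumptions.

```python
import numpy as np


def onpp(X, n_neighbors=5, n_components=2, reg=1e-3):
    """Sketch of ONPP. X has shape (D, N): one sample per column.

    Returns V (D, n_components) with orthonormal columns and Y = V.T @ X.
    """
    D, N = X.shape
    # Step 1: nearest neighbours by squared Euclidean distance,
    # excluding each point from its own neighbourhood.
    sq = np.sum(X ** 2, axis=0)
    dist = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)
    np.fill_diagonal(dist, np.inf)
    nbrs = np.argsort(dist, axis=1)[:, :n_neighbors]

    # Step 2: reconstruction weights, each row summing to one.
    W = np.zeros((N, N))
    ones = np.ones(n_neighbors)
    for i in range(N):
        G = X[:, [i]] - X[:, nbrs[i]]            # (D, K) differences
        Q = G.T @ G                              # local Gram matrix (K, K)
        Q += reg * max(np.trace(Q), 1e-12) * np.eye(n_neighbors)
        w = np.linalg.solve(Q, ones)
        W[i, nbrs[i]] = w / w.sum()

    # Step 3: eigenvectors of X M X^T with the smallest eigenvalues.
    I_minus_W = np.eye(N) - W
    M = I_minus_W.T @ I_minus_W
    M_tilde = X @ M @ X.T
    M_tilde = 0.5 * (M_tilde + M_tilde.T)        # enforce exact symmetry
    _, eigvecs = np.linalg.eigh(M_tilde)         # eigenvalues ascending
    V = eigvecs[:, :n_components]

    # Step 4: project the samples.
    Y = V.T @ X
    return V, Y
```

With a feature matrix holding, for example, four general features per sample, `onpp(features, n_neighbors=6, n_components=2)` would return a two-dimensional representation of each sample; because `numpy.linalg.eigh` returns orthonormal eigenvectors, the columns of `V` satisfy the orthogonality constraint by construction.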
2.2. The likelihood probability indicator based on the 2D HMM
Since the general feature set is high dimensional and contains redundant information, ONPP is used to obtain a new feature set with low dimension that retains the important local information. However, the new feature set is still not effective for degradation assessment because of its high randomness and low sensitivity to the initial fault. Therefore, we use the 2D HMM to construct a health indicator based on likelihood probability.
The 2D HMM evolved from the 1D HMM, which is used for processing 1D sequence data. Although 1D HMMs can deal with 2D data by processing only its global property, as most other recognition algorithms do, they are not good models for matrix data. To address this problem, the 1D HMM was extended to the 2D HMM, which includes the pseudo 2D HMM and the fully connected 2D HMM. The pseudo 2D HMM, which is relatively simple in comparison with other 2D HMMs, is used in this paper. It is a twofold 1D HMM consisting of a super 1D HMM in which simple 1D HMMs are embedded. The states of the super 1D HMM and of the simple 1D HMMs are called superstates and simple states, respectively. Every superstate contains a complete 1D HMM. The 2D HMM can be formulated as follows:
The superstates are defined as $({\theta}_{1},\dots ,{\theta}_{N})$, where $N$ is the number of superstates. The state at time $t$ is denoted ${q}_{t}$, with ${q}_{t}\in ({\theta}_{1},\dots ,{\theta}_{N})$. The initial probability distribution of the super 1D HMM is denoted $\mathrm{\Pi}$. The transition matrix of the super 1D HMM is defined as $A=({a}_{ij}{)}_{N\times N}$, where ${a}_{ij}=P({q}_{t+1}={\theta}_{j}\mid {q}_{t}={\theta}_{i})$, $1\le i,j\le N$.
Since every superstate corresponds to its own simple 1D HMM, the parameters of these 1D HMMs differ. The simple 1D HMM in superstate $h\in \left({\theta}_{1},\dots ,{\theta}_{N}\right)$ is defined as follows:
The states of the simple 1D HMM are defined as $\left({\beta}_{1},\dots ,{\beta}_{{N}^{\left(h\right)}}\right)$, where ${N}^{\left(h\right)}$ is the number of states. The state of the $m$th observation is denoted ${s}_{m}^{\left(h\right)}$, with ${s}_{m}^{\left(h\right)}\in \left({\beta}_{1},\dots ,{\beta}_{{N}^{\left(h\right)}}\right)$.
The transition matrix of the simple 1D HMM is defined as ${A}^{\left(h\right)}={\left({a}_{ij}^{\left(h\right)}\right)}_{{N}^{\left(h\right)}\times {N}^{\left(h\right)}}$, where ${a}_{ij}^{\left(h\right)}=P({s}_{m+1}^{\left(h\right)}={\beta}_{j}\mid {s}_{m}^{\left(h\right)}={\beta}_{i})$, $1\le i,j\le {N}^{\left(h\right)}$. The observation matrix is denoted ${B}^{\left(h\right)}=\left\{{b}_{i}^{\left(h\right)}\right\}$, $1\le i\le {N}^{\left(h\right)}$.
As with the 1D HMM, the 2D HMM involves three problems: likelihood probability computation, decoding and model training. In order to build the health index, the likelihood probability computation is introduced here. When a 2D HMM and a sequence of observations are available, we can calculate the likelihood probability that the observation sequence was produced by the given model. First, we compute the probability of the simple 1D HMM in superstate ${\theta}_{i}$, which is given by:
Then, the likelihood probability is computed iteratively, as follows:
Finally, the likelihood probability is obtained by:
Eq. (12) is solved by the forward algorithm, which, together with the training algorithm of the 2D HMM, is fully presented in [12]. Since health data are used to train the model, the probability describes the degree to which the acquired data deviate from the healthy state. Thus, the likelihood probability can be treated as the health index.
The 2D HMM uses the simple 1D HMMs to deal with the internal data property among the multiple features and uses the super 1D HMM to deal with the integral property of the features. Thus the 2D HMM based index not only captures the internal and integral characteristics of the multiple general features but also reduces the stochastic effect in the time domain. Since the log likelihood probability (LLP) is normally negative, the negative LLP (NLLP) is used as the health index in this study to make it easier to visualize.
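The likelihood computation can be illustrated with the standard forward recursion for a pseudo 2D HMM: each row of the observation matrix is scored by the simple 1D HMM of every superstate, and these row scores then act as the emission terms of the super 1D HMM. The sketch below is not the analytic algorithm of [12]; the single-Gaussian emissions, the per-superstate initial distributions `pi`, the dictionary layout of `simple_models`, and all function names are assumptions made for illustration.

```python
import numpy as np


def _logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    m = np.where(np.isfinite(m), m, 0.0)   # guard slices that are all -inf
    s = np.sum(np.exp(a - m), axis=axis, keepdims=True)
    return np.squeeze(m + np.log(s), axis=axis)


def log_forward(log_pi, log_A, log_B):
    """Forward algorithm in the log domain.

    log_pi: (S,) initial log probabilities, log_A: (S, S) log transitions,
    log_B: (L, S) log emission score of each step under each state.
    Returns the log-likelihood of the whole sequence.
    """
    alpha = log_pi + log_B[0]
    for t in range(1, log_B.shape[0]):
        alpha = _logsumexp(alpha[:, None] + log_A, axis=0) + log_B[t]
    return _logsumexp(alpha, axis=0)


def gaussian_logpdf(x, mean, var):
    """Log density of a univariate Gaussian (assumed emission model)."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)


def p2dhmm_nllp(O, super_pi, super_A, simple_models):
    """Negative log-likelihood of a (T, M) matrix O under a pseudo 2D HMM.

    simple_models holds one dict per superstate with keys
    "pi" (n,), "A" (n, n), "mean" (n,), "var" (n,).
    """
    T = O.shape[0]
    log_row = np.empty((T, len(simple_models)))
    for h, mdl in enumerate(simple_models):
        for t in range(T):
            # Score every element of row t under every simple state.
            log_B = gaussian_logpdf(O[t][:, None],
                                    mdl["mean"][None, :],
                                    mdl["var"][None, :])
            log_row[t, h] = log_forward(np.log(mdl["pi"]),
                                        np.log(mdl["A"]), log_B)
    # Row scores act as the emissions of the super 1D HMM.
    llp = log_forward(np.log(super_pi), np.log(super_A), log_row)
    return float(-llp)
```

Under this sketch, a matrix that lies far from the means learned on health data yields a larger NLLP than one close to them, which is the behaviour the health index relies on.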
2.3. The principle of NLLP based health index
Vibration is one of the most widely used measurements for diagnostics. Once the vibration signal is obtained, the general features can be computed. Since the general features extracted from the original signal contain a lot of noise, which could make the indicator highly random, the first step in building the NLLP based degradation indicator is to use ONPP to pretreat the high-dimensional data composed of the multiple general features. In the ONPP based pretreatment, the number of low dimensions retained for the construction of the indicator is determined by experience, for example 2-4. The new low-dimensional health data are then used to train the 2D HMM, which represents the characteristics of the health condition. Afterwards, the NLLP based index can be used to assess the degradation performance when online data are obtained. The data to be monitored should also be processed by ONPP to eliminate the negative effects of the noise. The negative log likelihood probability is then computed using the 2D HMM trained on the health data, which indicates the degree to which the monitored data deviate from the health condition: the larger the NLLP, the larger the deviation, which signals that something abnormal may have happened. The construction procedure of the NLLP based index is shown in Fig. 1.
Fig. 1. System framework for bearing performance degradation assessment
3. Experimental verification
3.1. Experimental data
In order to evaluate the effectiveness of the proposed method for assessing the degradation performance of rolling element bearings, several experiments are conducted in this study. The experimental device shown in Fig. 2 consists of several main components: the bearing supporting structure, transmission shaft, servo-driven motor, lubrication system, axial loading device and control system. The experimental bearing type is NSK7010C.
Fig. 2. Experimental setup: a) overall view of the experimental device; b) partial enlarged detail of the bearing supporting structure
In order to accelerate fatigue, the experimental bearing, whose rated axial load is 13.9 kN, works under a heavy-duty operating condition with an axial load of 20 kN. The measuring instrument is a B&K3560C. Two sensors are used for data collection: one is located on the outside flange and the other on the chassis. Four bearings are used in this experiment, all under the same working condition. The experiment begins when the motor is started. During the experiment, the computer connected to the measuring instrument is used to monitor the vibration amplitude. Once the amplitude becomes large, the experiment ends, and the next bearing is mounted in the test rig. The rotating speed is 4000 r/min and the sampling frequency is 65536 Hz. Each data segment is 20 s long, and the interval between data segments is 10 min. Four groups of data are used for degradation performance evaluation in this paper.
Fig. 3. The kurtosis map of the full life
Fig. 4. The RMS map of the full life
Fig. 5. The skew map of the full life
Fig. 6. The peak map of the full life
The general fault features are selected according to experience. In this paper, four kinds of general fault features are used for condition monitoring: kurtosis, root mean square (RMS) value, skew and peak value. The whole life cycle maps of the four groups of data are shown in Figs. 3, 4, 5, and 6. The corresponding equations are as follows:
$skew=\frac{{k}_{3}}{{k}_{2}^{3}},\quad peak=\mathrm{max}\left(\left|{X}_{i}\right|\right)-\mathrm{mean}\left(X\right).$
$X$ is the data segment and $N$ is its length. $s$ is the variance. ${k}_{3}$ and ${k}_{2}$ are the third- and second-order central moments, respectively.
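As a rough illustration of how the four general features might be computed for one data segment, the sketch below uses NumPy. The skew and peak expressions follow the text, with the peak read as $\mathrm{max}\left(\left|{X}_{i}\right|\right)-\mathrm{mean}\left(X\right)$; the kurtosis and RMS definitions are the usual textbook ones and are assumptions here, as are the function name `general_features` and the `skew_power` parameter.

```python
import numpy as np


def general_features(x, skew_power=3.0):
    """Kurtosis, RMS, skew and peak value of one vibration data segment.

    skew_power is the exponent applied to the second central moment in the
    skew formula: 3.0 matches the expression printed in the text, while 1.5
    gives the conventional normalised skewness.
    """
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    centered = x - mean
    k2 = np.mean(centered ** 2)   # second-order central moment (variance)
    k3 = np.mean(centered ** 3)   # third-order central moment
    k4 = np.mean(centered ** 4)   # fourth-order central moment
    return {
        "kurtosis": k4 / k2 ** 2,            # assumed standard definition
        "rms": np.sqrt(np.mean(x ** 2)),     # assumed standard definition
        "skew": k3 / k2 ** skew_power,       # k3 / k2^3 as printed
        "peak": np.max(np.abs(x)) - mean,    # max(|X_i|) - mean(X)
    }
```

Applying this function to every 20 s segment of a bearing's record would give one four-element feature vector per segment, which is the kind of input the ONPP preprocessing step operates on.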
3.2. The data processing based on ONPP
In this part, the general features are preprocessed by ONPP, and the eigenvectors with their eigenvalues are obtained. We then choose the eigenvectors corresponding to the smallest eigenvalues to compute the new low dimensional data as the output; the number of eigenvectors selected equals the dimension of the new data. Since two- and three-dimensional data distributions of the bearing defect classes can be visualized well in this experiment, we select two- and three-dimensional data to construct the plots shown in Fig. 7 and Fig. 10. Fig. 7 shows the two-dimensional data processed by ONPP, while Fig. 10 shows the three-dimensional data processed by ONPP. For comparison, several general features are used to construct the two-dimensional and three-dimensional plots shown in Figs. 8, 9, 11, and 12. As can be seen in Fig. 7 and Fig. 10, the fault classes processed by ONPP are clearly separated whether the first two or the first three components are chosen. In contrast, the fault classes based on the general features cannot be distinguished at all, as shown in Figs. 8, 9, 11, and 12. Therefore, the data processed by ONPP are more effective for fault classification than the general features.
Fig. 7. The distribution map of data processed by ONPP with two features
Fig. 8. The distribution map of kurtosis and rms
Fig. 9. The distribution map of kurtosis and skew
Fig. 10. The 3-dimensional distribution map of data processed by ONPP with three features
Fig. 11. The 3-dimensional distribution map of skew, kurtosis and rms
Fig. 12. The 3-dimensional distribution map of skew, peak and rms
3.3. The analysis of the 2D HMM based NLLP indicator
A quantitative degradation indicator can increase the effectiveness of residual life prediction for key machine components. In this part, the new low dimensional data are used to construct the degradation indicator. Health data are needed for training, and they are selected according to the degradation trend of the general fault features. In the figures of these general fault features, the first 20 data points change little, and thus the first 20 samples of the processed data can be used for training. The number of states of the 2D HMM has little impact on the model, and few practical methods are available for choosing it. In applications of the 2D HMM, the number of states is generally below 10 according to experience. Therefore, 5 simple states and superstates are chosen to build the 2D HMM. Since the states of the 2D HMM have no practical meaning, the initial state distribution and the state transition matrix can be selected randomly. In order to train the model better, the parameters of the Gaussian mixture function of the 2D HMM, namely the means, variances and weights, can be initialized by a clustering algorithm.
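The text does not name the clustering algorithm used for initialization. As one possible choice, the sketch below uses a plain 1-D k-means to derive initial means, variances and weights from pooled training values; the function name `kmeans_init`, the use of k-means itself, and the small variance floor are assumptions for illustration.

```python
import numpy as np


def kmeans_init(x, n_states, n_iter=100, seed=0):
    """Initialise per-state Gaussian parameters from 1-D data with k-means.

    Returns (means, variances, weights), each of length n_states.
    """
    x = np.asarray(x, dtype=float).ravel()
    rng = np.random.default_rng(seed)
    # Start from n_states samples drawn from the data without replacement.
    centers = rng.choice(x, size=n_states, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        updated = np.array([
            x[labels == k].mean() if np.any(labels == k) else centers[k]
            for k in range(n_states)
        ])
        if np.allclose(updated, centers):
            break
        centers = updated
    labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
    weights = np.bincount(labels, minlength=n_states) / x.size
    # Fall back to the global variance for clusters with fewer than two points.
    variances = np.array([
        x[labels == k].var() if np.sum(labels == k) > 1 else x.var()
        for k in range(n_states)
    ])
    return centers, variances + 1e-6, weights
```

The returned values would only serve as a starting point; the training algorithm of the 2D HMM is expected to refine them.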
Fig. 13. The NLLP indicator map of the first group of data: b) is a partial enlarged detail of a)
With the 2D HMM trained on the health data, the proposed indicator can be computed, as shown in Figs. 13-16. As shown in these figures, little change can be seen in the first 35 samples, which can be regarded as the health condition. After the 35th sample, the degradation trend shows a significant increase compared with the whole life cycle maps of the general features. In Fig. 13, a slight increase in degradation can be observed after the 38th data point. The same behaviour can be seen in Figs. 14-16. In contrast, Figs. 3-6, which show the trends of the indicators based on the general features, show little distinct degradation until a sudden change occurs. Although kurtosis and RMS reveal the hidden fault moment at about the 50th data point, earlier than the other general features, they still fall far short of the proposed indicator. In addition, the trend of the proposed index is smooth and presents less noise. In short, the proposed index presents a better degradation trend and reveals weak defects more clearly than the general features. From these figures, the weak-defect threshold can also be chosen, as marked by the red line at around 43. Therefore, the proposed indicator is more effective than the indicators based on the general features. An extra experiment was added to verify that the initial fault revealed by the proposed indicator truly occurred. As shown in Fig. 17, the bearing was taken out of the test rig at the 42nd sample in Fig. 17(a), and slight spot corrosion on the outer race can be seen in Fig. 17(b).
Fig. 14. The NLLP indicator map of the second group of data: b) is a partial enlarged detail of a)
Fig. 15. The NLLP indicator map of the third group of data: b) is a partial enlarged detail of a)
Fig. 16. The NLLP indicator map of the fourth group of data: b) is a partial enlarged detail of a)
Fig. 17. Initial fault map
4. Conclusions
To improve the indicator's sensitivity to the initial fault and to obtain a good degradation trend for residual life prediction, a new indicator is proposed based on ONPP and the 2D HMM. Since ONPP can suppress background noise and lower the dimension of the data set composed of general features, the new features extracted by ONPP separate the fault types much more clearly than the general features do. With these new features, the NLLP based degradation indicator is built using the 2D HMM, which takes the internal characteristics among the multiple features into consideration without losing the integral information. The experimental results show that, compared with the original fault features, the proposed indicator detects weak defects earlier and shows the degradation trend of bearing performance clearly over the whole life.
Acknowledgements
The work described in this paper was supported by a Grant from the National Defense Researching Fund (No. 9140A27020413JB11076).
References
[1] Malhi A., Yan R., Gao R. X. Prognosis of defect propagation based on recurrent neural networks. IEEE Transactions on Instrumentation and Measurement, Vol. 60, Issue 3, 2011, p. 703-711.
[2] Tian Z., Zuo M. J. Health condition prediction of gears using a recurrent neural network approach. IEEE Transactions on Reliability, Vol. 59, Issue 4, 2010, p. 700-705.
[3] Qiu H., Lee J., Lin J., et al. Robust performance degradation assessment methods for enhanced rolling element bearing prognostics. Advanced Engineering Informatics, Vol. 17, Issue 3, 2003, p. 127-140.
[4] Huang R., Xi L., Li X., et al. Residual life predictions for ball bearings based on self-organizing map and back propagation neural network methods. Mechanical Systems and Signal Processing, Vol. 21, Issue 1, 2007, p. 193-207.
[5] Liao L. Discovering prognostic features using genetic programming in remaining useful life prediction. IEEE Transactions on Industrial Electronics, Vol. 61, Issue 5, 2014, p. 2464-2472.
[6] Yu J. Bearing performance degradation assessment using locality preserving projections and Gaussian mixture models. Mechanical Systems and Signal Processing, Vol. 25, Issue 7, 2011, p. 2573-2588.
[7] Yu J. A nonlinear probabilistic method and contribution analysis for machine condition monitoring. Mechanical Systems and Signal Processing, Vol. 37, Issue 1, 2013, p. 293-314.
[8] Othman H., Aboulnasr T. A separable low complexity 2D HMM with application to face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 25, Issue 10, 2003, p. 1229-1238.
[9] Bevilacqua V., Cariello L., Carro G., et al. A face recognition system based on pseudo 2D HMM applied to neural network coefficients. Soft Computing, Vol. 12, Issue 7, 2008, p. 615-621.
[10] Bicego M., Castellani U., Murino V. Using hidden Markov models and wavelets for face recognition. Proceedings of 12th International Conference on Image Analysis and Processing, 2003, p. 52-56.
[11] Effrosyni K., Yousef S. Orthogonal neighborhood preserving projections: a projection-based dimensionality reduction technique. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 29, Issue 12, 2007, p. 2143-2156.
[12] Yujian L. An analytic solution for estimating two-dimensional hidden Markov models. Applied Mathematics and Computation, Vol. 185, Issue 2, 2007, p. 810-822.