Fault diagnosis of rolling bearing with incomplete labels using weakly labeled support vector machine
Zhou Bo^{1} , Lu Chen^{2} , Wang Zhenya^{3}
^{1, 2, 3}School of Reliability and Systems Engineering, Beihang University, Beijing 100191, China
^{1, 2, 3}Science and Technology Laboratory on Reliability and Environmental Engineering, Beijing 100191, China
^{1}Corresponding author
Vibroengineering PROCEDIA, Vol. 5, 2015, p. 187-192.
Accepted 21 August 2015; published 18 September 2015
JVE Conferences
The fault diagnosis of rolling bearings has attracted increasing attention in recent years because of its significant impact on the functionality and efficiency of complex primary systems. Considering bearing samples with incomplete labels, this paper investigates a novel fault diagnosis method that draws on the experience of image cognition theory to classify the fault states of rolling bearings, aiming to achieve fault classification with only a small amount of labeled bearing data. Empirical mode decomposition (EMD) is first applied to the original signal; basic time domain features are extracted from the first three intrinsic mode functions (IMFs) and used as the inputs of the subsequent classifier for training and testing. A weakly labeled support vector machine (WELLSVM), which appears to be more efficient than inductive support vector machines, especially with very small training sets and large test sets, is then established through a novel label generation strategy within the semi-supervised learning framework. Validation data with different proportions of labeled samples are used to compare and evaluate the fault diagnosis results. The results indicate the effectiveness of the proposed method for bearing fault diagnosis with weakly labeled data.
Keywords: rolling bearing, fault diagnosis, incomplete labels, weakly labeled support vector machine.
1. Introduction
Rolling element bearings play an important role in rotating machinery, and their failure may result in serious economic losses and safety incidents [1]. Because damage can occur unpredictably and lead to catastrophic failure, the importance of early defect detection in bearings has motivated continuous research efforts. Fault diagnosis of bearings is therefore essential to ensure the normal operation of industry. Vibration signature analysis is the most commonly used approach to bearing fault diagnosis for preventing machinery breakdowns [2]. The labels of the vibration data are key to fault classification. Pandya et al. investigated the APF-KNN approach, based on an asymmetric proximity function with optimized feature selection, and showed that better classification accuracy can increase the reliability of rolling bearing fault diagnosis [3]. Fernández-Francos et al. proposed an automatic bearing fault diagnosis method based on a one-class ν-SVM that can identify the location of the defect and qualitatively assess its evolution over time [4].
Both of the above methods use fully labeled data; in real working conditions, however, sufficient labels may not be available. If only a small number of labeled samples are used to train the model, it is often difficult for the trained learning system to achieve strong generalization ability; moreover, using only a few “expensive” labeled samples while ignoring a large number of “cheap” unlabeled samples wastes data resources [5]. Therefore, exploiting weakly labeled training data may help improve performance and discover the underlying structure of the data. Indeed, this has been regarded as one of the most challenging tasks in machine learning research [6].
In this paper, a fault diagnosis method based on the weakly labeled support vector machine (WELLSVM) is proposed. Unlike supervised learning, this method makes full use of a large amount of unlabeled data for fault diagnosis. WELLSVM can also be applied to multi-instance learning and clustering. In semi-supervised learning, the goal of WELLSVM is to use a large collection of unlabeled data jointly with a few labeled examples to improve generalization performance [7].
This paper is organized as follows: Section 2 briefly introduces EMD and WELLSVM; Section 3 presents the case study performed to validate the method; and Section 4 draws conclusions and outlines future work.
2. Methodology
2.1. Empirical mode decomposition (EMD)
The empirical mode decomposition (EMD) method can decompose any complicated signal into a finite number of components called intrinsic mode functions (IMFs) [8]. In EMD, a signal must satisfy two criteria to be an IMF: (1) over the whole data set, the number of extrema and the number of zero crossings must be equal or differ by at most one; and (2) at any time instant, the mean of the upper envelope (defined by the local maxima) and the lower envelope (defined by the local minima) is zero. The standard EMD process for a signal can be described as follows:
(1) To obtain the upper (or lower) envelope of the signal $x(t)$, a cubic spline is used to connect all the local maxima (or minima) of the signal. The local maxima (or minima) are found by comparing neighboring values: if a point’s value is larger (or smaller) than those of both its neighbors, it is taken as a local peak (or trough).
(2) The difference $x_1(t)$ is obtained by subtracting the mean $m(t)$ of the upper and lower envelopes from the signal:
$$x_1(t)=x(t)-m(t)\qquad(1)$$
(3) Let $x(t)=x_1(t)$ and repeat steps (1) and (2) until $x_1(t)$ meets the two criteria of an IMF. The resulting $x_1(t)$ is an IMF, denoted $C_j(t)$ below, where $j$ is the scale index.
(4) The residue signal $r_j(t)$ is obtained by separating $C_j(t)$ from the signal being decomposed (the initial signal $x(t)$ for $j=1$):
$$r_j(t)=r_{j-1}(t)-C_j(t),\qquad r_0(t)=x(t)\qquad(2)$$
The residue $r_j(t)$ is then treated as the new signal, and steps (1)-(4) are repeated until it becomes a monotonic function from which no further IMF can be extracted. With this decomposition process, the original signal $x(t)$ is decomposed into $N$ IMFs, each with a different resolution. The original signal $x(t)$ equals the sum of the extracted IMFs of different scales and the final residue:
$$x(t)=\sum_{j=1}^{N}C_j(t)+r_N(t)\qquad(3)$$
where $N$ is the number of extracted IMFs, $j$ is the scale index of an IMF, and $r_N(t)$ is the final residue.
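The sifting procedure of steps (1)-(4) can be sketched in pure Python as follows. This is a minimal illustration, not the authors' implementation: it assumes natural cubic splines for the envelopes, uses the two end points of the signal as extra spline knots (a crude boundary treatment), approximates criterion (2) by requiring the mean envelope to be small relative to the signal (tolerance `tol`), and caps sifting at `max_sift` iterations. All function names are hypothetical.

```python
import bisect
import math


def natural_spline(xs, ys, ts):
    """Evaluate the natural cubic spline through knots (xs, ys) at points ts."""
    n = len(xs)
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]
    # Tridiagonal system for second derivatives m, with m[0] = m[n-1] = 0.
    lo, di, up, rhs = [0.0] * n, [1.0] * n, [0.0] * n, [0.0] * n
    for i in range(1, n - 1):
        lo[i], di[i], up[i] = h[i - 1], 2.0 * (h[i - 1] + h[i]), h[i]
        rhs[i] = 6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
    for i in range(1, n):  # Thomas algorithm: forward elimination
        f = lo[i] / di[i - 1]
        di[i] -= f * up[i - 1]
        rhs[i] -= f * rhs[i - 1]
    m = [0.0] * n
    m[-1] = rhs[-1] / di[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        m[i] = (rhs[i] - up[i] * m[i + 1]) / di[i]
    out = []
    for t in ts:
        k = min(max(bisect.bisect_right(xs, t) - 1, 0), n - 2)
        hk, a, b = h[k], xs[k + 1] - t, t - xs[k]
        out.append(m[k] * a ** 3 / (6 * hk) + m[k + 1] * b ** 3 / (6 * hk)
                   + (ys[k] / hk - m[k] * hk / 6) * a
                   + (ys[k + 1] / hk - m[k + 1] * hk / 6) * b)
    return out


def local_extrema(x):
    """Step (1): a point larger (smaller) than both neighbours is a local max (min)."""
    maxima = [i for i in range(1, len(x) - 1) if x[i] > x[i - 1] and x[i] > x[i + 1]]
    minima = [i for i in range(1, len(x) - 1) if x[i] < x[i - 1] and x[i] < x[i + 1]]
    return maxima, minima


def mean_envelope(x):
    """Mean m(t) of the upper and lower spline envelopes, or None if too few extrema."""
    maxima, minima = local_extrema(x)
    if len(maxima) < 2 or len(minima) < 2:
        return None
    n, ts = len(x), range(len(x))
    up_k = [0] + maxima + [n - 1]  # assumption: end points added as knots
    lo_k = [0] + minima + [n - 1]
    upper = natural_spline(up_k, [x[i] for i in up_k], ts)
    lower = natural_spline(lo_k, [x[i] for i in lo_k], ts)
    return [(u + v) / 2.0 for u, v in zip(upper, lower)]


def is_imf(h, env, tol):
    """Criterion (1) exactly; criterion (2) approximated by a small mean envelope."""
    maxima, minima = local_extrema(h)
    zero_cross = sum(1 for a, b in zip(h, h[1:]) if a * b < 0)
    small_mean = max(abs(v) for v in env) <= tol * max(abs(v) for v in h)
    return abs(len(maxima) + len(minima) - zero_cross) <= 1 and small_mean


def extract_imf(x, max_sift=50, tol=0.05):
    """Steps (2)-(3): sift until the IMF criteria hold (or max_sift is reached)."""
    h = list(x)
    for k in range(max_sift):
        env = mean_envelope(h)
        if env is None:
            return h if k > 0 else None  # None: (near) monotonic, no IMF left
        h = [hi - mi for hi, mi in zip(h, env)]  # step (2): x1(t) = x(t) - m(t)
        if is_imf(h, env, tol):
            break
    return h


def emd(x, max_imfs=3):
    """Step (4): peel off IMFs until max_imfs or a monotonic residue remains."""
    imfs, r = [], list(x)
    for _ in range(max_imfs):
        c = extract_imf(r)
        if c is None:
            break
        imfs.append(c)
        r = [ri - ci for ri, ci in zip(r, c)]  # residue r_j = r_{j-1} - C_j
    return imfs, r
```

Calling `emd(x)` on a two-tone test signal returns at most three IMFs plus a residue, and by construction the components add back to `x` up to rounding. Practical EMD codes use more careful end-point handling and stopping rules than this sketch.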
2.2. Time domain feature extraction
Time domain features carry rich information and reflect the basic characteristics of the signals. Commonly used time domain features for diagnosing failure states include the root mean square (RMS), maximum value, standard deviation, kurtosis, root amplitude and peak-to-peak value [9]. Here, the maximum value and RMS are extracted from the first IMF, the standard deviation and kurtosis from the second IMF, and the root amplitude and peak-to-peak value from the third IMF. Table 1 lists the formulas of the extracted time domain features.
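The six-dimensional feature vector described above can be computed from the first three IMFs as sketched below. Assumptions: each IMF is a list of samples, the standard deviation uses the 1/N normalization, the kurtosis is the normalized fourth central moment (the usual definition), the root amplitude is the squared mean of $\sqrt{|x_i|}$, and `feature_vector` is a hypothetical helper name. The second IMF must have nonzero variance for the kurtosis to be defined.

```python
import math


def _mean(x):
    return sum(x) / len(x)


def feature_vector(imf1, imf2, imf3):
    """Six features: max and RMS (IMF1), std and kurtosis (IMF2),
    root amplitude and peak-to-peak value (IMF3)."""
    mu2 = _mean(imf2)
    sigma2 = math.sqrt(_mean([(v - mu2) ** 2 for v in imf2]))    # 1/N normalisation
    kurt2 = _mean([(v - mu2) ** 4 for v in imf2]) / sigma2 ** 4  # normalised 4th moment
    return [
        max(imf1),                                       # maximum value (IMF1)
        math.sqrt(_mean([v * v for v in imf1])),         # RMS (IMF1)
        sigma2,                                          # standard deviation (IMF2)
        kurt2,                                           # kurtosis (IMF2)
        _mean([math.sqrt(abs(v)) for v in imf3]) ** 2,   # root amplitude (IMF3)
        max(imf3) - min(imf3),                           # peak-to-peak value (IMF3)
    ]
```

Stacking one such vector per bearing signal gives the classifier inputs; scaling the features to comparable ranges before training an SVM is a common, though optional, choice.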
2.3. Weakly labeled support vector machine (WELLSVM)
We introduce the classification method starting from the SVM. The basic task of an SVM is to estimate a classification function $f:\mathbb{R}^{N}\to\{\pm 1\}$ from input-output training data of two classes [10]. The separating hyperplane for the training set is:
$$w^{T}x+b=0\qquad(4)$$
Table 1. The basic formulas of time domain features

| Time domain feature | Equation |
|---|---|
| Maximum value | $X_{max}=\max_{i}\left(x_{i}\right)$ |
| Root mean square | $X_{rms}=\sqrt{\frac{1}{N}\sum_{i=1}^{N}x_{i}^{2}}$ |
| Standard deviation | $\sigma=\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_{i}-\mu\right)^{2}}$ |
| Kurtosis | $\kappa=\frac{1}{N\sigma^{4}}\sum_{i=1}^{N}\left(x_{i}-\mu\right)^{4}$ |
| Root amplitude | $X_{r}=\left(\frac{1}{N}\sum_{i=1}^{N}\sqrt{\lvert x_{i}\rvert}\right)^{2}$ |
| Peak-to-peak value | $X_{pp}=\max_{i}\left(x_{i}\right)-\min_{i}\left(x_{i}\right)$ |

In Table 1, $x_i$ ($i=1,\dots,N$) are the samples of the IMF from which a feature is extracted, and $\mu=\frac{1}{N}\sum_{i=1}^{N}x_{i}$ is their mean.
The basic idea of the method is to find the separating hyperplane with the largest margin (shown in Fig. 1) while allowing for misclassification, which corresponds to the following optimization problem:
$$\min_{w,b,\xi}\ \frac{1}{2}\left\|w\right\|^{2}+C\sum_{i=1}^{N}\xi_{i}\qquad(5)$$
$$\mathrm{s.t.}\quad \hat{y}_{i}\left[w^{T}x_{i}+b\right]\ge 1-\xi_{i},\quad \xi_{i}\ge 0,\quad i=1,2,\dots,N,$$
where $\xi=\left(\xi_{1},\xi_{2},\dots,\xi_{N}\right)$ are the slack variables and $C>0$ is a fixed penalty parameter.
Fig. 1. The maximum-margin separating hyperplane of SVM
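For a linear kernel, the soft-margin problem in Eq. (5) can be solved through its dual by coordinate descent, as in the pure-Python sketch below. Assumptions: all labels are known ($\hat{y}_i=y_i$), the bias $b$ is absorbed by appending a constant 1 to each input (which also regularizes $b$, a small departure from Eq. (5)), and the function names are hypothetical. Each dual variable $\alpha_i$ is kept in the box $[0, C]$.

```python
def train_linear_svm_dcd(X, y, C=1.0, epochs=200):
    """Dual coordinate descent for the soft-margin linear SVM.
    Assumption: bias handled by an appended constant feature."""
    Z = [list(x) + [1.0] for x in X]          # augmented inputs [x_i, 1]
    w = [0.0] * len(Z[0])
    alpha = [0.0] * len(Z)                     # dual variables, 0 <= alpha_i <= C
    for _ in range(epochs):
        for i, (z, yi) in enumerate(zip(Z, y)):
            q_ii = sum(v * v for v in z)
            grad = yi * sum(wk * zk for wk, zk in zip(w, z)) - 1.0  # dual gradient
            new = min(max(alpha[i] - grad / q_ii, 0.0), C)          # clipped Newton step
            delta = (new - alpha[i]) * yi
            if delta:
                w = [wk + delta * zk for wk, zk in zip(w, z)]       # keep w = sum a_i y_i z_i
            alpha[i] = new
    return w[:-1], w[-1], alpha               # weights w, bias b, dual variables


def predict(w, b, x):
    """Sign of the decision function f(x) = w^T x + b."""
    return 1 if sum(wk * xk for wk, xk in zip(w, x)) + b >= 0 else -1
```

Coordinate descent is chosen here only because each update has a closed form and needs no external solver; any quadratic programming solver would serve equally well.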
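When the labels are not all known, the label vector $\hat{y}$ is optimized together with the classifier, and Eq. (5) can be written as:
$$\min_{\hat{y}\in\beta}\ \min_{w,b,\xi}\ \frac{1}{2}\left\|w\right\|^{2}+C\sum_{i=1}^{N}\xi_{i}\quad \text{s.t. the constraints in Eq. (5)},\qquad(6)$$
where $\beta$ is the set of admissible label vectors. Replacing the inner SVM problem by its dual (with the bias $b$ omitted for simplicity) gives:
$$\min_{\hat{y}\in\beta}\ \max_{\alpha\in\mathcal{A}}\ G(\alpha,\hat{y}),\qquad G(\alpha,\hat{y})=\mathbf{1}^{T}\alpha-\frac{1}{2}\alpha^{T}\left(K\odot\hat{y}\hat{y}^{T}\right)\alpha,\qquad(7)$$
where $\mathcal{A}=\left\{\alpha\mid 0\le\alpha_{i}\le C,\ i=1,\dots,N\right\}$, $K$ is the kernel matrix and $\odot$ denotes the element-wise product. Interchanging the order of $\max_{\alpha\in\mathcal{A}}$ and $\min_{\hat{y}\in\beta}$ in Eq. (7), we obtain the WELLSVM formulation [6]:
$$\max_{\alpha\in\mathcal{A}}\ \min_{\hat{y}\in\beta}\ G(\alpha,\hat{y}).\qquad(8)$$
We rewrite the objective of WELLSVM as the following optimization problem:
$$\min_{\mu\in\mathcal{M}}\ \max_{\alpha\in\mathcal{A}}\ \mathbf{1}^{T}\alpha-\frac{1}{2}\alpha^{T}\Big(\sum_{t}\mu_{t}\,K\odot\hat{y}_{t}\hat{y}_{t}^{T}\Big)\alpha,\qquad(9)$$
where $\mu$ is the vector of $\mu_{t}$’s, $\mathcal{M}$ is the simplex $\left\{\mu\mid\sum_{t}\mu_{t}=1,\ \mu_{t}\ge 0\right\}$, and $\hat{y}_{t}\in\beta$.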
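To make the inner minimization over labels concrete, the sketch below evaluates $G(\alpha,\hat{y})=\mathbf{1}^{T}\alpha-\frac{1}{2}\alpha^{T}(K\odot\hat{y}\hat{y}^{T})\alpha$, the standard SVM dual objective used in [6], and, for a fixed $\alpha$, finds the label vector minimizing it by brute-force enumeration of the unlabeled labels. Assumptions: a linear kernel, labeled examples listed first with $\hat{y}_L=y_L$ fixed, no class-balance constraint, and hypothetical helper names. Enumeration costs $2^{N-l}$ evaluations, so this is only a toy illustration of the max-min structure.

```python
import itertools


def linear_kernel(X):
    """Gram matrix K_ij = x_i^T x_j (linear kernel assumed)."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]


def dual_objective(K, alpha, y_hat):
    """G(alpha, y) = sum(alpha) - 0.5 * alpha^T (K * y y^T) alpha, '*' elementwise."""
    n = len(alpha)
    quad = sum(alpha[i] * alpha[j] * K[i][j] * y_hat[i] * y_hat[j]
               for i in range(n) for j in range(n))
    return sum(alpha) - 0.5 * quad


def worst_case_labels(K, alpha, y_labeled):
    """min over admissible y of G(alpha, y): labeled part fixed, unlabeled part in {+1,-1}."""
    n_unlabeled = len(alpha) - len(y_labeled)
    best = None
    for y_u in itertools.product((1, -1), repeat=n_unlabeled):
        y_hat = list(y_labeled) + list(y_u)
        g = dual_objective(K, alpha, y_hat)
        if best is None or g < best[0]:
            best = (g, y_hat)
    return best  # (minimal G, minimising label vector)
```

In WELLSVM this exhaustive search is what the label generation strategy avoids: label vectors $\hat{y}_t$ that most violate the current solution are generated one at a time and added to the set weighted by the coefficients $\mu_t$.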
In semi-supervised learning, not all the training labels are known. Let $D_{L}=\left\{x_{i},y_{i}\right\}_{i=1}^{l}$ and $D_{U}=\left\{x_{j}\right\}_{j=l+1}^{N}$ be the sets of labeled and unlabeled examples, let $\hat{y}=\left[\hat{y}_{1},\hat{y}_{2},\dots,\hat{y}_{N}\right]^{T}$ be the vector of learned labels on both labeled and unlabeled examples, and let $y_{L}=\left[y_{1},y_{2},\dots,y_{l}\right]^{T}$ and $\hat{y}_{U}=\left[\hat{y}_{l+1},\hat{y}_{l+2},\dots,\hat{y}_{N}\right]^{T}$ [11]. Then Eq. (5) leads to:
$$\min_{\hat{y}\in\beta}\ \min_{w,b,\xi}\ \frac{1}{2}\left\|w\right\|^{2}+C_{1}\sum_{i=1}^{l}\xi_{i}+C_{2}\sum_{i=l+1}^{N}\xi_{i}\qquad(10)$$
$$\mathrm{s.t.}\quad \hat{y}_{i}\left[w^{T}x_{i}+b\right]\ge 1-\xi_{i},\quad \xi_{i}\ge 0,\quad i=1,2,\dots,N,$$
where $\beta=\left\{\hat{y}=\left[\hat{y}_{L};\hat{y}_{U}\right]\mid \hat{y}_{L}=y_{L},\ \hat{y}_{U}\in\left\{\pm 1\right\}^{N-l}\right\}$, and $C_{1}$ and $C_{2}$ weight the two types of hinge loss, on the labeled and on the unlabeled examples, respectively.
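For a fixed linear classifier $(w, b)$, the semi-supervised objective $\frac{1}{2}\|w\|^{2}+C_{1}\sum_{i\le l}\xi_{i}+C_{2}\sum_{i>l}\xi_{i}$, with $\xi_i=\max(0,1-\hat{y}_i(w^{T}x_i+b))$, can be evaluated directly, as in the sketch below. Since $\max(0,1-f)\le\max(0,1+f)$ whenever $f\ge 0$, setting $\hat{y}_j=\mathrm{sign}(w^{T}x_j+b)$ minimizes every unlabeled hinge term; this holds for the label set $\beta$ given above, which has no class-balance constraint. The linear decision function and the helper names are illustrative assumptions.

```python
def hinge(margin):
    """Hinge loss max(0, 1 - y f(x)) as a function of the margin y f(x)."""
    return max(0.0, 1.0 - margin)


def ssl_objective(w, b, X, y_hat, n_labeled, C1, C2):
    """Regulariser plus C1-weighted labeled and C2-weighted unlabeled hinge losses.
    The first n_labeled rows of X are the labeled examples."""
    f = [sum(wk * xk for wk, xk in zip(w, x)) + b for x in X]
    reg = 0.5 * sum(wk * wk for wk in w)
    loss_l = sum(hinge(y_hat[i] * f[i]) for i in range(n_labeled))
    loss_u = sum(hinge(y_hat[i] * f[i]) for i in range(n_labeled, len(X)))
    return reg + C1 * loss_l + C2 * loss_u


def best_unlabeled_labels(w, b, X_unlabeled):
    """For fixed (w, b), y_j = sign(f(x_j)) minimises each unlabeled hinge term."""
    return [1 if sum(wk * xk for wk, xk in zip(w, x)) + b >= 0 else -1
            for x in X_unlabeled]
```

Setting $C_2$ smaller than $C_1$ is a common way to let the uncertain, inferred labels influence the solution less than the known ones.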
We solve Eq. (10) by iterating the following two steps until convergence:
1) Fix the mixing coefficients $\mu$ of the base kernel matrices and solve the resulting SVM problem for the $w_{t}$’s.
2) Fix the $w_{t}$’s and update $\mu$ in closed form.
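The two alternating steps can be sketched as follows for a fixed, finite set of candidate label vectors $\hat{y}_t$. Assumptions: step 1 solves the box-constrained SVM dual with the combined kernel $\sum_t\mu_t K\odot\hat{y}_t\hat{y}_t^{T}$ by exact coordinate ascent; step 2 uses the standard closed-form update for simplex-constrained multiple kernel learning, $\mu_t=\|w_t\|/\sum_s\|w_s\|$, with $\|w_t\|^{2}=\mu_t^{2}\,\alpha^{T}(K\odot\hat{y}_t\hat{y}_t^{T})\alpha$. The candidate set is kept fixed here, whereas WELLSVM enlarges it through label generation, and the function names are hypothetical.

```python
import math


def solve_box_qp(Q, C, epochs=200):
    """Step 1: max 1^T a - 0.5 a^T Q a s.t. 0 <= a_i <= C, by exact coordinate ascent."""
    n = len(Q)
    a = [0.0] * n
    for _ in range(epochs):
        for i in range(n):
            if Q[i][i] <= 0:
                continue
            grad = sum(Q[i][j] * a[j] for j in range(n)) - 1.0
            a[i] = min(max(a[i] - grad / Q[i][i], 0.0), C)
    return a


def mkl_alternate(K, label_vectors, C=1.0, iters=20):
    """Alternate step 1 (fix mu, solve SVM dual) and step 2 (closed-form mu update)."""
    T, n = len(label_vectors), len(K)
    # Base kernels K_t = K * (y_t y_t^T), elementwise, one per candidate label vector.
    base = [[[K[i][j] * y[i] * y[j] for j in range(n)] for i in range(n)]
            for y in label_vectors]
    mu = [1.0 / T] * T  # start at the centre of the simplex
    for _ in range(iters):
        Q = [[sum(mu[t] * base[t][i][j] for t in range(T)) for j in range(n)]
             for i in range(n)]
        a = solve_box_qp(Q, C)  # step 1
        # ||w_t|| = mu_t * sqrt(a^T K_t a) for w_t = mu_t * sum_i a_i y_ti phi(x_i)
        norms = [mu[t] * math.sqrt(max(0.0, sum(a[i] * a[j] * base[t][i][j]
                 for i in range(n) for j in range(n)))) for t in range(T)]
        total = sum(norms)
        if total == 0.0:
            break
        mu = [v / total for v in norms]  # step 2: mu_t = ||w_t|| / sum_s ||w_s||
    return mu, a
```

The update in step 2 is the minimizer of $\sum_t\|w_t\|^{2}/\mu_t$ over the simplex, which is why it needs no iterative solver; a weight that reaches zero stays at zero under this rule.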
3. Experimental verification
This section demonstrates the reliability of the WELLSVM model for fault diagnosis of rolling bearings. Experimental data under different working conditions are used to validate the effectiveness of the proposed method.
3.1. Experiment setup
Bearing data from the Case Western Reserve University Bearing Data Center were used for testing and verification in the experiment. The test rig contained a 2 hp motor, used as the prime mover to drive a shaft coupled to a bearing housing, as shown in Fig. 2. The test rig included both drive end (DE) and fan end (FE) bearings (SKF 6205-2RS JEM), whose vibration data were collected by accelerometers attached to the housing with magnetic bases. Accelerometers were placed at the 3 o'clock position for both the DE and FE bearings. Digital data were collected at 12,000 samples per second, i.e., at a sampling rate of 12 kHz, for both the DE and FE bearing faults.
3.2. Experiment execution
The experiment was carried out as follows. First, empirical mode decomposition (EMD) was applied to the original signal and the first three IMFs were selected; then two different time domain features were extracted from each IMF of the bearing data, giving a six-dimensional feature vector. Finally, WELLSVM was employed to classify the four conditions (normal, inner ring failure, outer ring failure and rolling element failure).
Fig. 2. Bearing test rig for the experiment
First, inner ring failure, outer ring failure and rolling element failure were grouped together and classified against the normal condition. Then outer ring failure and rolling element failure were grouped together and classified against inner ring failure. Finally, WELLSVM was used to separate outer ring failure from rolling element failure (shown in Fig. 3). After data processing, 75 % of the examples of each data set were randomly chosen for training, and the rest were used for testing. We investigated the performance of the method with varying proportions of labeled data (namely, 5 %, 10 %, 15 % and 20 %). The whole setup was repeated 10 times, and the average accuracies on the test set are reported in Table 2.
Fig. 3. Multi-class classification scheme of WELLSVM
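The three-stage binary scheme and the data split described above can be sketched as follows. Assumptions: the labeled proportion is taken with respect to the training set (the text does not state the base precisely), each stage classifier is any callable returning a positive value for its first group, and all names are hypothetical. Repeating the split with 10 different seeds and averaging the test accuracies corresponds to the protocol behind Table 2.

```python
import random


def split_indices(n, train_frac=0.75, labeled_frac=0.05, seed=0):
    """Random train/test split; only a fraction of the training indices keep labels."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    n_train = int(round(train_frac * n))
    train, test = idx[:n_train], idx[n_train:]
    n_lab = max(1, int(round(labeled_frac * n_train)))
    return train[:n_lab], train[n_lab:], test  # labeled, unlabeled, test


def cascade_predict(x, is_normal, is_inner, is_outer):
    """Stage 1: normal vs. all faults; stage 2: inner ring vs. (outer ring + rolling
    element); stage 3: outer ring vs. rolling element."""
    if is_normal(x) > 0:
        return "normal"
    if is_inner(x) > 0:
        return "inner ring"
    return "outer ring" if is_outer(x) > 0 else "rolling element"
```

Each stage can be trained as an independent binary WELLSVM on the examples that reach it, which keeps every subproblem two-class, as the SVM formulation requires.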
Table 2. Average accuracies (%) of fault classification for each classification pattern

| Labeled proportion | Normal with other failure mode | Outer ring with inner ring and rolling element | Inner ring and rolling element |
|---|---|---|---|
| 5 % | 99.24 | 98.3573 | 97.728 |
| 10 % | 99.4336 | 98.872 | 98.072 |
| 15 % | 99.6608 | 99.352 | 98.272 |
| 20 % | 99.6896 | 99.1093 | 99.268 |
3.3. Result and comparison
Table 2 shows that the average classification accuracies with 5 %, 10 %, 15 % and 20 % labeled data all exceed 97 %, which indicates that the proposed WELLSVM-based method separates the normal condition, inner ring failure, outer ring failure and rolling element failure well even when only a small part of the data is labeled. Moreover, the accuracy generally increases as more labeled examples are used for training; the only exception is the second classification pattern, whose accuracy drops slightly from 99.352 % at 15 % to 99.1093 % at 20 %.
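The trend discussed above can be checked directly from Table 2. The short script below, which uses only the numbers in the table, averages each row over the three classification patterns and lists, for a given pattern, the labeled proportions at which accuracy fails to improve on the previous proportion. The names are illustrative.

```python
# Average accuracies (%) from Table 2; columns follow the table's pattern order.
TABLE2 = {
    5: [99.24, 98.3573, 97.728],
    10: [99.4336, 98.872, 98.072],
    15: [99.6608, 99.352, 98.272],
    20: [99.6896, 99.1093, 99.268],
}


def row_means(table):
    """Mean accuracy over the three patterns for each labeled proportion."""
    return {frac: sum(accs) / len(accs) for frac, accs in sorted(table.items())}


def non_improving_steps(table, col):
    """Labeled proportions at which pattern `col` does not beat the previous one."""
    fracs = sorted(table)
    return [b for a, b in zip(fracs, fracs[1:]) if table[b][col] <= table[a][col]]


for frac, m in row_means(TABLE2).items():
    print(f"{frac:>2d} % labeled: mean accuracy {m:.4f} %")
```

The row means rise from about 98.44 % at 5 % labeled to about 99.36 % at 20 % labeled.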
4. Conclusions
In this study, a fault diagnosis method for rolling bearings is proposed. EMD was used to decompose the complicated vibration signals into IMFs. Because time domain features can represent the essential characteristics of vibration signals, time domain features were extracted from the IMFs. Finally, WELLSVM was employed as the classifier to classify all failure modes using the features extracted from the IMFs. The experiment indicates that WELLSVM can effectively classify rolling bearing faults, with average accuracies above 97 % even when only 5 % of the data are labeled.
Our future work will focus on the following aspects. First, WELLSVM will be applied to objects other than rolling bearings. Second, we will try to use WELLSVM for tasks other than classification to extend the universality of the proposed method.
Acknowledgements
This study is supported by the National Natural Science Foundation of China (Grant Nos. 61074083, 50705005, and 51105019) and by the Technology Foundation Program of National Defense (Grant No. Z132013B002).
References
[1] Lou Xinsheng, Loparo Kenneth A. Bearing fault diagnosis based on wavelet transform and fuzzy inference. Mechanical Systems and Signal Processing, Vol. 18, Issue 5, 2004, p. 1077-1095.
[2] Kankar P. K., Sharma Satish C., Harsha S. P. Rolling element bearing fault diagnosis using wavelet transform. Neurocomputing, Vol. 74, Issue 10, 2011, p. 1638-1645.
[3] Pandya D. H., Upadhyay S. H., Harsha S. P. Fault diagnosis of rolling element bearing with intrinsic mode function of acoustic emission data using APF-KNN. Expert Systems with Applications, Vol. 40, Issue 10, 2013, p. 4137-4145.
[4] Fernández-Francos Diego, Martínez-Rego David, Fontenla-Romero Oscar, Alonso-Betanzos Amparo. Automatic bearing fault diagnosis based on one-class ν-SVM. Computers and Industrial Engineering, Vol. 64, Issue 1, 2013, p. 357-365.
[5] Ying Zhao. Research on Semi-supervised Support Vector Machine Learning Algorithms. Ph.D. Thesis.
[6] Li Yu-Feng, Tsang Ivor W., Kwok James T., Zhou Zhi-Hua. Convex and scalable weakly labeled SVMs. Journal of Machine Learning Research, Vol. 14, Issue 1, 2013, p. 2151-2188.
[7] Chapelle Olivier, Sindhwani Vikas, Keerthi Sathiya S. Optimization techniques for semi-supervised support vector machines. Journal of Machine Learning Research, Vol. 9, 2008, p. 203-233.
[8] Zhao Shuan-Feng, Liang Lin, Xu Guang-Hua, Wang Jing, Zhang Wen-Ming. Quantitative diagnosis of a spall-like fault of a rolling element bearing by empirical mode decomposition and the approximate entropy method. Mechanical Systems and Signal Processing, Vol. 40, Issue 1, 2013, p. 154-177.
[9] Lee Hong-Hee, Nguyen Ngoc-Tu, Kwon Jeong-Min. Bearing diagnosis using time-domain features and decision tree. International Conference on Intelligent Computing, Vol. 4682, 2007, p. 952-960.
[10] Bennett Kristin P., Demiriz Ayhan. Semi-supervised support vector machine. Proceedings of the Conference on Advances in Neural Information Processing Systems, 1998, p. 368-374.
[11] Li Yu-Feng, Tsang Ivor W., Kwok James T., Zhou Zhi-Hua. Convex and scalable weakly labeled SVMs. Journal of Machine Learning Research, Vol. 14, Issue 1, 2013, p. 2151-2188.