Yue Feng Chen^{2} , Peng Hou^{3} , Xiao Jian Yi^{1}
^{2}The Fourth Research Laboratory, Beijing Special Vehicle Institute, Beijing, China
^{3}School of Mechatronical Engineering, Beijing Institute of Technology, Beijing, China
^{1}Department of Overall Technology, China North Vehicle Research Institute, Beijing, China
^{3}Corresponding author
Vibroengineering PROCEDIA, Vol. 14, 2017, p. 6469.
https://doi.org/10.21595/vp.2017.19153
Received 10 September 2017; accepted 17 September 2017; published 21 October 2017
Copyright © 2017  JVE International Ltd.
The development of machine learning brings a new way for diagnosing the fault of rolling element bearings. However, the method in machine learning with high accuracy often has the poor ability of generalization due to the overuse of feature engineering. To address this challenge, Naïve Bayes classifier is applied in this paper. As the one of the cluster of Bayes classifiers, its ability of classification is very outstanding. In this paper, the method is provided with a detailed description for why and how to diagnose the fault of bearing. Finally, an evaluation of the performance of Naïve Bayes classifier is presented with real world data. The evaluation indicates that Naïve Bayes classifier can achieve a high level of accuracy without any feature engineering.
Keywords: Naïve Bayes classifier, machine learning, fault diagnosis, rolling element bearing.
Signal based methods are proved to be effective in the study on the fault diagnosis of bearings. In signal based methods, fault feature extraction, the key in the process of the fault diagnosis, determines the quality of the diagnosis. Although the Fourier analysis can efficiently separate the vibration signals in frequency domain for fault feature extraction, the lack of information on fault feature in time domain through the Fourier analysis is inevitable. Moreover, the fault features extracted by the Fourier analysis are helpless for nonstationary signal. Fortunately, the fault features extracted include information on both time domain and frequency domain with the emergence of wavelet analysis and Empirical Mode Decomposition, which is very helpful for the diagnosis of nonstationary signal.
Compared with signal based methods, the fault diagnosis methods, adopting machine learning, are more competitive. Lots of researchers and research results support the application of machine learning in fault diagnosis with a solid theoretical foundation. Besides that, the features extracted by machine learning from data are more objective than signal based methods. Moreover, the criterion based on the accuracy of fault diagnosis is more helpful for researchers and engineers in the choice of fault diagnosis methods.
With the development of the study on machine learning, knowledge based method has attracted increasing attention again. Many methods in machine learning are used in the fault diagnosis of bearing. For example, Hidden Markov Model [1], Artificial Neural Network [2] and support vector machine [3]. Even more, some methods in deep learning are also adopted to diagnose the fault of bearings [4]. It is important that all the methods above appear superior in performance [5].
It is worth noting that the features extracted by machine learning from data are more objective than signal based methods [6]. In fact, faults feature extraction is an important step in the process of fault diagnosis. Signal separation need to be executed in faults feature extraction. Due to the signal includes the information about the system monitored, the knowledge of the system can be acquired through the signal analysis. Signal separation is an important method in signal analysis, and it maps the signal into a group of significant basis, which show the components of the signal effectively. However, the definition of the significant basis is not clear, which generates different results using signal separation [7]. Compared with signal separation, the methods in machine learning extract the fault feature automatically in a unified standard. Therefore, the feature extracted by machine learning is objective.
The accuracy of fault diagnosis is a desirable characteristic for diagnosis systems [8]. All the desirable characteristics are standards or criterions for practitioners to evaluate and choose the different diagnosis methods. However, the accuracy of fault diagnosis is rarely mentioned in traditional method such as the Fourier analysis and wavelet analysis. In the study of machine learning, accuracy is the primary indicator of the performance of methods. Therefore, the accuracy of fault diagnosis can provide a nice guideline for the choose of diagnosis methods.
The fault diagnosis systems of bearings often require a much higher accuracy rate due to the importance of the bearings. Bearings are the frequently used units, and its failure often brings about the fatal breakdown of system [9]. More importantly, the failure caused by the fault bearings may bring about the loss of production and human casualties.
In order to improve the accuracy of diagnosis system, researchers try many methods in machine learning. However, a dilemma is exposed to practitioners. The method with high accuracy often has the poor ability of generalization [10]. The reason for the dilemma is that the overuse of feature engineering restricts the ability of generalization, although the accuracy is improved by the feature engineering. Therefore, it is necessary to find a method in machine learning to achieve the high accuracy without any or less feature engineering [11].
To address this challenge, the Naïve Bayes Classifier is proposed. The performance of the Naïve Bayes Classifier is evaluated by the vibration data of bearing that collected in realworld. The result of the evaluation demonstrates that the diagnosis performance of Naïve Bayes Classifier without feature engineering outperforms many classification technologies using feature engineering.
The theoretical foundation of Naïve Bayes classifier is Bayesian decision theory. It considers the fault diagnosis as a sequence classification in machine learning. Based on that, vibration signal is classified into different categories in sequence classification, then the fault diagnosis of bearings is finished through the category of the vibration signal.
The theoretical foundation of Naïve Bayes classifier is Bayesian decision theory [10]. It thinks how to label samples optimally using posterior probability and mislabel loss. In fact, posterior probability and mislabel loss are the key elements in Bayesian decision theory. Posterior probability $P\left({C}_{i}\mathbf{x}\right)$ defines the probability that sample $s$ with the attributes $\mathbf{x}=({x}_{1},{x}_{2},\dots ,{x}_{n})$ belongs to category ${C}_{i}\in \mathbf{C}=({C}_{1},{C}_{2},\dots ,{C}_{n})$. Mislabel loss ${\lambda}_{i,j}$ describes the loss produced by the mislabel of the sample with label ${C}_{i}$ to the ${C}_{j}$. The conditional risk, composed of Posterior probability $P\left({C}_{i}\mathbf{x}\right)$ and Mislabel loss ${\lambda}_{i,j}$, is defined as $R\left({C}_{i}\mathbf{x}\right)=\sum _{j=1}^{N}{\lambda}_{i,j}P\left({C}_{j}\right\mathbf{x})$. The conditional risk defines the loss produced by the classification of the sample s with the attributes $\mathbf{x}$ to category ${C}_{i}$.
In the training step, the objective is to minimize the sum of the conditional risk for every sample in the data set, which results in the hypothesis $h$ determines $\mathbf{x}$ belongs to which category. In fact, many methods in machine learning have the common objective in training phrase, then they achieve the $h$ through learning in the data set $\mathbf{D}=\{{s}_{1},{s}_{2},\dots ,{s}_{m}\}$. The optimal hypothesis ${h}^{*}$ is the one that minimizes the sum of the conditional risk for every sample. In Bayesian decision theory, the sum of the conditional risk for every sample in the data set is called the total risk $R\left(h\right)$. However, there is no need to minimize the total risk $R\left(h\right)$. According to the Bayes decision rule, the only thing needed is to minimize the conditional risk $R\left({C}_{i}\mathbf{x}\right)$ through choosing the optimal label for $\mathbf{x}$ to achieve the objective in training phrase, which can be represented as the formula in Eq. (1). In most literatures, ${h}^{*}$ is also called the optimal Bayes classifier:
Naïve Bayes Classifier is the extreme one in the cluster of Bayes classifier. In general, the mislabel loss ${\lambda}_{i,j}$ in Naïve Bayes classifier is defined in Eq. (2):
The aim for the definition of the mislabel loss in Naïve Bayes classifier is to minimize the error of the classification. Therefore, the Naïve Bayes classifier can be represented as:
According to Bayes theorem, $P\left(C\right\mathbf{x})$ can be represented as:
$P\left(\mathbf{x}\right)$ in Eq. (4) is called evidence, which is independent of the label. Therefore, Eq. (3) can be formulated as follow:
In fact, the difficulty in the training phrase is to solve the $P\left(\mathbf{x}\rightC)$. However, Naive Bayes classifier adopts the attribute conditional independence assumption to overcome the difficulty. Based on the assumption, $P\left(\mathbf{x}\rightC)$ can be represented as follow:
Then, the formula of Naive Bayes classifier in Eq. (3) can be represented as:
Obviously, the assumption, attribute conditional independence assumption, makes the calculation easy to solve. However, compared with other classifier in the cluster of Bayes classifier, the simplification of $P\left(\mathbf{x}\rightC)$ is extreme.
Sequence classification is a typical problem in machine learning [12]. The sequence classified is often defined as an order list of event. The event may be a symbol, a real number. At present, the sequence in sequence classification is a time series. Every sequence associates a label, and every sequence have a fixed length in order to be available to a specified classifier [13]. Therefore, the fault diagnosis of bearings using vibration signal can be categorized to sequence classification through Naïve Bayes classifier.
The procedure of fault diagnosis using Naïve Bayes classifier is the same with the procedure of classification in machine learning. There are three steps: data clean, training classifier and classification.
The main aim of data clean is to align the sequence that needed to be classified in the data set. In fact, the performance of various classifier in machine learning is affected by the quality of the data set, however, data clean is main approach to improve the quality of the data set. There are some manipulations in data clean such as value transform, type conversion and invalid data deletion [14]. All the manipulations can improve the quality of data set effectively. In sequence classification, align the sequence is the most important thing for data clean. Because any classifier is available to the specified length sequence, it is necessary for classifiers to tailor the sequence.
Through the step of data clean, the data set $D$ is represented as $D=\{\left({\mathbf{x}}_{1},{y}_{1}\right),\left({\mathbf{x}}_{2},{y}_{2}\right),\cdots ,\left({\mathbf{x}}_{N},{y}_{N}\right)\}$. Each sample in $D$ consists of the sequence and the label. The sequence ${\mathbf{x}}_{i}=({x}_{i}^{1},{x}_{i}^{2},\dots ,{x}_{i}^{M})$ is a time series has the length $M$, which is the vibration signal. The label ${y}_{i}$ is the system state or the failure mode, and ${y}_{i}\in \{{C}_{1},{C}_{2},\dots ,{C}_{k}\}$.
The first step in training phrase is to calculate the $P\left({C}_{k}\right)$. The Eq. (8) is used to calculate it:
The $\mathbf{I}(\bullet )$ is the indicator function. When $\mathbf{I}\left(True\right)$, it equals to 1, otherwise, it equals to 0.
In the second step of training phrase, suppose that the $k$th element in the attribute vector ${\mathbf{x}}_{\mathbf{i}}=({x}_{i}^{1},{x}_{i}^{2},\dots ,{x}_{i}^{M})$ is ${x}^{j}$, which has $l$ possible values denoted by ${a}_{j1},{a}_{j2},\dots ,{a}_{jl}$. The possibility of the attribute ${x}^{j}$ equals to the value ${a}_{jl}$ is calculated by Eq. (9):
Up to now, the training of classifier is finished through all the two steps.
After the training, the Naïve Bayes classifier can be used to classify. The procedure is described below.
Step 1: calculate the possibility that the sequence belongs to category ${C}_{k}$:
Step 2: identify the label $y$ of the sequence:
Finally, the result of diagnosis can be determined by the $y$.
In order to test the performance of Naïve Bayes classifier on the fault diagnosis of bearings, the vibration data from Case Western Reserve Lab (CWRU) [15] is chosen. This data set has been analyzed by a number of other researchers [1618], and those analysis results can be considered as a benchmark.
Fig. 1. The vibration separator
Vibration data was collected through accelerometers, and the accelerometers are placed at the 12 o’clock position at the drive end of the motor housing. Vibration signals are gathered through a 16 channel DAT recorder with sample rate 12 K/s and 48 K/s. the loads of Driver and bearing include four different types (0, 1, 2 and 3 hp), and there are four different fault types (Normal condition, ball fault, inner race fault and outer race fault). Every fault type has four fault conditions. The fault conditions are different in the fault diameters: 0.007 in, 0.014 in, 0.021 in and 0.028 in. Twenty data sets are used in this paper, which include two fault types: Normal condition and the inner race fault. All the data in this paper are sampled in rate 12 K/s. The data in Normal condition is made up of 480000 points, and all the other data have at most 125886 points.
All the twenty data sets are separated to form the new data set in a fixed length of 4096 in data clean, in order to align the sequence. Each sample in the new data set includes 4096 points, and the amounts of samples are 878. The label of each sample is equal to the data from which the sample separated. The label of the samples $y=\{0,1,2,3,4\}$, and the definition of the element is shown in Table 1. All of the experiments in this paper are done on lintel Core i76700 2.60 Hz with 32GB RAM.
Table 1. The fault type for labels
Serial number

Label

The fault types

1

0

Normal condition

2

1

The fault diameter is 0.007 in

3

2

The fault diameter is 0.014 in

4

3

The fault diameter is 0.021 in

5

4

The fault diameter is 0.028 in

The classification result is obtained by Naïve Bayes classifier in Table 2, which is followed the process of the fault diagnosis using Naïve Bayes classifier in Section 3. The accuracy is compared with three results that achieved through SVM and SVM with feature engineering.
Table 2. The accuracy of different methods
Algorithm

Naïve Bayes

SVM

SVM + fractal dimension

SVM + wavelet packet

Accuracy

96 %

72 %

80.6 % [11]

94 % [19]

In this paper, Naïve Bayes classifier is employed to diagnose the different fault conditions of bearings. From the experimental verification and the result of fault diagnosis application in Section 4, it can be draw that Naïve Bayes classifier is effective in diagnosing the fault of bearings through the vibration signals. Furthermore, the Naïve Bayes classifier achieves a better performance than other methods with feature engineering. The proposed method not only provides a desirable characteristic: accuracy, but also provides a strong ability of generalization. Therefore, it is helpful for practitioners on tradeoff between accuracy and generalization.