Local coordinate weight reconstruction for rolling bearing fault diagnosis

The high-dimensional data derived from rolling bearing measurement signals, with their nonlinearity and low signal-to-noise ratio, often contain too much disturbance, such as interference and redundancy, for accurate condition identification. A novel manifold learning method named local coordinate weight reconstruction (LCWR) is proposed to remove such disturbance. Because samples contribute differently to their manifold structure, a weight value is used to express this contribution difference. By reconstructing local low-dimensional coordinates according to a weight function of geodesic distance within the neighborhood, LCWR aims to reduce reconstruction error, preserve the intrinsic structure of the high-dimensional data, eliminate disturbance and extract sensitive features as global low-dimensional coordinates. The experimental results show that the intraclass aggregation and interclass differences of the global low-dimensional coordinates extracted via LCWR are better than those of local tangent space alignment (LTSA), locally linear embedding (LLE) and principal component analysis (PCA). The accuracy reaches a maximum of 96.43 % when an SVM is used to identify the LCWR-based global low-dimensional coordinates, and the effectiveness of LCWR is confirmed in the diagnosis of rolling bearing faults.


Introduction
Rolling bearings play a key role in rotating machinery, so it is necessary to monitor their condition and identify faults to avoid accidents [1][2][3]. The state information of a rolling bearing is usually described by high-dimensional data consisting of multiple characteristics in the time and frequency domains [4][5][6]; such data contain redundancy and interference and exhibit nonlinearity. Therefore, several works have explored removing such disturbance and obtaining low-dimensional sensitive features to improve the accuracy and efficiency of rolling bearing fault diagnosis [7,8].
As a classic manifold learning method for dimensionality reduction [9,10], local tangent space alignment (LTSA) performs nonlinear dimensionality reduction by finding the neighborhoods of high-dimensional samples, carrying out local dimensionality reduction, and realigning all neighborhoods' low-dimensional coordinates to construct global low-dimensional coordinates [11]. Besides its earlier successful applications to image processing, data mining, machine learning, etc. [12,13], it has recently been used for fault diagnosis. For instance, Zhang et al. proposed supervised local tangent space alignment (S-LTSA) to optimize the neighborhood selection of LTSA based on the training samples' categories, so that each neighborhood includes as many samples of the same class as possible and accurately reflects the local structures of different types of bearing fault signals [14]. Li et al. improved the accuracy of bearing fault diagnosis using LLTSA dimensionality reduction [15]. Kumar A and Kumar R utilized linear local tangent space alignment (LLTSA) to suppress noise and retain the characteristic defect frequencies of rolling bearings with inner-race and ball faults [16]. Su et al. proposed orthogonal supervised linear local tangent space alignment (OSLLTSA) to improve the neighborhood selection of LLTSA by introducing samples' label information, which removed interference and redundancy in high-dimensional fault data and extracted low-dimensional sensitive fault features [17]. Wang et al. proposed supervised incremental local tangent space alignment (SILTSA) by embedding supervised learning into incremental local tangent space alignment to extract bearing fault characteristics, process new samples and classify them [18]. Su et al. proposed supervised extended local tangent space alignment (SE-LTSA) to enhance the intraclass aggregation and interclass differences of nonlinear high-dimensional samples by defining a distance between samples and optimizing the neighborhood choice based on class labels [19].
In summary, current research interest has focused on neighborhood optimization and local tangent space estimation in LTSA for dimensionality reduction and the extraction of low-dimensional sensitive fault characteristics.
Different from the above methods, local coordinate weight reconstruction (LCWR) manifold learning is proposed to reconstruct local coordinates by weight coefficients, so as to extract global low-dimensional sensitive features and improve the fault diagnosis capability for rolling bearings. The remainder of this paper is organized as follows: Section 2 proposes LCWR for coordinate reconstruction, Section 3 validates LCWR, and Section 4 presents conclusions.

LCWR
LCWR has two major tasks. First, the projection coordinates of the \(k\) nearest neighbors of each sample on the tangent space of the neighborhood are calculated to build the local low-dimensional manifold using LTSA. Next, the global low-dimensional coordinates are obtained by aligning the low-dimensional coordinates of all neighborhoods according to weight coefficients, which is the main innovation of LCWR.

Local coordinate computation
Let the sample matrix \(X = (x_1, x_2, \ldots, x_N) \in \mathbb{R}^{m \times N}\), where \(m\) is the number of sample dimensions and \(N\) is the number of samples. Sample \(x_i\) and its \(k\) nearest samples (including \(x_i\)) constitute the local neighborhood matrix \(X_i = (x_{i1}, x_{i2}, \ldots, x_{ik}) \in \mathbb{R}^{m \times k}\). Each neighborhood has a local tangent space \(Q_i \in \mathbb{R}^{m \times d}\) (\(d < m\)) consisting of standard orthogonal basis vectors, and \(\Theta_i = (\theta_{i1}, \theta_{i2}, \ldots, \theta_{ik}) \in \mathbb{R}^{d \times k}\) is the projection of \(X_i\) onto \(Q_i\). To retain the main geometric structure information within the neighborhood, minimize the sum of squared distances between \(X_i\) and its tangent-space reconstruction:
\[
\min_{\bar{x}_i, Q_i, \Theta_i} \left\| X_i - \left( \bar{x}_i e^{T} + Q_i \Theta_i \right) \right\|_2^2,
\]
where \(\bar{x}_i = X_i e / k\) is the mean of \(X_i\), \(e\) is a unit column vector of length \(k\), and \(I\) is a \(k\)-by-\(k\) identity matrix.
Apply singular value decomposition to \(X_i (I - \frac{1}{k} e e^{T})\):
\[
X_i \left( I - \tfrac{1}{k} e e^{T} \right) = U_i \Sigma_i V_i^{T},
\]
so that \(Q_i\) and \(\Theta_i\) are computed as
\[
Q_i = U_{id}, \qquad \Theta_i = \Sigma_{id} V_{id}^{T},
\]
where \(U_i \in \mathbb{R}^{m \times m}\) is the left singular vector set, \(V_i \in \mathbb{R}^{k \times k}\) is the right singular vector set, \(\Sigma_i \in \mathbb{R}^{m \times k}\) is the singular value diagonal matrix, \(\Sigma_{id} \in \mathbb{R}^{d \times d}\) is a diagonal matrix with the \(d\) maximum singular values in descending order, \(U_{id} \in \mathbb{R}^{m \times d}\) is the corresponding left singular vector set and \(V_{id} \in \mathbb{R}^{k \times d}\) is the corresponding right singular vector set. Thus, \(\Theta_i\) holds the most important geometric structure information in \(X_i\).
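The local-coordinate step above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation; the function name and the toy data are assumptions, and samples are stored column-wise as in the matrix notation above.

```python
import numpy as np

def local_coordinates(Xi, d):
    """Project a neighborhood onto its local tangent space via SVD.

    Xi : (m, k) neighborhood matrix, columns are samples.
    d  : target dimension (d < m).
    Returns (Q, Theta): orthonormal tangent basis (m, d) and
    local low-dimensional coordinates (d, k).
    """
    # Center the neighborhood: Xi (I - e e^T / k)
    Xc = Xi - Xi.mean(axis=1, keepdims=True)
    # Thin SVD; NumPy returns singular values in descending order
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    Q = U[:, :d]                      # tangent-space basis
    Theta = np.diag(s[:d]) @ Vt[:d]   # projections onto the basis
    return Q, Theta

# Toy check: points lying on a 2-D plane embedded in R^5 are
# reconstructed exactly from their 2-D local coordinates.
rng = np.random.default_rng(0)
plane = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 10))
Q, Theta = local_coordinates(plane, d=2)
Xc = plane - plane.mean(axis=1, keepdims=True)
print(np.allclose(Q @ Theta, Xc, atol=1e-8))  # True
```

Because the toy data are exactly rank 2, the truncated SVD recovers the centered neighborhood without error; for real bearing features the residual is what the minimization above discards.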

Global coordinate construction
Suppose \(T_i = (\tau_{i1}, \tau_{i2}, \ldots, \tau_{ik}) \in \mathbb{R}^{d \times k}\) is the local low-dimensional coordinate of \(X_i\). Build the affine transformation between \(T_i\) and \(\Theta_i\):
\[
T_i \left( I - \tfrac{1}{k} e e^{T} \right) = L_i \Theta_i + E_i .
\]
The local reconstruction error is written as
\[
E_i = T_i \left( I - \tfrac{1}{k} e e^{T} \right) - L_i \Theta_i,
\]
where \(L_i \in \mathbb{R}^{d \times d}\) is a local affine matrix, \(\bar{\tau}_i = T_i e / k\) is the mean of \(T_i\), and \(E_i \in \mathbb{R}^{d \times k}\) is the local reconstruction error matrix.
However, owing to the different contributions of samples to their manifold structure, a novel LCWR based on weight coefficients is proposed to reduce the alignment error and reconstruct the local coordinates more accurately. According to LCWR, the closer a sample is to its manifold, the larger its weight coefficient; likewise, the farther a sample is from its manifold, the smaller its weight coefficient. Therefore, an exponential function of the geodesic distance between a sample and the center point of its neighborhood, reflecting the proximity of the sample to its manifold structure, is defined as the weight coefficient, namely:
\[
w_{ij} = \exp\left( - \frac{d_{ij}^2}{\eta \, \sigma_i^2} \right),
\]
where \(w_{ij}\) denotes the weight coefficient of the \(j\)th nearest neighbor in \(X_i\), \(d_{ij}\) and \(\sigma_i\) denote the geodesic distance from \(x_{ij}\) to the center of \(X_i\) and the mean square error of \(X_i\) respectively, and \(\eta\) is the adjustment parameter.
Then \(E_i\) is rewritten with the weight coefficient matrix \(W_i = \mathrm{diag}(w_{i1}, w_{i2}, \ldots, w_{ik}) \in \mathbb{R}^{k \times k}\) as
\[
E_i = \left[ T_i \left( I - \tfrac{1}{k} e e^{T} \right) - L_i \Theta_i \right] W_i .
\]
Fix \(T_i\) and minimize \(\|E_i\|_2^2\) to preserve as much local information as possible, which gives
\[
L_i = T_i \left( I - \tfrac{1}{k} e e^{T} \right) \Theta_i^{+},
\]
where \(\Theta_i^{+} \in \mathbb{R}^{k \times d}\) is the Moore-Penrose generalized inverse of \(\Theta_i\). Substituting \(L_i\) back into the error expression yields
\[
E_i = T_i \left( I - \tfrac{1}{k} e e^{T} \right) \left( I - \Theta_i^{+} \Theta_i \right) W_i .
\]
Minimize the sum of all neighborhoods' reconstruction errors to obtain the global low-dimensional coordinate \(T\):
\[
\min_{T} \sum_{i=1}^{N} \|E_i\|_2^2 = \min_{T} \, \mathrm{tr}\left( T B T^{T} \right), \qquad T T^{T} = I,
\]
which is equivalent to solving the eigenvalue problem
\[
B t = \lambda t,
\]
where \(B = \sum_{i=1}^{N} S_i \Phi_i \Phi_i^{T} S_i^{T} \in \mathbb{R}^{N \times N}\), \(S_i\) is the 0-1 selection matrix satisfying \(T S_i = T_i\), and \(\Phi_i = (I - \frac{1}{k} e e^{T})(I - \Theta_i^{+} \Theta_i) W_i\). Therefore, the optimal solution of \(T\) is composed of the eigenvectors corresponding to the 2nd to the \((d+1)\)th smallest eigenvalues of \(B\).
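The weighted alignment step can be sketched as follows. This is an illustrative NumPy sketch under stated assumptions: the function name is hypothetical, the Euclidean distance to the neighborhood mean stands in for the geodesic distance \(d_{ij}\), and the standard deviation of those distances stands in for \(\sigma_i\).

```python
import numpy as np

def lcwr_alignment(Theta_list, neighbors, N, d, eta=0.1):
    """Weighted global alignment: accumulate B and take the
    eigenvectors of its 2nd..(d+1)th smallest eigenvalues.

    Theta_list : local coordinates Theta_i, each of shape (d, k).
    neighbors  : index arrays selecting each neighborhood's columns.
    """
    B = np.zeros((N, N))
    for Theta, idx in zip(Theta_list, neighbors):
        k = Theta.shape[1]
        # Weight: closer to the neighborhood center -> larger weight
        dist = np.linalg.norm(Theta - Theta.mean(axis=1, keepdims=True), axis=0)
        sigma = dist.std() + 1e-12
        W = np.diag(np.exp(-dist**2 / (eta * sigma**2)))
        C = np.eye(k) - np.ones((k, k)) / k            # I - e e^T / k
        Phi = C @ (np.eye(k) - np.linalg.pinv(Theta) @ Theta) @ W
        B[np.ix_(idx, idx)] += Phi @ Phi.T             # S_i Phi Phi^T S_i^T
    vals, vecs = np.linalg.eigh(B)                     # ascending eigenvalues
    return vecs[:, 1:d + 1].T                          # global coordinates T

# Toy usage: 12 points on a 1-D curve in R^3, neighborhoods of size 5
t = np.linspace(0, 1, 12)
X = np.vstack([t, t**2, np.sin(t)])
k, d, N = 5, 1, X.shape[1]
nbrs, thetas = [], []
for i in range(N):
    idx = np.argsort(np.linalg.norm(X - X[:, [i]], axis=0))[:k]
    Xc = X[:, idx] - X[:, idx].mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    thetas.append(np.diag(s[:d]) @ Vt[:d])
    nbrs.append(idx)
T = lcwr_alignment(thetas, nbrs, N, d)
print(T.shape)  # (1, 12)
```

Discarding the smallest eigenvector (the constant vector) and keeping the next \(d\) mirrors the standard LTSA alignment; the weight matrix \(W\) is what LCWR adds to it.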
(1) Select neighborhoods. The \(k\) nearest neighbors of each sample are found to form the local neighborhood matrices. (2) Extract local coordinates. The projection matrix and the local low-dimensional coordinate matrix of each neighborhood are obtained by singular value decomposition. (3) Construct global coordinates. The global low-dimensional coordinate matrix is reconstructed by the weight coefficient matrices and obtained as the eigenvector solution. The flow chart of LCWR is shown in Fig. 1.

Verification and analysis
Experimental data is from the bearing data center of Case Western Reserve University.

High-dimensional feature construction
As shown in Table 1, 12 time-domain statistical indicators and 8 frequency-domain statistical indicators are selected to constitute a 20-dimensional sample characterizing the bearing state. An original sample signal is decomposed into eight sub-band signals by three-layer db8 wavelet packet decomposition, and the ratio of the energy of each sub-band to the total energy of all sub-bands is taken as a frequency-domain indicator. That is, \(p_j = E_j / E\), \(E = \sum_{j=1}^{8} E_j\), where \(E_j\) is the energy of the \(j\)th sub-band signal. Thus, a high-dimensional feature matrix \(X \in \mathbb{R}^{20 \times N}\) is created. (Table 1, recovered excerpt: 9 Root mean square; 10 Median; 11 Mean; 12 Crest factor; 13-20 Energy ratio.)
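A few of the recoverable time-domain indicators from Table 1 can be computed as below. This is a minimal NumPy sketch, not the paper's feature extractor: only four representative indicators are shown, the function name and toy signals are assumptions, and the wavelet-packet energy ratios are omitted.

```python
import numpy as np

def time_domain_features(sig):
    """Compute a few of Table 1's time-domain indicators
    for one signal segment (illustrative subset only)."""
    rms = np.sqrt(np.mean(sig**2))
    return np.array([
        rms,                        # root mean square
        np.median(sig),             # median
        np.mean(sig),               # mean
        np.max(np.abs(sig)) / rms,  # crest factor
    ])

# Stack per-segment features column-wise into a feature matrix,
# mirroring the m x N layout of the sample matrix in Section 2
segments = [np.sin(np.linspace(0, 10, 256)) + 0.1 * i for i in range(5)]
X = np.column_stack([time_domain_features(s) for s in segments])
print(X.shape)  # (4, 5)
```

With the full 12 time-domain indicators and the 8 energy ratios, the same column-stacking yields the 20-by-N matrix used above.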

Low-dimensional feature extraction
According to Fig. 2, some of the low-dimensional feature samples extracted by LCWR are used as training samples to train a support vector machine (SVM), while the others are used as test samples to be recognized by the trained SVM. When using LCWR to extract bearing state characteristics, three parameters need to be optimized: the neighbor number \(k\), the dimension \(d\) and the adjustment parameter \(\eta\). Because the recognition rate can be regarded as a function of \(k\), \(d\) and \(\eta\), these parameters interact to determine the recognition rate. By varying these parameters within a certain range and recording the corresponding recognition rates, it is shown that there exist optimal parameter values yielding a peak recognition rate. The trend of the recognition rate with respect to a single parameter, with the other two parameters fixed, is shown in Figs. 3-5, respectively. The trend of the recognition rate differs from parameter to parameter.
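The parameter selection described above amounts to an exhaustive search over \((k, d, \eta)\). The sketch below shows that pattern with a toy stand-in objective; the function names, parameter grids and the objective itself are assumptions (in the paper the objective would be the SVM recognition rate on the test samples).

```python
import itertools

def grid_search(evaluate, ks, ds, etas):
    """Return the (k, d, eta) triple maximizing `evaluate`,
    which is assumed to return a recognition rate."""
    return max(
        itertools.product(ks, ds, etas),
        key=lambda p: evaluate(*p),
    )

# Toy objective peaking at the paper's reported optimum (8, 3, 0.1)
score = lambda k, d, eta: -((k - 8)**2 + (d - 3)**2 + (eta - 0.1)**2)
best = grid_search(score, range(4, 13), range(2, 6), [0.05, 0.1, 0.2, 0.5])
print(best)  # (8, 3, 0.1)
```

Because the three parameters interact, a joint search like this (rather than tuning each in isolation) is what justifies reading Figs. 3-5 as one-dimensional slices around the optimum.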
From the recognition rate with respect to the nearest neighbor number \(k\) in Fig. 3, the maximum rate of 96.43 % occurs at \(k = 8\). The parameter \(k\) affects the recognition rate by influencing the intrinsic geometric structure of the high-dimensional samples, the close relationship between similar samples and the nonlinear dimensionality reduction ability of LCWR. If \(k\) is too small, LCWR cannot maintain the intrinsic geometry of the high-dimensional samples and the close association between similar samples. If \(k\) is too large, the nonlinear dimensionality reduction capability of LCWR is weakened. Hence, the low-dimensional manifold structure hidden in the high-dimensional samples can be found to the greatest extent, and the maximum rate achieved, at the optimum value \(k = 8\). However, due to the combined effect of the different factors influenced by \(k\), the recognition rate fluctuates and shows multiple turning points rather than a monotonic trend.
The dimension \(d\) affects the recognition rate by determining how well the sensitive features of the high-dimensional samples in each neighborhood are mined and how well redundant and interference components are eliminated. From Fig. 4, it can be seen that the maximum recognition rate is 96.43 % at \(d = 3\), because an appropriate \(d\) gives similar samples approximate low-dimensional features, leading to a better clustering effect and an improved recognition rate. Otherwise, LCWR cannot fully mine the sensitive features from the high-dimensional samples in each neighborhood if \(d\) is too small, and the low-dimensional features contain redundancy and interference if \(d\) is too large. Likewise, due to different factors, the recognition rate has multiple turning points.
The adjustment parameter \(\eta\) affects the recognition rate by changing the degree of clustering and the retention of the global geometric structure. From the relationship between \(\eta\) and the recognition rate in Fig. 5, if \(\eta\) is too small, the proximity of samples is low and the clustering is obvious, but the retention of the global geometric structure is poor. If \(\eta\) is too large, the global geometric structure is better preserved but the clustering is weakened. Both effects reduce the recognition rate. As a result, there is an optimal \(\eta = 0.1\) where the recognition rate reaches the maximum of 96.43 %.

Dimensionality reduction effect analysis
LCWR is compared with LTSA, locally linear embedding (LLE) and principal component analysis (PCA) to verify its dimensionality reduction effect. The dimensionality reduction results of LTSA, LLE and PCA are shown in Fig. 7, Fig. 8 and Fig. 9, respectively. Generally speaking, the low-dimensional samples produced by these methods show varying degrees of intersection and overlap, poor clustering within classes and a lack of clustering centers, so it is difficult to mine the essential characteristics of the bearing states and the differences between classes. Although LTSA and LLE find the manifold structure of the high-dimensional samples, they are unable to enlarge the gaps between dissimilar samples within a neighborhood. PCA is a linear statistical method that does not consider the local structure of the samples; it yields poor intraclass aggregation and unclear interclass differences, and fails to reveal the nonlinear manifold structure of the bearing states, as shown in Fig. 9. By combining its weight coefficients with the alignment of local coordinates, LCWR enhances the intraclass aggregation and the interclass differences, overcomes the inability of LTSA and LLE to enlarge the gaps between dissimilar samples within a neighborhood, reduces the dimension while retaining the principal low-dimensional characteristics of the high-dimensional samples, accurately reflects the relationship between signal characteristics and the bearing state, and effectively distinguishes the four bearing states. As shown in Table 2, when the features extracted by the various methods are sent to the SVM, the recognition rate of LCWR reaches the highest value of 96.43 %, although LCWR takes a little more time to run than LTSA, LLE and PCA, as shown in Table 3. Therefore, in contrast to the other dimensionality reduction methods, LCWR achieves higher accuracy, which proves its effectiveness.
Besides, it can be found that the recognition rates of manifold dimensionality reduction using LCWR, LTSA, LLE and PCA (all greater than 90 %) are higher than that of the non-dimensionality-reduction method (only 84.72 %). This further proves that these manifold learning methods can filter out the redundancy and interference of the high-dimensional features and extract the intrinsic low-dimensional manifold characteristics, which significantly improves the recognition rate of the bearing state, as shown in Table 2. Meanwhile, the manifold learning methods other than LCWR consume less time and achieve better recognition efficiency.

Conclusions
LCWR manifold learning is proposed to remove redundancy and noise in high-dimensional bearing fault features and perform nonlinear dimensionality reduction, thereby improving fault diagnosis capability. A geodesic-distance-based weight function is used to realign the local coordinates, eliminate redundancy and interference in the high-dimensional feature samples and extract low-dimensional sensitive fault features. Experiments demonstrate that the intrinsic manifold structure of the high-dimensional feature samples is well preserved after dimensionality reduction by LCWR, and that the extracted low-dimensional feature samples truly represent the nonlinear characteristics of the different bearing states and the gaps between them. The low-dimensional feature samples are then identified by an SVM, which yields a higher recognition rate than the other methods. Thus, the effectiveness of LCWR is validated. In addition, reducing the running time of LCWR is worth further study.