Recognition of rock – coal interface in top coal caving through tail beam vibrations by using stacked sparse autoencoders

This paper provides a novel rock-coal interface recognition method based on stacked sparse autoencoders (SSAE). Given their different size and hardness, coal and rock generate different tail beam vibrations. Therefore, the rock-coal interface in top coal caving can be identified using an acceleration sensor to measure such vibrations. The end of the hydraulic support beam is an ideal location for installing the sensor, as proven by many experiments. To improve recognition accuracy, the following steps are performed. First, ensemble empirical mode decomposition method (EEMD) is used to decompose the vibration signals of the tail beam into several intrinsic mode functions to complete feature extraction. Second, the features extracted are preprocessed as the inputs of SSAE. Third, a greedy, layer-wise approach is employed to pretrain the weights of the entire deep network. Finally, fine tuning is employed to search the global optima by simultaneously altering the parameters of all layers. Test results indicate that the average recognition accuracy of coal and rock is 98.79 % under ideal caving conditions. The superiority of the proposed method is verified by comparing its performance with those of four other algorithms.


Introduction
Coal is a highly important energy source, contributing approximately 30 % of the world's energy consumption and 64 % of that of China in 2015. In China, thick coal seams account for up to 45 % of total coal reserves. Top coal caving is widely used to mine thick coal seams because of its high production, high efficiency, low energy consumption, low cost, and strong adaptability.
However, top coal caving relies heavily on manual visual inspection to identify the rock-coal interface. This approach easily causes under caving and over caving. Under caving lowers the recovery rate and wastes resources, whereas over caving lowers coal quality and raises production cost. Over caving also poses a great safety risk to operators because of the extremely harsh mining environment and the complex process of top coal caving. Therefore, automating top coal caving is an important goal. Such automation requires highly accurate and rapid recognition of coal and rock.
In the past decades, researchers have proposed many methods for recognizing coal and rock. The best-known methods include γ-ray detection [1][2][3], radar detection [4,5], cutting force response [6,7], infrared detection [8], and image detection [9]. These methods have produced valuable results and have been applied in several fields. However, they all have disadvantages. Artificial rays are harmful to human beings, and the cost of natural rays is relatively high. Radar detection faces a considerable trade-off between measuring range and precision. The cutting force response method is unsuitable for the top coal caving environment, whereas the infrared detection method is insufficiently mature. In addition, the recognition accuracy of the above-mentioned methods is highly sensitive to the coal mine environment.
A method based on vibration signal analysis has been investigated by an increasing number of researchers [10][11][12][13] in recent years and used in many practical applications. In [10], the measuring point of arm vibration was optimized. In [11], the power spectrum of coal and rock vibration models based on a time-series analysis was employed to identify coal and rock. In [12], the shearer cutting state was diagnosed by an acceleration sensor that measures the vibration of the rock transmission part. In [13], coal and rock were identified from the vibration of the rear scraper conveyor and the hydraulic support tail beam under different conditions. In comprehensive coal caving mining, falling coal and rock hit the tail beam of the hydraulic supporter and cause it to vibrate. Given the different size and hardness of coal and rock, the tail beam vibrations they produce are different. Therefore, we can identify the rock-coal interface by using acceleration sensors to measure tail beam vibration.
The essence of rock-coal interface detection is pattern recognition and classification. Numerous suitable classification methods exist, including the probabilistic neural network [12], back-propagation neural network [14], fuzzy logic [15], support vector machine (SVM) [8,16], and naive Bayes. The neural network is the most widely used and best known among these techniques. The neural network was first proposed in 1958 [17]; since then, related methods have been developed, including convolution autoencoder neural networks (CAE), back propagation (BP), convolution neural networks (CNN), deep belief networks (DBN), and stacked sparse autoencoder (SSAE) neural networks. These techniques differ in architecture, learning approach, and training method and hence perform differently in applications. Many successful engineering examples have been developed, such as weather prediction [18], facial expression recognition [19], fault diagnosis [20], and speech recognition [21].
The SSAE neural network is a highly effective classification method and has been widely employed for classification and pattern recognition problems [22] since its proposal. A SSAE consists of several layers of sparse autoencoders (SAEs), in which the input of the next layer is the output of the previous layer. Using a greedy, layer-wise approach, a SSAE pretrains the weights of the entire deep network by training each layer in turn. Numerous optimization algorithms have been proposed for optimizing the parameters of neural networks, including the genetic algorithm [7,23], conjugate gradient descent [24], BFGS [25], L-BFGS [26], and gradient descent. In this paper, gradient descent was selected because of its simplicity and practicality.
Feature extraction is an important aspect because the vibration signal of the tail beam is non-linear and non-stationary. Some typical methods, such as the wavelet transform (WT) [27], empirical mode decomposition (EMD) [28], and the Fourier transform (FT) [29], are applicable. In this paper, we selected ensemble EMD (EEMD), which was proposed in 2004 [30]. Having successfully resolved mode mixing, EEMD has been widely used in many applications, such as vibration analysis [31] and signal denoising [32].
The proposed method comprises three parts. The first part is signal acquisition, in which a 4508 type acceleration sensor and a 3560 B data acquisition front end acquire the tail beam vibration signals produced by rock and coal. The second part is feature extraction, in which the EEMD algorithm decomposes the signals into intrinsic mode functions (IMFs) and the relatively effective components are singled out. The third part is the recognition of the rock-coal interface by a SSAE neural network trained with the batch gradient descent algorithm. To establish the SSAE model, we conducted many experiments to obtain the optimal parameters for the SSAE.

Materials and methods
For many years, neural networks have been widely used in classification and have obtained favorable results. This paper proposes a network model named SSAE to recognize the rock-coal interface in top coal caving. The SSAE first initializes its parameters and then uses feed-forward and back-propagation passes with the batch gradient descent algorithm to seek the minimum of the cost function and thereby obtain the globally optimal parameters. This process is called pretraining and involves unsupervised learning. Afterward, fine-tuning is employed to obtain better results. The SSAE has powerful expressive capability and enjoys all the benefits of deep networks. Furthermore, the model can achieve hierarchical grouping or part-whole decomposition of the input. For example, if the input is a face image, the first layer may learn how to identify the boundaries, the second layer may learn how to combine these boundaries, and the higher layer may learn how to combine facial organs [19].

SSAE
A SAE neural network is an unsupervised learning algorithm that, unlike other neural networks, does not require labeled training examples. It applies back-propagation with the target values set equal to the inputs. The schematic of a SAE, with its inputs and the units in its hidden layer, is shown in Fig. 1.

Fig. 1. Schematic of a SAE
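In an autoencoder, the overall cost function is as follows:

$$J(W,b) = \frac{1}{m}\sum_{i=1}^{m}\frac{1}{2}\left\| h_{W,b}(x^{(i)}) - y^{(i)} \right\|^{2} + \frac{\lambda}{2}\sum_{l}\sum_{p}\sum_{q}\left(W_{qp}^{(l)}\right)^{2}$$

where $x^{(i)}$ is the raw input, $m$ is the number of inputs, $y^{(i)}$ is the raw output, $h_{W,b}(x^{(i)})$ is the output of the activation function, and $\lambda$ is the relative importance of the second term.
In a SAE, we can discover some correlations of input features by placing constraints on the network. Sparsity is imposed to constrain the hidden units as follows:

$$\hat{\rho}_j = \frac{1}{m}\sum_{i=1}^{m} a_j(x^{(i)})$$

where $a_j(x^{(i)})$ denotes the activation of hidden unit $j$ in the SAE when the network is given a specific $x^{(i)}$. The parameter $\hat{\rho}_j$ is the average activation of hidden unit $j$. Moreover, we make the two parameters equal as follows: $\hat{\rho}_j = \rho$.
Typically, the sparsity parameter $\rho$ holds a very small value, usually 0.05. Thus, we need the activation of the hidden unit $j$ to be close to 0. To optimize the objective, we add a penalty term that penalizes $\hat{\rho}_j$ for deviating significantly from $\rho$. The penalty term is as follows:

$$\sum_{j=1}^{s}\mathrm{KL}(\rho \,\|\, \hat{\rho}_j) = \sum_{j=1}^{s}\left[\rho\log\frac{\rho}{\hat{\rho}_j} + (1-\rho)\log\frac{1-\rho}{1-\hat{\rho}_j}\right]$$

where $s$ is the number of hidden units. Then, the overall cost function is:

$$J_{\mathrm{sparse}}(W,b) = J(W,b) + \beta\sum_{j=1}^{s}\mathrm{KL}(\rho \,\|\, \hat{\rho}_j)$$

where $\beta$ controls the weight of the sparsity penalty term.
The detailed establishment of SSAE, pretraining, and fine-tuning can be carried out in the following steps (Fig. 2):
Step (1): A SAE is trained on the raw input $x^{(k)}$ to learn the primary features $h^{(1)(k)}$. The structure of the first SAE is $[n, s_1, n]$, corresponding to $n$ inputs, $s_1$ units in the hidden layer, and $n$ outputs.
Step (2): This trained SAE is then adopted to obtain the primary feature activations $h^{(1)(k)}$ for each input $x^{(k)}$.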

Fig. 2. Establishment of SSAE
Step (3): $h^{(1)(k)}$ is then used as the "raw input" to the second SAE for learning the secondary features $h^{(2)(k)}$. The second SAE structure is described by $[s_1, s_2, s_1]$, corresponding to $s_1$ inputs, $s_2$ units in the hidden layer, and $s_1$ outputs.
Step (5): The secondary features are treated as "raw inputs" of a sigmoid classifier, which is trained to map them to the class labels (coal or rock).
Step (6): The three layers and a classifier layer are combined to form the SSAE model.
Step (7): Back-propagation is conducted to improve the results by adjusting the parameters of all layers simultaneously in a process called fine-tuning.
Step (8): Step (7) is performed repeatedly until the set number of training iterations is reached.
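The sparse cost function and the hand-over of features between SAEs described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the random weights, the layer sizes, and the values of `lam` and `beta` are hypothetical and chosen only so that the example runs.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def kl_penalty(rho, rho_hat):
    """Sum over hidden units of KL(rho || rho_hat_j)."""
    return np.sum(rho * np.log(rho / rho_hat)
                  + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))

def sparse_ae_cost(X, W1, b1, W2, b2, lam, rho, beta):
    """Cost of one SAE whose target is its own input X (shape m x n).

    Returns the cost and the hidden activations (the learned features).
    """
    m = X.shape[0]
    A = sigmoid(X @ W1 + b1)            # hidden activations, m x s
    X_hat = sigmoid(A @ W2 + b2)        # reconstruction, m x n
    recon = 0.5 * np.sum((X_hat - X) ** 2) / m
    decay = 0.5 * lam * (np.sum(W1 ** 2) + np.sum(W2 ** 2))
    rho_hat = A.mean(axis=0)            # average activation of each hidden unit
    return recon + decay + beta * kl_penalty(rho, rho_hat), A

# Hypothetical toy data: 20 samples with 7 features scaled to [0, 1].
rng = np.random.default_rng(0)
X = rng.random((20, 7))
W1, b1 = 0.1 * rng.standard_normal((7, 5)), np.zeros(5)
W2, b2 = 0.1 * rng.standard_normal((5, 7)), np.zeros(7)
cost, H1 = sparse_ae_cost(X, W1, b1, W2, b2, lam=1e-3, rho=0.05, beta=3.0)
# H1 plays the role of the primary features: it would be the "raw input"
# of the second SAE, whose hidden activations feed the classifier.
```

In a full model, each SAE would first be trained to minimize this cost before its activations are passed on; the sketch only evaluates the cost once to show how the three terms combine.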

Batch gradient descent
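Batch gradient descent is an efficient algorithm for searching for the optimal solution. One iteration of batch gradient descent is as follows:
Step 1: The following assumptions are set: $\Delta W^{(l)} := 0$, $\Delta b^{(l)} := 0$ for all $l$, where $\Delta W^{(l)}$ is a matrix with the same dimension as $W^{(l)}$ and $\Delta b^{(l)}$ is a vector with the same dimension as $b^{(l)}$.
Step 3: $W^{(l)}$ and $b^{(l)}$ are updated as below:

$$W^{(l)} := W^{(l)} - \alpha\left[\frac{1}{m}\Delta W^{(l)} + \lambda W^{(l)}\right]$$

$$b^{(l)} := b^{(l)} - \alpha\left[\frac{1}{m}\Delta b^{(l)}\right]$$

where $\alpha$ is the learning rate and $\lambda$ is the weight decay parameter. The above-mentioned steps of batch gradient descent are repeated to reduce the cost function for training the neural network.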
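One iteration of batch gradient descent can be sketched as follows for a single linear layer. This is an illustrative sketch only: the text above does not show how the accumulators are filled between initialization and the update, so the code assumes the standard form in which the per-sample gradients are summed into them. The layer, the toy data, and the values of `alpha` and `lam` are hypothetical.

```python
import numpy as np

def layer_cost(W, b, X, Y, lam):
    """Mean squared-error cost of a linear layer plus weight decay."""
    err = X @ W + b - Y
    return 0.5 * np.sum(err ** 2) / X.shape[0] + 0.5 * lam * np.sum(W ** 2)

def batch_gd_step(W, b, X, Y, alpha, lam):
    """One batch gradient descent iteration over all m samples."""
    m = X.shape[0]
    dW = np.zeros_like(W)               # accumulator with the shape of W
    db = np.zeros_like(b)               # accumulator with the shape of b
    for x, y in zip(X, Y):              # assumed: sum the per-sample gradients
        err = x @ W + b - y
        dW += np.outer(x, err)
        db += err
    W = W - alpha * (dW / m + lam * W)  # weight update with weight decay
    b = b - alpha * (db / m)            # bias update (no decay)
    return W, b

# Hypothetical toy regression problem.
rng = np.random.default_rng(0)
X = rng.random((50, 3))
Y = X @ np.array([[1.0], [-2.0], [0.5]]) + 0.3
W, b = np.zeros((3, 1)), np.zeros(1)
cost_before = layer_cost(W, b, X, Y, lam=1e-4)
for _ in range(200):
    W, b = batch_gd_step(W, b, X, Y, alpha=0.1, lam=1e-4)
cost_after = layer_cost(W, b, X, Y, lam=1e-4)
```

In the SSAE the gradients would come from back-propagation through the sigmoid layers rather than from this closed-form linear case, but the accumulate-then-update structure is the same.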

Recognition system for the rock-coal interface
The intelligent rock-coal interface recognition system consists of signal acquisition, feature extraction, and rock-coal interface identification and prediction (Fig. 3). The top coal caving working site is shown in Fig. 4.

Acquisition of the vibration signal of tail beam
Acquiring the vibration signal of the tail beam is the first step in identifying the rock-coal interface. Obtaining the vibration signal with sensors is the most common method used for identification. In this study, an acceleration sensor (4508 type) was employed. Table 1 displays the acceleration sensor parameters, and Fig. 5 presents the vibration sensor. During top coal caving, coal or rock strikes the tail beam supported by a hydraulic supporter and produces vibration. The vibration waves vary because of the different size and hardness of coal and rock. Detection of tail beam vibration can therefore successfully identify and differentiate coal and rock. The installation position of the sensor is important for collecting signals. After analyzing the actual working scene, we noted that the main impact locations of coal and rock are the beam and the tail chute, respectively, of the hydraulic supporter. Accordingly, we set the sensors at these two places. After many experiments, we found that the sensor is easily buried by falling coal and stone when it is installed in the tail chute. Moreover, the continuous spraying of water for dust suppression may damage sensors during coal caving. The running conveyor chute also generated considerable interference during signal acquisition, thereby negatively affecting the next step of the analysis. Hence, the end of the hydraulic support beam is an ideal location for sensor installation. We therefore chose the back of the tail beam to acquire vibration signals. The final self-designed experimental system and the specific installation location of the sensor are shown in Fig. 6(a) and 6(b), respectively.
The experimental data acquisition system also includes a portable vibration and sound wave signal detector with a data acquisition front end, wireless routers, and a software system. Vibration signals were obtained by the data acquisition front end, transmitted to the wireless broadband router by cables, and then sent to a notebook for data storage and analysis. Afterward, we exported the data from the software system. All the data were acquired in the No. 1306 working platform in the Xinglongzhuang coal mine of Yanzhou Mining Company. The working pattern involved one cut and one caving. The caving step distance was 0.8 m, and the coal thickness was 7.34 m to 8.90 m. The sampling frequency was set to 2560 Hz, and each pattern's sampling time was 160 ms. A total of 4096 sampling points were obtained for each pattern, and 4000 sample points were saved for each pattern to avoid possible errors of uncertainty.

Feature extraction
Features must be extracted from the data exported from the Pulse software system to enable pattern recognition training and rock-coal interface prediction. Feature extraction is hence highly important for the final recognition results.
Many signal processing approaches can obtain desirable features from primary vibration signals and are widely used in many industries, such as the FT [27,28], WT [29], and EMD [30]. However, because the signal is non-stationary and submerged in strong background noise, useful features are difficult to extract. For example, EMD can decompose a signal into several IMFs on the basis of its local characteristic time scales, but it suffers from the problem of mode mixing. Thus, EEMD [29] was presented to solve this problem. Compared with EMD, EEMD adds a certain amount of Gaussian white noise to the original signal before each decomposition to solve mode mixing. EEMD is highly useful for non-stationary and non-linear signals.
The following is a summary of EEMD (Fig. 7):
Step 1: The standard deviation $\sigma$ of the added noise and the ensemble number $N$ are initialized. The experiments show that the proposed model obtained better recognition when we set $N = 300$ and $\sigma = 0.2$.
Step 2: Different white noise series with the given standard deviation are added to the original signal:

$$x_n(t) = x(t) + w_n(t)$$

where $x_n(t)$ signifies the noise-added signal, $x(t)$ represents the original signal, and $w_n(t)$ denotes the nth added white noise.
Step 3: The noise-added signal is decomposed by EMD into IMFs.
Step 5: Each IMF is calculated as the ensemble mean $\bar{c}_j(t)$ of the trials as follows:

$$\bar{c}_j(t) = \frac{1}{N}\sum_{n=1}^{N} c_{j,n}(t)$$

where $c_{j,n}(t)$ is the jth IMF obtained in the nth trial.
However, the IMFs obtained by EEMD are not always true intrinsic mode functions. Because white noise is added when EEMD decomposes the signal, the high-frequency components may actually be white noise rather than intrinsic mode functions and should be deleted. A significance test and an orthogonality check were conducted, and the significance test shows that the first four IMFs have no significant relationship with the results. We removed the first four IMFs; the correctness of this choice is proved by the experiments in Section 3.3.2.
In the following data processing, the remaining IMFs were used as inputs for the proposed model in working pattern recognition. Before importing these features into the SSAE model, we first conduct preprocessing. This step must be accomplished first because the input vectors cannot exceed the defined scope of the hidden and output layers. In this manner, we can fully apply the value of each input and render the operation more convenient and fast. Specifically, normalization is first used to limit the input absolute value to between 0 and 1, and then standardization is adopted to transform the input into a standard normal distribution.
In this study, 4000 groups of samples were obtained for each pattern, giving 8000 groups in total. In the following experiments, a 10-fold cross validation method was employed, and the samples were randomly divided into 10 portions. Nine portions were utilized to train the proposed model, whereas 1 portion was used for testing. Each subsample was tested once, and the average of the results of the 10 experiments was regarded as the final test result. In each experiment, 7200 samples were adopted for training, and 800 samples were used for testing.
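The ensemble part of the EEMD procedure summarized above can be sketched as below. This is a hedged illustration, not the authors' code: EMD itself is passed in as a function `decompose` (a hypothetical interface that must return the same number of components on every trial), the noise is assumed to be scaled by the standard deviation of the signal, and `toy_decompose` is only a stand-in that splits a signal with a moving average so that the example runs without an EMD library.

```python
import numpy as np

def eemd(x, decompose, n_trials=300, noise_std=0.2, seed=0):
    """Ensemble-average the components of noise-perturbed copies of x.

    decompose: callable returning an array of shape (n_components, len(x)).
    """
    rng = np.random.default_rng(seed)
    scale = noise_std * np.std(x)        # assumption: noise relative to signal std
    total = None
    for _ in range(n_trials):
        noisy = x + scale * rng.standard_normal(x.shape)  # add white noise
        comps = np.asarray(decompose(noisy))               # decompose this trial
        total = comps if total is None else total + comps
    return total / n_trials                                # ensemble mean

def toy_decompose(signal, width=9):
    """Stand-in for EMD (NOT real EMD): fast part and moving-average part."""
    slow = np.convolve(signal, np.ones(width) / width, mode="same")
    return np.vstack([signal - slow, slow])

t = np.linspace(0.0, 1.0, 256, endpoint=False)
x = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 40 * t)
imfs = eemd(x, toy_decompose, n_trials=300, noise_std=0.2)
```

With a real EMD routine in place of `toy_decompose`, the leading (highest-frequency) rows of the result would correspond to the components that the text removes before classification.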
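The two-stage preprocessing described above can be sketched as follows. The exact formulas are not given in the text, so this sketch assumes per-feature min-max scaling followed by per-feature standardization; the function name and the toy feature matrix are hypothetical.

```python
import numpy as np

def preprocess(features):
    """Scale each column to [0, 1], then standardize it to zero mean and unit variance.

    Assumes no column is constant (otherwise the divisions are undefined).
    """
    f = np.asarray(features, dtype=float)
    f_min, f_max = f.min(axis=0), f.max(axis=0)
    scaled = (f - f_min) / (f_max - f_min)          # normalization to [0, 1]
    return (scaled - scaled.mean(axis=0)) / scaled.std(axis=0)

# Hypothetical feature matrix: 100 samples, 7 features on an arbitrary scale.
rng = np.random.default_rng(0)
raw = rng.random((100, 7)) * 50.0 - 10.0
Z = preprocess(raw)
```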
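The 10-fold split described above can be sketched as follows, assuming a simple random permutation of the 8000 sample indices; the function name and the seed are hypothetical.

```python
import numpy as np

def ten_fold_indices(n_samples, n_folds=10, seed=0):
    """Yield (train_idx, test_idx) pairs; each fold is the test set exactly once."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), n_folds)
    for k in range(n_folds):
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        yield train, folds[k]

splits = list(ten_fold_indices(8000))
train_sizes = [len(train) for train, _ in splits]
test_sizes = [len(test) for _, test in splits]
```

The final reported accuracy would then be the mean of the accuracies measured on the 10 test folds.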

Construction of a SSAE model
To establish the proposed model, we carried out numerous experiments using the control variable method and determined the basic parameters, such as the learning rate and the sparsity parameter. The remaining parameters were obtained by comparing the recognition accuracy rates of different input features, classifiers, and structures.

Basic parameters
During the reduction of the cost function by batch gradient descent, an excessively large learning rate produces an excessively large step size, so gradient descent may overshoot the minimum and deviate increasingly further from it. Setting the learning rate exceedingly small is also unreasonable because it slows the reduction of the cost function. Another drawback is the potential trapping in local optima and the resultant inability to reach the global optimal solution. In the proposed model, we set the learning rate $\alpha$ equal to 1.25. By changing only one parameter and fixing the others, we obtained the values of the other parameters, such as the number of samples in each training batch and the sparsity parameter $\rho$. In this study, we set the batch size to 40, $\rho = 0.3$, and the number of SAEs to 2.

Comparison among algorithms with different features
In this experiment, we set the first SAE structure as [n 300 n], where n is the number of input features, varying from 1 to 11. The second SAE structure was [300 40 300]; the classifier was sigmoid, and the learning rate, sparsity parameter, and other parameters were set as mentioned above.
When n is 1, each preprocessed IMF is separately used as the input feature of the proposed model. Table 2 shows the recognition accuracy results. The recognition accuracies of preprocessed IMFs 2 and 4 are less than 50 %, so these IMFs played a negative role, whereas preprocessed IMF 11 achieved a good recognition rate (78.60 %). After $\sum_{n=1}^{11}\binom{11}{n} = 2^{11} - 1 = 2047$ experiments, we sorted out the maximum recognition accuracy rate for each number of input features and the corresponding combination of preprocessed IMFs (Table 3). The recognition rate first rose and then fell as the number of input features increased. When the number of input features was set to 7, corresponding to preprocessed IMFs 5 to 11, the recognition rate reached the maximum value (98.96 %). We therefore removed the first four preprocessed IMFs and used the remaining 7 preprocessed IMFs as the input of the proposed model.
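The count of 2047 experiments corresponds to every non-empty subset of the 11 preprocessed IMFs. A short sketch that enumerates these subsets is given below; the variable names are hypothetical, and the training and scoring of each subset is left out.

```python
from itertools import combinations
from math import comb

imf_ids = range(1, 12)                       # preprocessed IMFs 1 to 11
subsets = [c for k in range(1, 12) for c in combinations(imf_ids, k)]
total = len(subsets)                         # number of experiments
per_size = {k: comb(11, k) for k in range(1, 12)}
```

Each subset would be trained and evaluated once, and the best accuracy for each subset size retained, which is the information a table of maximum accuracy per number of input features would hold.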

Comparison among algorithms with different hidden layer structures
In this experiment, we set the number of hidden units of the first autoencoder to vary from 50 to 1000 at intervals of 50. The number of hidden units of the second autoencoder varied from 5 to 50 at intervals of 5. A total of 200 different structures were assessed to find the most suitable structure for the proposed model (Fig. 9). Fig. 9(a), (b) and (c) refer to the accuracy rates of coal, rock, and the average of coal and rock, respectively. Fig. 9(a) shows that when the number of hidden units of the first SAE was set to 150 and the number of hidden units of the second SAE was set to 40, the accuracy of coal reached the maximum value (98.12 %), followed by the proposed method structure of [11,50,30,2] (97.95 %) and the structure of [11,200,50,2] (97.85 %). Meanwhile, Fig. 9(b) reveals that the structure of [11,200,40,2] enabled the recognition accuracy of rock to reach the maximum (100 %), followed by the proposed method structure of [11,50,30,2] (99.98 %). Fig. 9(c) shows that the average identification of coal and rock achieved the maximum (98.96 %) when the structure was [11,50,30,2].
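The search grid described above can be enumerated with a few lines of code. The sketch only lists the candidate structures; evaluating each one (training the SSAE and measuring accuracy) is not shown, and the variable names are hypothetical.

```python
first_hidden = range(50, 1001, 50)    # 50, 100, ..., 1000 hidden units in the first SAE
second_hidden = range(5, 51, 5)       # 5, 10, ..., 50 hidden units in the second SAE
structures = [(h1, h2) for h1 in first_hidden for h2 in second_hidden]
n_structures = len(structures)        # 20 x 10 candidate structures
```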

Comparison among algorithms with and without pretraining
Pretraining plays an important role in the recognition performance of the SSAE model. The comparison of the proposed model with and without pretraining is shown in Fig. 10.
As shown in Fig. 10, the proposed model works better with pretraining than without it: the average accuracy rates were 98.96 % and 91.93 %, respectively, an improvement of 7.03 percentage points due to pretraining. This result is relatively easy to understand. Without pretraining, the network can only fine-tune unpretrained parameters, which usually start from random initial points, so the network may be confined to local optima. With pretraining, the network is fine-tuned from the pretrained parameters and is therefore more likely to reach the global optimum. Thus, pretraining the SSAE is a necessary step.

Comparison among algorithms with different classifiers
Several classifiers can be used in neural networks. In this study, three classifiers were compared, and sigmoid was adopted on the basis of its performance. Their recognition accuracies are shown in Fig. 11. The sigmoid performed much better than softmax and SVM; the sigmoid obtained a 98.96 % average accuracy rate, whereas softmax only attained 50.06 % and SVM only achieved 49.01 %. Softmax and SVM both recognized coal and rock poorly. In this experiment, the three models differed only in the final classifier, and all other parameters were the same, thereby ensuring the credibility of the experiment.

Comparison and discussion
For performance assessment, we compared our algorithm with four other algorithms: SVM, BP neural network (BP-NN), deep network (DN), and a deep network pretrained with one SAE (SAE-DN). For the SAE-DN, weights were pretrained by only one SAE. For the BP-NN, weights were optimized only by back-propagation during neural network training. To be objective and fair, the simulation environments of the five models were kept the same. The average results of 10 experiments are presented in Fig. 12. SSAE clearly operated more effectively than the other algorithms. Its testing accuracies for coal, rock, and the average of the two were 97.95 %, 99.98 %, and 98.96 %, respectively. The SVM algorithm was inferior to the other methods in the recognition of both coal and rock, with accuracy rates of only 88.95 % and 79.62 %, respectively, and an average recognition accuracy of 84.29 %. The SAE-DN algorithm did not achieve a satisfactory recognition of rock (89.24 %) but performed effectively for the recognition of coal (90.83 %). The DN algorithm (91.37 %, 92.44 %, and 91.90 %) was superior to BP-NN (89.09 %, 91.45 %, and 90.47 %) but inferior to SSAE in coal and rock recognition.

The authors declare no conflict of interest.

Conclusions
This paper presented a novel method based on SSAE deep networks for the recognition of the rock-coal interface in top coal caving. In the process, we found the most suitable installation location for the sensor. The SSAE algorithm consisted of two layers of SAE, in which the inputs of each layer were the outputs of the previous layer, thereby providing the optimized weights for the deep network. To obtain a higher recognition accuracy, we performed numerous experiments to obtain the globally optimal parameters. Furthermore, we compared the performance of our proposed approach with those of four other methods and demonstrated the superiority of the SSAE model over the competing models.
Our study can help achieve a better understanding of pattern classification and of the use of SSAE deep networks. In future work, we intend to acquire more vibration signals because deep networks require large numbers of samples for training. We also aim to use more effective feature extraction algorithms to improve the pattern recognition accuracy, to write more efficient code that facilitates calculations, and to test other efficient algorithms.
RECOGNITION OF ROCK-COAL INTERFACE IN TOP COAL CAVING THROUGH TAIL BEAM VIBRATIONS BY USING STACKED SPARSE AUTOENCODERS. GUOXIN ZHANG, ZENGCAI WANG, LEI ZHAO

Fig. 3. Recognition system of the rock-coal interface in top coal caving on the basis of the proposed method

Fig. 4. Top coal caving working site

Fig. 5. 4508 type acceleration sensor

Fig. 6. a) Experimental system; and b) specific installation location of the sensor

Fig. 9. Accuracy rates for different hidden layer structures: a) coal; b) rock; c) average of coal and rock

Fig. 10. Comparison of the algorithms with and without pretraining

Fig. 11. Comparison of the algorithms with different classifiers

Table 1. Parameters of 4508 type acceleration sensor

Table 2. Recognition accuracy rates with one feature

Table 3. Recognition accuracy rates with different features