PM4 SMP model proposed for system reliability criticality assessment and maintainability improvement

This paper presents a practical, systematic approach to optimising the maintenance procedure of a critical industrial unit in operation, in order to improve its maintainability. The maintainability challenge in the industrial unit (vibrating screen unit, VSU) was resolved by performing a two-phase critical analysis encompassing criticality and maintainability assessment. The criticality assessment comprised failure investigation using fault tree analysis (FTA), vulnerability analysis using reliability block diagram (RBD), and failure mode effect and criticality analysis (FMECA). Furthermore, a maintainability assessment was performed on the industrial unit and improvement opportunities were identified. A generic model (PM4 model) was conceptualised and used to improve the mean time to repair (MTTR) through a well-documented standard maintenance procedure (SMP).


Introduction
Designing systems with maintainability and reliability in mind is vital to early equipment management. Maintainability and reliability are "designed-in" and are therefore best influenced as early as possible, during the system conception and design phase. Nevertheless, all stages of the asset life cycle are equally important. Each stage must be done right to assure value-added performance, asset utilisation, and high return on investment (ROI). Maintenance has a huge impact on an organisation's productivity and profitability [1]. It involves all the processes put in place to manage, control, execute, and restore [2] or preserve the inherent reliability of a physical asset [3], so as to guarantee the optimum level of asset availability and safety [4]. Decision making is vital to ensure effective maintenance delivery, and since different assets do not pose the same amount of risk [5], establishing criticality becomes even more important. The amount of time spent on active maintenance activities depends on the maintainability of the asset being maintained, and it is usually outside the influence of the technician. However, identifying the correct root cause by applying techniques such as fault tree analysis (FTA), reliability block diagram (RBD) and failure mode effect and criticality analysis (FMECA) is useful for minimising failures through appropriate remedial actions.
Organisations that are dependent on physical assets must understand the dynamic nature of their asset risk profile. Adams et al. [6] discovered that this is not often the case, as most asset owners assume asset criticality to be fixed and treat the assessment as a one-off exercise. Crespo Márquez et al. [7] proposed a framework based on risk analysis and cost-benefit principles to rank assets, producing as its outcome a hierarchy of assets in order of business impact. Theoharidou et al. [8] followed a multi-layer assessment methodology supported by Haimes et al. [9] to assess criticality based on the interdependencies of a nationwide infrastructure plan. They considered the risk in various sectors, their threat interfaces, and the impact at various defined levels to populate the risk priority of the integrated system. In addition, Saaty's [10] analytic hierarchy process (AHP), a multiple criteria decision making (MCDM) method, has been used for criticality assessment by [11][12][13][14].
Maintainability is an important attribute of a physical asset that significantly reduces maintenance time and cost [15]. Various studies have proposed methods to evaluate maintainability at any stage of the asset life cycle. At the design stage, Coulibaly et al. [16] proposed an approach that uses the product 3D CAD model and its semantic matrix to evaluate maintainability and safety indicators prior to product development. Umeda et al. [17] proposed a method for assessing the modularity of a product by evaluating and aggregating related product life cycle attributes. Zhou et al. [18] assessed ergonomics in relation to maintainability by analysing the maintenance procedure and evaluating the maintenance space through free and constrained swept volume comparison in a virtual environment. Wani and Gandhi [19], in their effort to determine the maintainability of mechanical systems, based their assessment on tribology attributes. The characteristics of tribology were quantitatively assessed by assigning corresponding numerical values. The overall weighting represents the tribo-maintainability index, which is directly proportional to the system maintainability.
Furthermore, [20] evaluated product maintainability based on its life cycle by considering inherent attributes and external factors. The indicators are scored by expert judgement and the coefficient weights determined by fuzzy analytic hierarchy process (FAHP). This multi-indicator approach corresponds to the maintainability assessments in [21][22].
Maintainability has also been assessed statistically by considering historical data. Elevli et al. [23] represented the repair time for the mechanical system of electric cable shovels probabilistically using data from trend tests. The outcomes were fitted to three selected probability distributions to estimate the mean time to repair (MTTR) at different periods as a measure of maintainability. Tsarouhas determined the repair rate as a measure of maintainability for a yoghurt production line [24], a juice bottling enterprise [25], and a strudel production line [26] by fitting the collected data to probability distributions to obtain fitness index parameters of descriptive statistics.
The case study presented in this paper followed the expert judgement approach to assess maintainability under multiple indicators. However, the potential effectiveness of the proposed improvement route was evaluated with respect to time to repair (TTR).

Case study and system description
The system investigated is the Drying system in a Silica Sand Production Plant operated by an Extractive company (EC). The Silica production plant consists of the Quarry system, the Wash system, and the Drying system, each with its own important function. The quarry is the main feedstock of sand into the wash system, where the first screening and size classification occur. The drying system is the last stage of the production process and perfects the product quality for use.
In a bid to expand production and improve the quality of its product for customers, the Extractive company acquired the vibrating screen unit (VSU) as part of the drying system. After operating for more than 2000 hours, the V-belt of the VSU failed and required replacement. V-belt replacement was classified by the company as non-safety critical [27]. However, the VSU V-belt replacement required a team of three (3) highly competent technicians and a time to repair (TTR) of 8 hours, giving a total of 24 man-hours. This was because access to the drive assembly required several levels of part stripping. As a result, the company lost an average of 100 tonnes per hour of production, aggregating to a total direct impact cost of £65k [27], with an unquantified cost associated with customer satisfaction.
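The labour and production figures quoted above can be cross-checked with simple arithmetic. The sketch below assumes that the average production loss rate applies for the full 8-hour repair window; that assumption, and the variable names, are illustrative and not taken from [27].

```python
# Figures quoted in the case description.
technicians = 3          # size of the repair team
ttr_hours = 8            # time to repair (TTR) for the V-belt replacement
loss_rate_tph = 100      # average lost production, tonnes per hour

# Labour effort: every technician is engaged for the whole repair.
man_hours = technicians * ttr_hours

# Assumption: production is lost at the average rate for the whole TTR.
lost_tonnes = loss_rate_tph * ttr_hours

print(f"Total man-hours: {man_hours} h")
print(f"Lost production over the repair window: {lost_tonnes} t")
```

The man-hour total agrees with the 24 hours stated in the text; the lost tonnage is an estimate under the stated assumption only.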

System investigation
The system investigation was carried out by deploying a two-phase assessment encompassing criticality and maintainability assessment [27].

Criticality assessment
The criticality assessment was structured using the Method-Phase-Outcome (MPO) methodology [27]. The assessment considered only the one risk dimension that was prominent within the drying system, based on information gathered from the EC CMMS initial asset ranking. Of the EC's seven risk dimensions (Health and safety, Environment, Quality, Working time, Impact on production, Breakdown frequency, Maintainability), the most prominent in the drying system was Quality. This necessitated the use of a single risk dimension (risk criterion) to assess criticality, as against the multiple criteria criticality assessment approach used by [6][7][8][9][10][11][12][13][14].

System failure investigation
The processing system line investigation showed that the main events that affect the product quality are failure of the Combustion chamber (IA1), failure of the Vibrating Screen (IB1) and failure of the process control sub-system (UC1). The resultant effect of IA1 is a moist or damp product (A); that of IB1 is inadequate removal of foreign particles (B); while that of UC1 is erratic process control (C). In the FTA, events IA1, IB1 and UC1 are linked by an "OR-gate" because any one of the events will constitute a quality issue. Although the Combustion chamber system is upstream and the Vibrating Screen downstream of the production process, erratic operation of the process control system will impact the general outcome of the production process.
The IB1 event results from either no gas supply to the combustion chamber (IB2), or ignition source fails low (B4), or sensor fails to send signal (B3). Furthermore, IB2 can be caused by a ruptured gas pipe (B6), or gas valve fails low (B5), or no gas at the supply source (UB7).
On the other hand, the IA1 event could result from failure of the electric motor (A3), or failure of the drive assembly (IA2), or a damaged/worn screen element (A4). Further investigation showed that the IA2 event can be caused by a damaged pulley (A5), or damaged shaft (A6), or damaged V-belt (A7), or damaged main bearing (A8).
System failure, as referred to in this investigation, does not necessarily mean that the system is completely shut down (out of service); rather, it refers to the system's inability to deliver product to the desired quality (user requirement).
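The gate structure described in this sub-section can be written down as a small data structure and traversed to list the events at the bottom of the tree. The following is a minimal sketch: the name "TOP" for the top event and the reading of a leading "U" as marking an undeveloped event are assumptions made for illustration, and only the event codes are taken from the text.

```python
# Fault tree for the quality risk dimension, transcribed from the text.
# Every gate described is an OR-gate: keys are gate outputs, values are
# the events feeding each gate. "TOP" is a hypothetical label.
OR_GATES = {
    "TOP": ["IA1", "IB1", "UC1"],
    "IA1": ["A3", "IA2", "A4"],
    "IA2": ["A5", "A6", "A7", "A8"],
    "IB1": ["IB2", "B4", "B3"],
    "IB2": ["B6", "B5", "UB7"],
}


def leaf_events(event, gates):
    """Return every leaf event (basic or undeveloped) below `event`."""
    if event not in gates:
        return [event]
    leaves = []
    for child in gates[event]:
        leaves.extend(leaf_events(child, gates))
    return leaves


leaves = leaf_events("TOP", OR_GATES)

# Assumption: a leading "U" marks an undeveloped event (UC1, UB7).
undeveloped = sorted(e for e in leaves if e.startswith("U"))
basic = sorted(e for e in leaves if not e.startswith("U"))

# With OR-gates only, each leaf on its own is a minimal cut set.
print("Basic events:", basic)
print("Undeveloped events:", undeveloped)
```

Because the tree contains only OR-gates, every leaf is a single-event cut set, which is why the number of basic events matters for the vulnerability analysis that follows.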

System vulnerability analysis
The vulnerability of the system under investigation was analysed by applying the reliability block diagram (RBD) analysis tool. RBD is an important engineering tool that can be applied to prospective and retrospective events (redesign, modification or continuous improvement) of a system. It shows the logical connections and interactions among the various components that make up the system using asset blocks [4], which can be analysed using mathematical methods to determine the level of system vulnerability. The input of the RBD was obtained from the FTA in Fig. 3, since RBD is the natural outcome of FTA [28].
In modelling the equivalent RBD of the FTA as shown in Fig. 4, each OR-gate was represented as a series arrangement and each AND-gate as a parallel arrangement. The undeveloped events UC1 and UB7 were not captured because only the causal factors depicted in the basic events are to be represented. Further investigation may be carried out to determine the causal factors of the undeveloped events UC1 and UB7, but such an exercise adds little value within the current scope of this study and so was not explored. The level of system vulnerability increases with the number of components in series arrangement [29]. The arrangement in Fig. 4 showed that the drying system is very fragile: any one of the ten (10) causal factors (basic events) could trigger a system failure due to poor quality product. Proper attention and care must therefore be given to the drying system to preserve its reliability.
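The effect of the number of series blocks on system reliability can be illustrated numerically. The sketch below assumes statistically independent blocks and uses a hypothetical per-block reliability of 0.99; the paper does not report component reliabilities, so the value is for illustration only.

```python
from math import prod


def series_reliability(block_reliabilities):
    """Reliability of independent blocks in series: all must survive."""
    return prod(block_reliabilities)


# Hypothetical per-block reliability, not a value from the study.
ASSUMED_BLOCK_RELIABILITY = 0.99

for n_blocks in (1, 5, 10):
    r_system = series_reliability([ASSUMED_BLOCK_RELIABILITY] * n_blocks)
    print(f"{n_blocks:2d} blocks in series -> system reliability {r_system:.4f}")
```

Each added series block multiplies the system reliability by a factor below one, which is the sense in which vulnerability grows with the number of components in series.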

System failure mode effect and criticality analysis
Having completed the system failure investigation and established its vulnerability to producing poor quality product, an FMECA was carried out to drill down to component level. This was vital to understanding the failure mode of each basic event from the FTA, and thus gave clarity to the risk priority of each individual event.
The risk priority was established by determining the risk priority number (RPN), which is the product of the probability of failure occurrence (O), its severity (S), and its difficulty of detection (D). The values of the three decision criteria (O, S, D) were obtained from a scoring system signifying criteria keyword and impact score, as shown in Table 5, Table 6 and Table 7.
Scoring table excerpt (impact score 5): Can cause an incident resulting in disruption at the customer premises. Customer may submit a formal complaint or claim for consequential losses.
From the FMECA carried out on the drying system, the results of which are shown in Tables 3 and 4, it was established that the RPNs of the components of the Vibrating Screen unit are on the high side, especially those of the pulley (75), shaft (75), V-belt (100) and main bearing (75), which collectively constitute the drive assembly of the Vibrating Screen unit. The subsequent sub-section assesses the maintainability of the VSU.
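The RPN calculation is a straightforward product and can be sketched as follows. The individual (O, S, D) scores below are hypothetical: they were chosen only so that their products equal the RPN values reported above, and the actual scores are those in the paper's FMECA tables, which are not reproduced here.

```python
def rpn(occurrence, severity, detection):
    """Risk priority number: product of the O, S and D scores."""
    return occurrence * severity * detection


# Hypothetical (O, S, D) scores; only the resulting RPNs come from the text.
scores = {
    "pulley": (3, 5, 5),
    "shaft": (3, 5, 5),
    "V-belt": (4, 5, 5),
    "main bearing": (3, 5, 5),
}

rpns = {name: rpn(*osd) for name, osd in scores.items()}

# Rank the drive assembly components from highest to lowest risk priority.
ranked = sorted(rpns.items(), key=lambda item: item[1], reverse=True)
for name, value in ranked:
    print(f"{name}: RPN = {value}")
```

Ranking by RPN places the V-belt first, consistent with the component whose replacement prompted the study.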

Maintainability assessment
The maintainability assessment of the critical unit (VSU) was done by analysing maintainability attributes and evaluating them with regard to maintenance requirements. The Input-Tool-Output (ITO) assessment method was used to structure the assessment.
Fig. 5. ITO maintainability structuring method [27]
The maintainability evaluation criteria applied during this assessment are similar to those proposed by [21]. The assessment considered the inherent design attributes of the VSU (Generic attribute), together with the function of maintenance supportability and all maintenance actions carried out on the VSU (Distinct attribute), at a given maintenance complexity level (MCL). Following the MCL criteria in Table 8, V-belt replacement on the VSU is of level 2 complexity. Hence the unit maintainability assessment was performed at Level 2 maintenance complexity.
To ensure the accuracy of the result, the scoring was achieved through the Delphi technique to determine the maintainability indicator rating (MIR). The technique is structured around a panel of experts made up of engineers and technicians operating and maintaining the VSU. A well-formulated questionnaire on the generic and distinct maintainability attributes of the VSU with regard to V-belt replacement was answered by the experts in two rounds. In each round, the facilitator gave an anonymised summary of the forecasts together with the reasons for the experts' judgements. Unlike many Delphi exercises that may require many rounds of questionnaires, the result for the VSU MIR scoring converged after the first round. However, a second round of questioning was undertaken to validate and stabilise the result.

Level 1
Simple maintenance actions performed when the unit is online. These include simple replacement of components that are easily accessible, and easy adjustments without requiring disassembling.

Level 2
Maintenance actions requiring off-line replacement of components in operations.
Here no failure investigation is required as the maintenance action to be performed is known and scheduled as preventive or corrective task.

Level 3
Maintenance action requiring failure identification and diagnosis when the unit has been set off-line, before preventive or corrective task is performed.

Level 4
Inspection maintenance actions, requiring extensive amount of testing and preventive or corrective task when the unit is off-line.

Level 5
Overhaul, or unit upgrade or modification requiring the unit to be shut down before maintenance action is performed.

Identification of improvement opportunity
The MIR was arrived at using a linguistic model with a numerical scale of 1 to 5, where "1" indicates the lowest rating and "5" the highest, as can be seen in the graphical representations in Figs. 6-7. Attributes with a low rating indicate opportunities for maintainability improvement on the VSU. The opportunities for improvement were identified by isolating and drilling down on the poorly scored maintainability indicators for both the generic and the distinct attributes. Following the assessment, three maintainability improvement routes were proposed for consideration: retrofitting, redesigning, or developing a standard maintenance procedure (SMP) for the VSU.
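Isolating the poorly scored indicators amounts to aggregating the panel ratings per attribute and filtering on a cut-off. The sketch below is illustrative only: the attribute names, the panel ratings, the use of the median as the aggregate, and the cut-off of 2 are all assumptions and are not taken from the study.

```python
from statistics import median

# Hypothetical panel ratings on the 1 (lowest) to 5 (highest) scale.
# Attribute names are placeholders, not the study's indicators.
ratings = {
    "accessibility": [1, 2, 1, 2],
    "standardisation": [4, 4, 5, 4],
    "documentation": [2, 2, 1, 2],
    "tooling": [3, 4, 4, 3],
}

# Assumed cut-off: an aggregate rating at or below this value is "poor".
THRESHOLD = 2

# Aggregate each attribute's ratings into a single MIR (median assumed).
mir = {attribute: median(values) for attribute, values in ratings.items()}

# Attributes flagged as improvement opportunities.
opportunities = sorted(a for a, score in mir.items() if score <= THRESHOLD)

for attribute, score in mir.items():
    print(f"{attribute}: MIR = {score}")
print("Improvement opportunities:", opportunities)
```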

Selection of effective improvement route
The effective improvement route was considered in the context of an immediate remedial solution for VSU maintainability improvement. The analytic hierarchy process (AHP), a multiple criteria decision-making (MCDM) method, was used. The three proposed maintainability improvement routes were subjected to the following criteria:
1) Production impact - the effect of unavailability of the VSU on EC's operations productivity.
2) Lead time - the period between the implementation initiation and completion.
3) Total cost - the sum of direct and indirect costs incurred.
4) Knock-on effect - secondary failure introduced as a result of implementation.
The AHP consists of a pairwise comparison of the Criteria (Level 1) with respect to the Goal (Level 0), followed by a pairwise comparison of the improvement alternatives (Level 2) with respect to each of the Criteria. This produced a consolidated weight for each of the three alternatives. The model for the AHP is shown in Fig. 8. Based on the result obtained from the AHP analysis, the alternative with the highest consolidated weight was "Developing Standard Maintenance Procedure (SMP)" while the lowest was "Retrofitting". These two options are the only routes within the power and influence of EC to effect change. The "Redesign" option is external to EC and lies within the power and influence of the OEM. Hence, developing an SMP was considered the best improvement route for the EC's in-service vibrating screen unit. However, the Redesign option provides a permanent solution to improving the maintainability of VSU equipment.
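The two-level pairwise comparison and the consolidation of weights can be sketched in a few lines. All pairwise judgements below are hypothetical and do not reproduce the study's AHP matrices; the priority vectors are approximated with the row geometric-mean method rather than the principal eigenvector, and the consistency ratio check is omitted for brevity.

```python
from math import prod


def priorities(matrix):
    """Approximate AHP priority vector by the row geometric-mean method."""
    n = len(matrix)
    geo_means = [prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


CRITERIA = ["production impact", "lead time", "total cost", "knock-on effect"]
ALTERNATIVES = ["SMP", "retrofitting", "redesign"]

# Hypothetical judgements: criteria compared with respect to the goal.
criteria_matrix = [
    [1,   2,   2,   3],
    [1/2, 1,   1,   2],
    [1/2, 1,   1,   2],
    [1/3, 1/2, 1/2, 1],
]

# Hypothetical judgements: alternatives compared under each criterion.
alternative_matrices = {
    "production impact": [[1, 3, 2], [1/3, 1, 1/2], [1/2, 2, 1]],
    "lead time":         [[1, 3, 5], [1/3, 1, 2],   [1/5, 1/2, 1]],
    "total cost":        [[1, 4, 5], [1/4, 1, 1],   [1/5, 1, 1]],
    "knock-on effect":   [[1, 3, 1/2], [1/3, 1, 1/4], [2, 4, 1]],
}

criteria_weights = dict(zip(CRITERIA, priorities(criteria_matrix)))

# Consolidated weight = sum over criteria of (criterion weight x local priority).
consolidated = {alt: 0.0 for alt in ALTERNATIVES}
for criterion, matrix in alternative_matrices.items():
    for alt, local in zip(ALTERNATIVES, priorities(matrix)):
        consolidated[alt] += criteria_weights[criterion] * local

for alt, weight in sorted(consolidated.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{alt}: {weight:.3f}")
```

With these illustrative judgements the ordering happens to match the one reported above, but the numerical weights carry no meaning beyond demonstrating the mechanics.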

PM4 model for operational maintainability improvement
The PM4 model was conceptualised to develop a standard maintenance procedure (SMP) for the identified failure modes of the VSU. Conceptualising engineering solution delivery in models facilitates communication, thinking, and knowledge retention across the various functional teams involved in the asset value delivery supply chain. Solution delivery models augment understanding of engineering solutions beyond the textual statement of requirement [30], thus providing a holistic visualisation of requirements, information flow, and points of performance analysis to drive continuous improvement.
BS 4778 [31] defined maintainability as "the ability of an item, under stated conditions of use, to be retained in, or restored to, a state in which it can perform its required functions, when maintenance is performed under stated conditions and using prescribed procedures and resources". This definition is the bedrock on which the PM4 model was conceptualised. PM4 as an acronym stands for Permit x Manpower x Method x Material & Machine. The elements are linked by a product or multiplication sign "x" to indicate that a good result can only be obtained from the model when all its elements have been well articulated, work in synergy, and are managed for the task at hand.
From the BS 4778 [31] definition of maintainability, the "stated conditions" requirement is covered by the "Permit" element of PM4, the "procedures" requirement by the "Method" element, and the "resources" requirement by the "Manpower, Material and Machine" elements. The simplicity of the model makes documenting an SMP very easy. The use of an SMP in equipment maintenance provides a range of positive multiplier effects. It can serve as the basis or starting point for incident investigation and, in some cases, as a vital training document for new personnel. Also, when the SMP is used, statutory standards and regulations are consistently met and tasks are performed to the required degree of precision. In all, asset mean time to repair or restore (MTTR) is greatly reduced through reductions in lost time injury (LTI), travel time to get tools from stores (or external sourcing), waiting time for permits, and time spent reworking due to wrong procedure [32].
SMP documentation for an asset should be treated as a project with a clear requirement, scope, schedule (delivery duration), resource allocation, quality checks, and output validation. This may not incur external cost, as the resources needed to achieve it are internal. However, the cost of the man-hours dedicated to this effort should be considered to avoid overruns, which can discourage subsequent adoption. As a minimum, those who have asset care responsibility over the asset for which the SMP is to be developed should form the delivery or project team. Other stakeholders should only be consulted as and when needed. This approach was adopted by EC for the VSU SMP development and the result was cost effective and technically feasible. Such a project approach ensures that proper attention and measurable commitments are given to asset SMP development. It is not surprising that backlog and long MTTR are major contributors to asset unavailability (downtime) in plants where the SMP is not considered essential to asset care.

PM4 model description
The model is structured into IOC: Input, Output, and Control. The IOC structuring ensures that the SMP delivery team understands the starting requirement (Input), the expected product (Output), and the means of measuring the effectiveness of the output in achieving the maintainability goal (Control). SMP development requires a feedback process, which ensures that lessons are learned from performing maintenance tasks. Subsequently, lessons learnt can be used for performance improvement of the system or operation. All stages of the IOC interact with each other to produce a live asset SMP that meets the asset maintenance objectives.
INPUT - The inputs to the model are the four PM4 elements.
1) Method - This is a way of structuring maintenance problems and the corresponding task risk assessment. The procedure for carrying out various maintenance activities on the VSU was drawn from the technicians' experience and exposure in actually doing the work. This improves the efficacy of the prescribed job steps in the VSU-SMP, as they are the product of lessons learned doing the same task on the same unit, under the same operating context. The efficacy of the method element is also visible in its ability to eliminate potential downtime due to relearning known failures, which is often associated with rarely performed tasks. The maintenance life plan in the SMP development model, having been established through a rigorous RCM process already implemented by EC with additional input from our critical analysis, ensured that all failure modes are accounted for and a maintenance method is written for each of them. Maintenance time (restoration or repair) and the level of task complexity are further reduced by the fact that the method element provides a comprehensive knowledge base for technicians in care of the VSU. This is validated by [28] and supported by [33], which holds that formulating a problem solves 80 % of it.
2) Manpower - This covers the human resources and level of competence required to perform maintenance on an asset. The training of maintenance technicians, and the opportunity to perform the tasks for which they are trained, improves competence. The efficacy of the manpower element is evaluated by the competence level of the technician. Planning out this requirement reduces thinking and trial-and-error correction time, which in turn reduces asset (VSU) downtime and mean time to repair (MTTR).
3) Material and Machine - Availability of spares and ancillary machines (such as lifting equipment, tools, etc.) can be a major source of increased MTTR and downtime of an asset if not properly planned out. In some cases, these materials and machines are not stored within the maintenance organisation and need to be sourced externally. The lead time should be properly considered when organising maintenance activities. The efficacy of this element is evaluated by the amount of waiting time.
4) Permit required - This is an important element of the PM4 SMP model. It helps control the operation and maintenance of the asset (VSU) in a safe manner. Permit-to-work (PTW) is a documented procedure that grants authority to certain people to carry out specific work within a specified time frame. It sets out the precautions required for safe work completion based on risk assessment. A declaration will usually be required from the work authoriser and permit originator before and after the asset has been returned to normal functional state. The efficacy of this element is evaluated by the amount of waiting time to get the PTW.
OUTPUT - The PM4 SMP model output is a well-documented standard maintenance procedure for the asset. The content should capture all the elements of the PM4 needed for maintenance operation of the asset (in this case the VSU). The writer of the SMP should be skilled and knowledgeable in the art of communication. This is to ensure that the inputs (PM4 elements) are documented in a fashion that is understandable to the reader and serves the goal of the user [3]. The grammatical structure should be simple (elementary grade), short and verbalised. Visual impressions (pictures) should be used to elaborate the points being communicated. These will aid understanding and consistency across various levels of education or experience. Finally, the job method or procedure should be sequenced in the logical order of the natural job flow. The maintenance procedures for different failure modes of the asset should be broken down into chapters, sections, and sub-sections respectively.
CONTROL - The efficacy of the PM4 model is measured using key performance indicators (KPIs). The selected KPIs should be drawn from, or have direct impact on, each of the PM4 elements. This is reflected in our adoption of a feedback process in the model to ensure that performance is measured, analysed, and benchmarked against the maintainability goal set for the asset. In some cases, lessons learned while performing maintenance on the asset may require an update or revision of the current SMP. Such an update or revision should be governed by a management of change (MOC) process. For instance, if there is a modification to the asset or a change of functional location during a major shutdown or turnaround event, it is good practice to review (and update if applicable) the SMP, as this may impact how maintenance will be carried out going forward [32].
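As an illustration of the control stage, waiting-time and repair-time KPIs can be computed from repair records. The sketch below is hypothetical throughout: the record fields, the sample values, and the choice to count permit and material waiting time within each repair's duration when computing MTTR are assumptions, not the KPIs or data used by EC.

```python
from statistics import mean

# Hypothetical repair records, all durations in hours.
records = [
    {"permit_wait": 1.0, "material_wait": 2.0, "active_repair": 5.0},
    {"permit_wait": 0.5, "material_wait": 0.5, "active_repair": 4.0},
    {"permit_wait": 0.5, "material_wait": 0.0, "active_repair": 3.5},
]


def kpis(repair_records):
    """Mean of each time component, plus MTTR over the records."""
    fields = repair_records[0].keys()
    result = {f"mean_{field}": mean(r[field] for r in repair_records)
              for field in fields}
    # Assumption: MTTR here includes waiting time as well as active repair.
    result["mttr"] = mean(sum(r.values()) for r in repair_records)
    return result


for name, value in kpis(records).items():
    print(f"{name}: {value:.2f} h")
```

Tracking the components separately indicates which PM4 element (Permit, or Material and Machine) contributes most to the waiting time, which is the kind of feedback the control stage is intended to provide.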

Conclusions
This study was undertaken to develop an effective maintenance approach that will reduce the maintenance repair time of a critical vibrating screen unit (VSU) operated by an Extractive Company (EC). To achieve this aim, an on-site plant study was done with primary data retrieved from EC as the major input. Secondary data was collected from a review of relevant literature and the original equipment manufacturer (OEM) manual.
The plant processes and equipment configuration were described to establish the context of the study. Primary and secondary data collected were subjected to critical analysis which was performed in two phases namely the criticality assessment and the maintainability assessment.
Criticality assessment entailed performing failure investigation on the drying system under the quality risk dimension using fault tree analysis (FTA). The basic events from the FTA were modelled into a reliability block diagram (RBD) to determine the degree of system vulnerability. The mostly series structure showed that the system is very vulnerable. Furthermore, the basic events were subjected to failure mode effect and criticality analysis, in which the failure mode of each basic event (component) was determined and its criticality established through the risk priority number (RPN). The components with the highest RPN are from the VSU. Thus, the VSU was selected as the most critical unit.
The maintainability of the VSU was assessed by considering various attributes, divided into generic and distinct. A set of criteria was developed for each attribute and a linguistic scale was used to quantify each of them. To ensure the scoring was done with minimal error, the Delphi technique was used.
Based on the identified areas of improvement, three options were considered as possible improvement routes, namely developing a standard maintenance procedure (SMP), retrofitting, and redesign. The best immediate improvement route for the EC was determined using the AHP multiple criteria decision making method under the stated criteria. The option with the highest consolidated weight was developing an SMP. The PM4 generic model for SMP development was conceptualised to improve the in-service maintainability of the critical VSU asset.