A Synthesis of the BRIDGE Background

Introduction

Problem Description

Case-Based Reasoning

Real-Time Expert Systems

The Q-Methodology

The BRIDGE Approach

References

Introduction

Safety, reliability and life-cycle costs of technical systems are directly determined by the efficiency of fault diagnosis throughout the life cycle. Within the BRIDGE project the aim is to significantly improve diagnosis efficiency in operations and to reduce the effort required in supporting actions. Improved efficiency in operational diagnosis comprises a drastic increase in the automation of diagnosis of occurring failures and problems, a significant increase in the coverage of failures and problems by tuning of symptoms and additional tests, and guaranteed response times.

This report gives a brief synthesis of the background to the project, which comes from three independent developments: case-based reasoning, in particular the use of fault networks for fault diagnosis; state-based real-time expert systems; and the Q-methodology, in particular the LIMITS-CASE tool. The report has been kept short, with many references to the articles and theses produced. All articles and theses are available to the EU and the reviewers upon request.

Problem Description

The safe and efficient operation and maintenance of modern technical systems involve several on-line and off-line diagnostic tasks. On-line tasks have to respect strict real-time requirements. Each diagnostic task has its own goals, decision strategy, and input and output information, but all tasks concern the same technical (sub)system and should therefore comprise similar domain knowledge. Ideally, the knowledge in each task is consistent with that in every other task, but as the diagnostic systems are usually developed separately, considerable additional effort is required to obtain and maintain some degree of consistency. The lack of consistency and integration between operational and maintenance diagnostic systems is a major reason for inefficiency in exploitation.

Large technical applications

The applications considered in the BRIDGE project are large technical systems for which the total number of symptoms, tests, actions and faults is in the order of tens of thousands (see e.g. Raaphorst et al., 1995). Because the relations are too numerous and complex, and the acquisition of a correct model would be too costly, it is in practice impossible to develop an accurate causal model of the complete system (see Netten and Vingerhoeds, 1994). In reality, therefore, such models are developed for critical components and systems only. Modelling the interactions with other systems and the environment, however, remains a major problem. Maintenance of these explicit models, e.g. for customer-specific application updates and modifications, is very costly as well.

Explicit formulation of diagnosis systems

For modern fault-diagnosis systems, faults have to be specified explicitly in terms of their resulting symptoms and test results (facts) as well as the relations between those facts. These facts can be explicitly arranged according to the specified relations in the form of a fault-tree, frequently constructed from individual sub-trees for symptoms and sub-systems. Due to the complexity of modern technical systems, together with the number of on-board systems they include, the typical size of their diagnostic problems is increased to such an extent that for fault-tree-based diagnostic systems the following problems can be identified:

It is almost impossible for human experts to explicitly define all facts and their relations consistently.
It is also almost impossible for human experts to generate, verify and maintain a complete and consistent fault-tree of this size. Due to the inherent structure of a fault-tree, duplications of parts of branches are very difficult to detect, while any modification has to be specified explicitly in every related sub-tree.
The objectives, the series of tests that can be performed, and the actions that can be taken may be different for every diagnostic task. In practice, for each diagnostic task a separate system has to be developed.
Different diagnosis strategies ask for a different ordering of the additional tests. The more decisive tests should be placed higher up in the tree. For each diagnostic task, only tests that can be performed in the given situation should be invoked.
During diagnosis, not all known symptoms and test results are used directly in a classic fault-tree; they remain unused until they appear in a fault-tree node. Much valuable time is lost because not all the available information is used directly to evaluate parts of sub-trees.
Searching for not yet known information requires advanced tree-search techniques and search time.
The behaviour and reliability of system components cannot always be precisely specified for real-world applications. The occurrence of symptoms and the outcome of tests cannot be assigned with 100 % probability for each fault. In practice, a human expert should estimate these.

Case-Based Reasoning

The use of expert system technology offers several advantages for on-line fault diagnosis systems, as it incorporates the knowledge and experience of manufacturers and users. The size of the diagnosis problem is such that explicit formulation of the fault-tree is very complicated. The knowledge, however, is implicitly available in the description of faults in terms of corresponding symptom-codes, results of performed tests and repair actions. Case-based reasoning (CBR, see Aamodt and Plaza, 1994) can be used to facilitate the automatic generation, consistency checking and maintenance of the fault-tree (see Netten and Vingerhoeds, 1994). CBR allows the problem formulation to be reduced to a definition of fault-cases only. The syntax of a fault-case is simple and straightforward. For each individual fault, a case is defined, consisting of failures, results of additional tests and repair actions. CBR reasons over the implicit relations in the cases and provides the knowledge required for the development of a fault diagnosis system.

A new problem is diagnosed by remembering similar fault cases. Additional tests should be performed to refine the diagnosis. Relevant tests are identified from the most similar cases, and a preference order can be determined by some measure of information gain. The cases are reused by suggesting their actions in the priority order of their similarities. Advantages of case-based diagnosis include the concentration of fault and failure knowledge in one case-base, reduced development and maintenance costs of specialised diagnosis systems, and reduced effort for acquisition and maintenance of the knowledge base. A major problem is encountered with respect to the processing speed of CBR systems. Off-the-shelf CBR tools do not meet the real-time requirements. Some of the existing tools structure the case-base in tree structures, which for fault diagnosis have some serious disadvantages with respect to the search process and time-to-diagnosis.

The approach presented here for fault-tree generation and on-line fault diagnosis makes use of case-based reasoning techniques, tailored in such a way that the response time for each diagnostic system is minimised. Matching and retrieving the cases from a database would be too time-consuming for on-line diagnosis. To avoid extensive database queries, a fault-network is first built off-line from the cases, and only this network is used during the on-line diagnosis process itself. A textual case description of the symptoms, tests and results is only necessary to inform the user during development or diagnosis. This information should therefore be contained in databases for the man-machine interfaces. Within the actual on-line diagnostic systems, the cases, symptoms, test results and actions are only referenced by indices. The fault network developed resembles the Rete network (see Forgy, 1982). Some additional node-types are added to this structure to account for uncertain information in fault descriptions. The top layer of the network consists of all input nodes for the symptoms and tests. In the next layers, nodes are built for each combination of symptoms and results, in such a way that each combination is built only once in the network. The bottom layer consists of nodes with reference to the individual faults. Building such a fault network has several major advantages over other diagnostic systems:

The structure of the network is smaller. Each test result and each symptom appear only once in the network as input nodes.
Efficiency of the data-driven network. A high efficiency is realised by making optimal use of the fact that the results of a new test only lead to limited modifications in the network. In every cycle, the current status of all nodes is maintained, and only modifications of the inputs are propagated through the network.
Simultaneous diagnosis of multiple symptoms. The relations of sub-trees for specific symptoms in the former fault-trees are now merged into one network, allowing for the simultaneous treatment of multiple symptoms.
Diagnostic strategy embedded in the network. From the objectives for a particular diagnosis problem, general rules for the diagnostic strategy can be derived. The strategy should be used to determine the inference process. In an off-line procedure, steps can be assigned for each diagnostic task separately, according to the rules of the specific strategy. For on-line diagnosis, the appropriate strategy is maintained by simply following the threads through the activated nodes.
Non-procedural order. The nodes on the threads are not addressed in a procedural order, as any input is immediately propagated to any node in the network.
Computer memory can be reduced. For on-line diagnosis, only the network structure is loaded into memory as a set of pointers between the nodes.
Reduced search time. The response time for diagnosis is reduced in several ways with respect to fault-trees; the structure to be searched is smaller and all cases satisfying a relation are treated in one operation within one node. The diagnostic process itself is not hindered by time-consuming string operations.
Consistency checking. Faults with identical descriptions are found in the same end-nodes of the network and can be easily detected. Input nodes that have no further links downwards either represent default answers to questions, or indicate that faults are not completely specified in the case-base.
Incorporation of new experience. New experiences of maintenance crews and operators can be incorporated as new faults in the case-base as they become available.

The initial developments, reported by Netten and Vingerhoeds, 1994, used the network on-line as the search medium. Initial measurements were reported to show satisfactory behaviour.
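The construction and data-driven propagation described in this section can be illustrated with a minimal sketch. It is not the network of Netten and Vingerhoeds: it assumes that every fault case is simply a set of integer fact indices (symptoms and test results), it shares only common prefixes of the sorted fact lists rather than arbitrary combinations, and it omits the node-types for uncertain information. The class name FaultNetwork and its methods are hypothetical.

```python
from collections import defaultdict


class FaultNetwork:
    """Simplified fault network: each shared combination of facts is built once."""

    def __init__(self, cases):
        # cases: dict mapping a fault id to an iterable of fact indices.
        self.children = defaultdict(dict)  # node -> {fact: child node}
        self.by_fact = defaultdict(list)   # fact -> [(parent node, child node)]
        self.faults = defaultdict(list)    # end node -> fault ids described by it
        self.node_count = 1                # node 0 is the root
        for fault, facts in cases.items():
            node = 0
            for fact in sorted(set(facts)):
                if fact not in self.children[node]:
                    child = self.node_count
                    self.node_count += 1
                    self.children[node][fact] = child
                    self.by_fact[fact].append((node, child))
                node = self.children[node][fact]
            self.faults[node].append(fault)
        self.asserted = set()              # facts received so far
        self.active = {0}                  # nodes whose combination is satisfied

    def assert_fact(self, fact):
        """Propagate one new input and return the faults that became fully matched."""
        if fact in self.asserted:
            return []
        self.asserted.add(fact)
        # Only nodes fed by this input and by an already active parent can change.
        frontier = [child for parent, child in self.by_fact[fact]
                    if parent in self.active]
        matched = []
        while frontier:
            node = frontier.pop()
            self.active.add(node)
            matched.extend(self.faults[node])
            for next_fact, child in self.children[node].items():
                if next_fact in self.asserted:
                    frontier.append(child)
        return matched

    def duplicates(self):
        """Consistency check: groups of faults with identical descriptions."""
        return [ids for ids in self.faults.values() if len(ids) > 1]


# Hypothetical example: three fault cases over fact indices 1..3.
network = FaultNetwork({"F1": [1, 2], "F2": [1, 2, 3], "F3": [2, 1]})
network.assert_fact(2)            # nothing is fully matched yet
newly_matched = network.assert_fact(1)
```

In this sketch the state of all nodes is kept between calls, so a new symptom or test result touches only the nodes below its input node, and faults with identical descriptions end in the same node where duplicates() finds them.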

Real-Time Expert Systems

Recently, there has been considerable interest in the use of expert systems in real-time applications (see Jones and Rodd, 1993). Resulting from this interest, a generation of expert system tools, or shells, designed specifically for real-time applications, has been developed. However, these tools suffer from a number of defects. Most notably, the current generation of tools is not suitable for hard real-time applications, where the response time must be guaranteed. To overcome this problem, an alternative approach to the development of expert systems for hard real-time applications has been developed (see Jones, 1995). A state-based architecture is adopted, as opposed to the event-based approach used by existing real-time expert system shells. In a state-based system only one process is active at any time. Hence, every process has exclusive access to any global data and no process will be pre-empted. This eliminates the problems associated with controlling access to global data, such as the blackboard of an expert system. The use of a state-based architecture overcomes some of the problems of predicting the execution times of expert systems. However, it does not solve the problem of predicting the execution time of the inference process. To do this, a rule-set compiler has been developed, which converts an expert system rule-set into simple non-recursive procedural functions with an execution time that is largely unaffected by the input data. This allows guaranteed worst-case execution times for the expert system to be determined. In this way, searching within an expert system is for a large part performed off-line. On-line, a relatively simple search process remains. From a given knowledge base, the rule-set compiler generates C or Fortran functions with a known worst-case execution time that are logically equivalent to the knowledge base.
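The idea of compiling a rule-set off-line into a non-recursive evaluation can be sketched as follows. This is an illustration only, not the compiler of Jones (1995): it assumes purely propositional rules, it produces a Python closure instead of C or Fortran source, and Python itself gives no hard real-time guarantee. The function name compile_rules and the example facts are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def compile_rules(rules, inputs):
    """Off-line step: order the rules so that one fixed pass evaluates them all.

    rules:  dict mapping a conclusion to a list of alternative premise lists;
            the conclusion holds if all premises of any alternative hold.
    inputs: names of the facts supplied at run time.
    Recursive rule-sets are rejected (graphlib.CycleError, a ValueError).
    """
    premises = {p for alts in rules.values() for alt in alts for p in alt}
    undefined = premises - set(rules) - set(inputs)
    if undefined:
        raise ValueError(f"undefined premises: {sorted(undefined)}")

    depends = {c: {p for alt in alts for p in alt} for c, alts in rules.items()}
    order = [n for n in TopologicalSorter(depends).static_order() if n in rules]
    steps = tuple((c, tuple(tuple(alt) for alt in rules[c])) for c in order)
    # Number of premise look-ups per call; it does not depend on the input data.
    cost = sum(len(alt) for _, alts in steps for alt in alts)

    def evaluate(facts):
        """On-line step: a single pass without search, recursion or early exit."""
        state = {name: bool(facts.get(name, False)) for name in inputs}
        for conclusion, alternatives in steps:
            value = False
            for alt in alternatives:
                holds = True
                for premise in alt:
                    holds &= state[premise]  # deliberately no short-circuit
                value |= holds
            state[conclusion] = value
        return state

    return evaluate, cost


# Hypothetical rule-set with three run-time inputs.
rules = {
    "overheat": [["temp_high", "fan_off"]],
    "alarm": [["overheat"], ["pressure_high"]],
}
evaluate, cost = compile_rules(rules, ["temp_high", "fan_off", "pressure_high"])
state = evaluate({"temp_high": True, "fan_off": True})
```

Because every rule is visited exactly once in a precomputed order, the amount of work per call is fixed by the rule-set alone, which is the property that makes a worst-case execution time derivable for the generated functions.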

The Q-Methodology

Good software engineering tools should allow the required real-time behaviour to be verified before implementation. Unfortunately, few available tools guarantee that the design will conform to the pre-set real-time requirements. Modelling techniques exist which try to model a real-time application, but they cannot completely model hard real-time problems. Most of these modelling techniques lack the capability to handle truly asynchronous message transfer. Some other techniques exist, but are so complicated to use that they are abandoned. One real-time methodology can cope with the requirements of designing hard real-time systems. This methodology was developed by Motus and Rodd, 1994, building on an original approach by Quirk, and is now known as the Q-methodology. It allows the specification and verification of all required timing characteristics during the design of real-time systems. In the Q-methodology, the temporal information is modelled via so-called processes and channels. By connecting the processes with channels, data and control may be transferred between different processes and an order of execution may be imposed. Each process has its own time-set, which may be shared with other processes if these processes are required to run synchronously. Two processes are started synchronously by connecting them with a synchronous channel. The two other channel types in the Q-methodology are the semi-synchronous channel and the asynchronous channel. The semi-synchronous channel, comparable to the only channel type a Petri net has, starts the following process when the previous process has produced its data and finished its task. The asynchronous channel may be compared to a data channel where information is available to the consumer processes, without triggering the consumer process to start. The time-sets of these two processes are not related to each other.

The Q-model makes it possible to incorporate and verify timing constraints from specification through to maintenance (the whole life cycle). It combines analytical (formal) and simulation-based (informal) approaches for verifying time correctness. Formal analysis of a system described in the Q-model is performed in three stages:

Analysis of separate elements (parameters of a process and of a channel)
Analysis of pairs of interacting processes (separately for each channel type)
Behavioural analysis of a group of interacting processes (deadlock, performance, synchronisation precision)
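The triggering behaviour of the three channel types described earlier in this section can be summarised in a small sketch. It only restates the informal description given here and is not part of the Q-model formalism or of the LIMITS-CASE tool; the names Channel and consumer_start are hypothetical, and times are plain numbers in arbitrary units.

```python
from enum import Enum


class Channel(Enum):
    SYNCHRONOUS = "synchronous"
    SEMI_SYNCHRONOUS = "semi-synchronous"
    ASYNCHRONOUS = "asynchronous"


def consumer_start(channel, producer_start, producer_duration):
    """Time at which the channel starts the consumer, or None if it never does."""
    if channel is Channel.SYNCHRONOUS:
        # Both processes share a time-set and start together.
        return producer_start
    if channel is Channel.SEMI_SYNCHRONOUS:
        # The consumer starts once the producer has delivered its data and finished.
        return producer_start + producer_duration
    # Asynchronous: data is merely made available; the consumer runs on its own time-set.
    return None
```

For a producer that starts at time 10.0 and runs for 2.5 time units, the sketch gives a consumer start of 10.0 over a synchronous channel and 12.5 over a semi-synchronous channel, while an asynchronous channel imposes no start time at all.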

The Q-methodology makes it possible to follow the satisfaction of the primary time constraints through the life-cycle stages (requirement specification, algorithm specification, logical design, physical design, and implementation). It also supports the introduction and proof of secondary time constraints, implied by the primary constraints. The Q-model, like many other computational models, is based on heuristics, which have been filtered pragmatically. It has been demonstrated that the Q-model is a superset of ordinary Petri nets, and that the Q-model can be mapped into a weak second-order predicate calculus.

Modelling a real-time application with the Q-methodology concerns the dynamic and functional behaviour of the whole application. For a complete model, it is necessary to model the static information as well. Object-oriented methodology is a good complement to the Q-methodology, providing the static basis of the complete model. In particular, the Object Modelling Technique (OMT) of Rumbaugh et al., 1991, offers good possibilities.

To target the real-time software engineering market, a combination of OMT and the Q-methodology was developed within the framework of a research and development project supported by the European Union. The LIMITS-CASE tool allows the specification and design of hard real-time applications and the analysis of the real-time characteristics. First, static information is specified in the form of object models, where the characteristics of each class are provided in the form of attributes and operations belonging to that specific class. For an intuitive approach, it is advisable to continue with the creation of state diagrams of the dynamic classes, see e.g. Zijderveld and Vingerhoeds, 1996b. After a satisfactory design has been made, it is possible to continue with the creation of the Q-diagrams. In these diagrams all information is specified concerning the timing performance of all processes and the interactions between them. When the complete model, or an integral part of it, is finished, it is possible to animate the behaviour of the application to provide a first simulation of the whole process. With this animation it is also possible to run scenarios in which a fault is introduced on purpose. The behaviour of the system in reaction to this fault can be animated, which may be of interest to the designer.

The BRIDGE Approach

Within the BRIDGE project, the above-described approach is extended using a two-phase approach to address the problems identified above. In an off-line process, the case-base is built, analysed and modified for a specific diagnosis task and specific real-time diagnosis systems are generated from the case-base for each operational mode. On-line diagnosis is performed by this real-time system, while the original case-base is not accessed. The Case-Based Reasoning cycle, presented by Aamodt and Plaza, 1994, is now executed in two loops; in the off-line process, the cases are retrieved, analysed, revised and retained, while the actual reuse of fault cases and actions occurs in the on-line diagnosis loop.

Off-line development of the fault-base

The application is represented by a case-base of (analytical) fault cases and more general domain knowledge about the features. Every diagnosis task is defined separately by its strategy and operational conditions, including the technical (sub)systems, alarm level(s), operational phase(s), operator level(s), maximum available effort and time for executing tests and actions, and performance requirements. Real-world problems are recorded during on-line diagnosis in a history case-base.

The strategy and operational conditions of a diagnosis task determine which relevant cases and features are retrieved and organised in a network. This network is used for the analysis and revision of the fault-base. The network structure enables the identification of the most critical situations that can be foreseen during on-line diagnosis. Unresolved problems and critical situations that have occurred are analysed from the history case-base. Each critical situation is treated separately. Similar cases are retrieved and analysed for correctness and for their sensitivity to the critical situation. Cases that do not satisfy these conditions will require a revision of the network structure, which could also involve a modification of cases or of some features in the case-base. The coverage and real-time requirements (see further) are only analysed for a network satisfying the correctness and sensitivity requirements, but could also lead to revisions and case-base modifications to improve the coverage or timing behaviour. It should be noted that these revisions and retainments are performed only after validation and authorisation by a human expert. The analysis only provides suggestions for modifications. When the network is validated for all requirements, a real-time diagnosis system can be generated from the network structure.
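How the operational conditions of a diagnosis task could select the relevant cases and tests can be sketched as a simple filter. The record layouts below (DiagTest, FaultCase, DiagnosisTask) and the selection criteria are assumptions made for illustration; the actual BRIDGE case-base format is not specified in this report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiagTest:
    name: str
    duration_s: float      # time needed to perform the test
    operator_level: int    # minimum operator level required
    phases: frozenset      # operational phases in which the test is possible


@dataclass(frozen=True)
class FaultCase:
    fault: str
    subsystem: str
    alarm_level: int
    tests: tuple           # DiagTest records referenced by this case


@dataclass(frozen=True)
class DiagnosisTask:
    subsystems: frozenset
    alarm_levels: frozenset
    phase: str
    operator_level: int
    max_test_time_s: float


def select_for_task(cases, task):
    """Return {fault: names of usable tests} for the cases relevant to the task."""
    selected = {}
    for case in cases:
        if case.subsystem not in task.subsystems:
            continue
        if case.alarm_level not in task.alarm_levels:
            continue
        selected[case.fault] = tuple(
            t.name for t in case.tests
            if t.duration_s <= task.max_test_time_s
            and t.operator_level <= task.operator_level
            and task.phase in t.phases
        )
    return selected


# Hypothetical example: a task for the brake system while the vehicle is running.
pressure = DiagTest("brake_pressure", 30, 1, frozenset({"running", "depot"}))
strip_down = DiagTest("brake_strip_down", 3600, 3, frozenset({"depot"}))
cases = [
    FaultCase("brake_leak", "brakes", 2, (pressure, strip_down)),
    FaultCase("door_jam", "doors", 1, ()),
]
task = DiagnosisTask(frozenset({"brakes"}), frozenset({2, 3}), "running", 1, 60)
relevant = select_for_task(cases, task)
```

Only the selected cases and tests would then be organised in the network for that task, so that tests which cannot be performed in the given situation never appear in the on-line system.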

On-line fault diagnosis

A new problem is matched to the faults and actions present in the diagnosis system. If the matching is sufficient, the appropriate actions are suggested to the user. Otherwise additional tests are suggested for which the results provide relevant new information. The on-line process does not have any direct access to the original case-base.
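The on-line decision between suggesting an action and suggesting a further test can be sketched as follows. The similarity measure (fraction of a case's features already observed), the exclusion of contradicted cases, the use of outcome entropy with equal weights per fault as a stand-in for information gain, and the treatment of an unspecified feature as a separate outcome are all assumptions of this sketch; the report does not fix these choices. Fault, feature and action names are hypothetical.

```python
from collections import Counter
from math import log2


def candidates(cases, observed):
    """Score every fault that is not contradicted by the observations."""
    scores = {}
    for fault, (expected, _action) in cases.items():
        if any(f in observed and observed[f] != v for f, v in expected.items()):
            continue
        scores[fault] = sum(1 for f in expected if f in observed) / len(expected)
    return scores


def next_step(cases, observed, threshold=1.0):
    """Return ("action", a), ("test", feature) or ("unknown", None).

    threshold is the required similarity, with 0 < threshold <= 1.
    """
    scores = candidates(cases, observed)
    if not scores:
        return ("unknown", None)
    best = max(scores, key=scores.get)
    if scores[best] >= threshold:
        return ("action", cases[best][1])

    def gain(feature):
        # Entropy of the outcomes predicted by the remaining candidate faults.
        outcomes = Counter(cases[fault][0].get(feature) for fault in scores)
        total = sum(outcomes.values())
        return -sum(n / total * log2(n / total) for n in outcomes.values())

    open_features = {f for fault in scores for f in cases[fault][0]
                     if f not in observed}
    return ("test", max(sorted(open_features), key=gain))


# Hypothetical fault cases: fault -> (expected feature values, action).
cases = {
    "pump_worn": ({"pressure": "low", "motor_current": "high",
                   "vibration": "high"}, "replace pump"),
    "bearing_damaged": ({"pressure": "low", "motor_current": "high",
                         "bearing_temp": "high"}, "replace bearing"),
    "valve_stuck": ({"pressure": "low", "motor_current": "normal",
                     "valve_feedback": "closed"}, "service valve"),
    "sensor_fault": ({"pressure": "low", "motor_current": "normal",
                      "sensor_selftest": "fail"}, "replace sensor"),
}
step = next_step(cases, {"pressure": "low"})
```

With only the low pressure observed, all four faults remain candidates and the motor current is the suggested test, because it splits them into two equal groups. In a generated real-time system this ranking would be fixed off-line in the network, and the on-line process would work on indices only.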

Guaranteeing Real-Time Behaviour

Once a suitable subset for the diagnosis system has been selected, and the analyses of correctness, sensitivity and coverage have been performed satisfactorily, the real-time characteristics of the envisaged system have to be assessed. This means that the response time of the on-line diagnosis system must match the requirements of the process at hand. The on-line system that will be derived from the network structure therefore needs a formal analysis of its overall real-time behaviour. The Q-methodology mentioned above was further developed and combined with the Object Modelling Technique (OMT) within the framework of the LIMITS project, supported by the European Union. The LIMITS-CASE tool allows the specification and design of hard real-time applications and the analysis of the real-time characteristics. Parts of the LIMITS tool, in particular those related to the verification of the real-time behaviour, can now be used for this assessment. When the real-time behaviour matches the requirements, the case-compiler can be applied to generate the C functions (see e.g. Vingerhoeds et al., 1995b).

BRIDGE Toolset

The complete system under development now consists of two main parts, called the "operational diagnostic tool" and the "support function tool".

The support function tool allows administrators to administer the fault-base, i.e. entering new cases, modifying existing cases, specifying possible tests, actions and potential test results, etc. In addition, recorded problems and unresolved failures can be analysed. The tool generates the network, which allows for evaluation of coverage and sensitivity of symptoms, tests and test results using special-purpose analysis routines. When a consistent network has been obtained, an analysis of the expected real-time characteristics can be performed, and when the performance is within the specified time limits, the support function tool will generate the real-time kernel for on-line use. The tool also generates a condensed history from the global history in the data store for on-line retrieval.

The operational diagnostic tool provides access functions for non-administrative persons to the real-time kernel and the condensed history, dependent on the operational mode and governed by an operator level. The tool records a history of all data related to problems for future evaluation in the support function tool. The actual fault diagnosis is performed by the real-time kernel, which monitors the technical system in real time and suggests additional tests, based on presented symptoms, previous test results and interaction with the operator. Based on this, the fault will be identified and corrective actions will be suggested.

The case-base is made up of cases, each described by symptoms, tests, test results and actions. Each case represents a single fault. A certainty factor is specified for the occurrence of the fault, together with an alarm level. Symptoms are defined with information about alarm levels, operational status and textual explanations, and a certainty factor for the occurrence of the symptom for the given case. Tests are defined with possible results, the time and effort required to perform them, and the operational modes during which they can be performed. As with symptoms, possible test results are specified by a certainty factor for the occurrence of the result for the given case, or vice versa.

References

Aamodt, A., and E. Plaza (1994). Case-Based Reasoning: Foundational Issues, Methodological Variations, and System Approaches. AI Communications, 7, nr. 1, pp. 39-52, March.
Forgy, C.L. (1982). Rete: A fast Algorithm for the Many pattern/Many Object Pattern Match Problem. Artificial Intelligence, 19, pp. 17-37.
Jones A.V. (1995). "An Approach to the Design of Expert Systems for Hard Real-Time Applications", PhD thesis, University of Wales Swansea.
Jones, A.V., and M.G. Rodd (1993). Problems with Expert Systems in Real-time Control. Engineering Applications of Artificial Intelligence, 7, nr. 3, pp. 499-506.
Jones, A.V., and M. G. Rodd (1994). An approach to the design of expert systems for hard real-time control. IFAC Workshop on safety, reliability and applications of emerging intelligent control technologies, Hong Kong, 12-14 December, pp. 30-35.
Jones A.V., Vingerhoeds R.A., Rodd M.G. (1995). "Real-Time Expert Systems for flight control.", IFAC Artificial Intelligence in Real-Time Control, AIRTC'95. Slovenia, November 29-1 December.
Motus, L., Rodd, M.G., Timing Analysis of Real-Time Software, Pergamon Press, Oxford, 1994.
Netten B.D., and R.A. Vingerhoeds (1994). Automatic Fault Tree Generation. IFAC Workshop on Safety, reliability and applications of emerging intelligent control technologies, Hong Kong, 12-14 December 1994, pp. 182-187.
Netten B.D., Vingerhoeds R.A. (1995a). Automatic Fault-Tree Generation: A Generic Approach for Fault Diagnosis Systems, TRAIL PhD congress, Multidisciplinary visions on TRAnsport, Infrastructure and Logistics, May 30, Rotterdam.
Netten B.D., and R.A. Vingerhoeds (1995b). Large-Scale Fault Diagnosis for On-Board Train Systems, in: Case-Based Reasoning, Research and Development, (eds.) Veloso M., Aamodt A., Lecture Notes in Artificial Intelligence 1010, Springer Verlag, Berlin, pp 67-76.
Raaphorst A., Netten B.D., Vingerhoeds R.A., Automated Fault Tree Generation for Operational Fault Diagnosis, RAILink'95, IEE Int. Conf. Electric Railways in a United Europe, Amsterdam, March 27-30, 1995, pp. 173-177.
Rodd, M.G. (1995). Safe AI - Is this possible? Engineering Applications of Artificial Intelligence, 8, nr. 3, pp. 243-250.
Rumbaugh J., Blaha M., Premerlani W., Eddy F., Lorensen W., "Object-Oriented Modelling and Design", Prentice-Hall, 1991.
Vingerhoeds, R.A., Janssens, P., Netten, B.D., Aznar Fernández-Montesinos, M. (1995a). "Enhancing off- and on-line monitoring and fault diagnosis", Control Engineering Practice, Vol. 3, Nr. 11, pp. 1515-1528.
Vingerhoeds, R.A., B.D. Netten and L. Boullart (1992). Artificial Intelligence in Process Control Applications. Communications and Cognition in Artificial Intelligence, 9, pp. 161-173.
Vingerhoeds, R.A., Netten, B.D., Jones, A.V., Rodd, M.G. (1995b). Enhancing On-Line Fault Diagnosis using Real-Time Expert Systems, in proc: ESTEC Workshop Artificial Intelligence and Knowledge Based Systems for Space, 10-11 October.
Zijderveld P.D., Vingerhoeds R.A. (1996a). "Design Methodology for Real-Time systems", proc. Advanced School for Computing and Imaging conference, June 5-7 1996, Lommel, Belgium.
Zijderveld P.D., Vingerhoeds R.A. (1996b). "Ensuring Real-Time performance of Expert Systems", proc. IEEE Workshop, AI in Aerospace Applications, November 16, 1996, Toulouse, France.

Updated by Rob Vingerhoeds (rob@kgs.twi.tudelft.nl)

Last Update 06-11-1997.