Systems Engineering for Safety Critical Applications

Part 2 - Using Python and Graphml to Model and Solve an Extended Boolean Fault Tree

If you want to build robots, rockets, or human colonies on Mars, you need to know how dependable your system is going to be. A Boolean Fault Tree is perhaps the simplest way to introduce rigour into a dependability analysis, and so is one small step towards designing a dependable system. In this post, we will use a neat little program to create our model and save it as a GraphML file. We can then write a Python script to parse the file, extract our model and calculate the minimal cut sets using the MOCUS algorithm. Finally, we will extend the fault tree by propagating failure rates of leaf nodes through to every non-leaf node in the tree, giving us the system failure rate.

Gross Dependability Analysis with Markov Chains. A gross dependability analysis to derive a qualitative numerical estimate of system dependability.
Failure Analysis with Binary Fault Trees. An extended Binary Fault Tree to calculate the smallest sets of nodes that, by failing, will bring the system down.
Redundancy as a Tool for Dependable System Design. Using the fault tree modelling tool from Part 2 to compare, quantitatively, different redundancy architectures imposed on a simple automotive braking system model.

All code, derivations, tools and files used to generate the analyses are made available to the reader at github.com/hgrw/safety-critical-systems-blog.git.

The Boolean Fault Tree

The Boolean Fault Tree is a model used in Fault Tree Analysis, in which an undesired state of a system is analysed using Boolean logic to combine a series of lower-level events. This kind of top-down, deductive analysis is prescribed by ISO 26262 (2018), Part 4 (Product development at the system level), clause 6.4.4.1, which states that safety analyses on the system architectural design shall be performed in accordance with Table 1 and ISO 26262-9:2018, Clause 8, in order to:

provide evidence for the suitability of the system design to provide the specified safety-related functions and properties with respect to the ASIL;
identify the causes of failures and the effects of faults;
identify or confirm the safety-related system elements and interfaces; and
support the design specification and verify the effectiveness of the safety mechanisms based on identified causes of faults and the effects of failures.

Table 1 - ISO 26262 System architectural design analysis

Creating A Model

I have been playing around with yEd lately because it represents diagrams in GraphML format, which makes it really easy to parse, extract and solve models using a scripting language such as Python. The software is also cross-platform, freely available and very nice to use. It would be really cool to set up a V&V workflow that automates and links a build-server test suite (unit, integration, functional, HIL, etc.) back to a requirements tree. What fun! Maybe the subject of a future post ;)

We will take Chris Hobbs' system from Part 1 and model it using yEd, below. I set up the parser script to interpret the shape labels as JSON objects, so if you want to use the tool, make sure that the syntax is correct. Each node has three fields:

name - An arbitrary descriptor except in the case of the TOP node, which is required by the solver. In our system the TOP node indicates system failure.
FIT% - Probability of failure per 10^9 hours of operation. This is slightly different to a FIT score, which is the rate of failure per 10^9 hours of operation. For a FIT% score to be calculated for each node in the system, all leaf nodes must be given a FIT% score. These scores are then propagated to all other nodes.
operator - Boolean operator that each node applies to its inputs. All nodes except leaf nodes must be given an operator. Only the strings 'And' and 'Or' will be parsed as valid operators. 'Or' means a failure on any one input will result in a failure of that node; 'And' means the node fails only when all of its inputs fail.

Fig.1 - Model of a simple system, derived from Embedded Software Development for Safety-Critical Systems, Chris Hobbs
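
To give a feel for what parsing such a file involves, here is a minimal sketch using only the Python standard library. It is not the parser from the repository: the exact label layout, the edge direction (from a gate to each of its inputs), the tiny three-node sample and the function name parse_fault_tree are all assumptions made for illustration. What it does rely on is that yEd stores label text in y:NodeLabel elements under the yWorks XML namespace.

```python
import json
import xml.etree.ElementTree as ET

# XML namespaces used by GraphML files saved from yEd.
NS = {
    "g": "http://graphml.graphdrawing.org/xmlns",
    "y": "http://www.yworks.com/xml/graphml",
}

# Hypothetical, heavily trimmed yEd-style file: one 'Or' gate with two leaves.
SAMPLE = """
<graphml xmlns="http://graphml.graphdrawing.org/xmlns"
         xmlns:y="http://www.yworks.com/xml/graphml">
  <graph id="G" edgedefault="directed">
    <node id="n0"><data key="d0"><y:ShapeNode>
      <y:NodeLabel>{"name": "TOP", "operator": "Or"}</y:NodeLabel>
    </y:ShapeNode></data></node>
    <node id="n1"><data key="d0"><y:ShapeNode>
      <y:NodeLabel>{"name": "A", "FIT%": 0.1}</y:NodeLabel>
    </y:ShapeNode></data></node>
    <node id="n2"><data key="d0"><y:ShapeNode>
      <y:NodeLabel>{"name": "B", "FIT%": 0.05}</y:NodeLabel>
    </y:ShapeNode></data></node>
    <edge id="e0" source="n0" target="n1"/>
    <edge id="e1" source="n0" target="n2"/>
  </graph>
</graphml>
"""


def parse_fault_tree(graphml_text):
    """Return (nodes, inputs): label fields by name, and each node's input names."""
    root = ET.fromstring(graphml_text)
    nodes, id_to_name = {}, {}
    for node in root.iterfind(".//g:node", NS):
        label = node.find(".//y:NodeLabel", NS)
        fields = json.loads(label.text)  # the shape label is a JSON object
        id_to_name[node.get("id")] = fields["name"]
        nodes[fields["name"]] = fields

    # Assumption: each edge points from a gate (source) to one input (target).
    inputs = {name: [] for name in nodes}
    for edge in root.iterfind(".//g:edge", NS):
        gate = id_to_name[edge.get("source")]
        inputs[gate].append(id_to_name[edge.get("target")])

    # Enforce the field rules: gates need an operator, leaves need a FIT% score.
    for name, fields in nodes.items():
        if inputs[name] and fields.get("operator") not in ("And", "Or"):
            raise ValueError(f"{name}: operator must be 'And' or 'Or'")
        if not inputs[name] and "FIT%" not in fields:
            raise ValueError(f"{name}: leaf node needs a FIT% score")
    return nodes, inputs


nodes, inputs = parse_fault_tree(SAMPLE)
print(inputs["TOP"])
```

A malformed label fails loudly at json.loads, which is why the label syntax has to be correct.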

The Solver

Our solver will generate the FIT% score for each node in the graph, as well as the minimal cut sets. I'm using the PyPI cutsets package for this, which derives all unique combinations of component failures that can cause system failure. We are interested in the minimal cut sets, which are those that cannot be reduced without losing their status as a cut set. For example, if all nodes in the graph fail, that is technically a cut set, but it is not particularly helpful. Rather, we want to know the smallest sets of nodes whose failure causes the TOP condition to be true.

PyPI cutsets uses the Method of Obtaining Cut Sets (MOCUS) algorithm to generate the minimal cut sets; see N. Limnios and R. Ziani's 1986 paper, Algorithms for Reducing Cut Sets in Fault Tree Analysis.
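
To make the idea concrete, here is a small, self-contained sketch of a MOCUS-style top-down expansion. It is not the cutsets package's API or implementation. The gate structure in GATES is an assumption: it was inferred from the cut sets and node scores listed in the Results section, not read from the model file. The function name mocus_sketch is likewise hypothetical.

```python
# Assumed tree (inferred from the published results, not read from Fig. 1):
# gate name -> (operator, list of inputs). Names absent from GATES are leaves.
GATES = {
    "TOP": ("And", ["SS_1", "SS_2"]),
    "SS_1": ("Or", ["PSU", "SS_1A", "SS_1B"]),
    "SS_2": ("Or", ["PSU", "SS_2A_2B", "SS_2C_2D"]),
    "SS_2A_2B": ("And", ["SS_2A", "SS_2B"]),
    "SS_2C_2D": ("Or", ["SS_2C", "SS_2D"]),
}


def mocus_sketch(gates, top="TOP"):
    """Expand gates top-down, then keep only the minimal cut sets."""
    cut_sets = [frozenset([top])]
    while True:
        # Look for any cut set that still contains an unexpanded gate.
        for index, cut_set in enumerate(cut_sets):
            gate = next((event for event in cut_set if event in gates), None)
            if gate is not None:
                break
        else:
            break  # only leaf events remain, expansion is complete

        operator, inputs = gates[gate]
        rest = cut_set - {gate}
        if operator == "And":
            # All inputs must fail together: they join the same cut set.
            expanded = [rest | frozenset(inputs)]
        elif operator == "Or":
            # Any single input suffices: one new cut set per input.
            expanded = [rest | {event} for event in inputs]
        else:
            raise ValueError(f"unknown operator {operator!r}")
        cut_sets[index:index + 1] = expanded

    # Drop duplicates, then any set that strictly contains another set.
    unique = set(cut_sets)
    minimal = [cs for cs in unique if not any(other < cs for other in unique)]
    return sorted((sorted(cs) for cs in minimal), key=lambda cs: (len(cs), cs))


for cut_set in mocus_sketch(GATES):
    print(cut_set)
```

The final filtering step is what turns cut sets into minimal cut sets: for instance, a set containing both PSU and SS_2C is discarded because PSU on its own is already a cut set.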

Results

Running the solver yields:

Minimum Cut Sets:
['PSU']
['SS_2C', 'SS_1B']
['SS_2C', 'SS_1A']
['SS_2D', 'SS_1B']
['SS_2D', 'SS_1A']
['SS_2A', 'SS_2B', 'SS_1B']
['SS_2A', 'SS_2B', 'SS_1A']

Node FIT% scores:
TOP - 0.0647
PSU - 0.05
SS_2 - 0.280
SS_1 - 0.231
SS_2A_2B - 2.50e-3
SS_1B - 0.1
SS_1A - 0.1
SS_2C_2D - 0.240
SS_2A - 0.05
SS_2B - 0.05
SS_2C - 0.05
SS_2D - 0.2
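
The non-leaf scores above are consistent with a simple propagation rule: an 'And' node multiplies its input probabilities, and an 'Or' node takes one minus the product of the input survival probabilities. The sketch below applies that rule. The gate structure is an assumption inferred from the listed cut sets and scores, and the function name propagate is hypothetical. The rule also treats the inputs of every gate as independent, which is a simplification when a leaf such as PSU feeds more than one gate.

```python
from math import prod

# Assumed tree (inferred from the published results, not read from Fig. 1).
GATES = {
    "TOP": ("And", ["SS_1", "SS_2"]),
    "SS_1": ("Or", ["PSU", "SS_1A", "SS_1B"]),
    "SS_2": ("Or", ["PSU", "SS_2A_2B", "SS_2C_2D"]),
    "SS_2A_2B": ("And", ["SS_2A", "SS_2B"]),
    "SS_2C_2D": ("Or", ["SS_2C", "SS_2D"]),
}

# Leaf FIT% scores as listed in the results.
LEAVES = {
    "PSU": 0.05, "SS_1A": 0.1, "SS_1B": 0.1,
    "SS_2A": 0.05, "SS_2B": 0.05, "SS_2C": 0.05, "SS_2D": 0.2,
}


def propagate(gates, leaf_scores, top="TOP"):
    """Return a score for every node reachable from top."""
    scores = dict(leaf_scores)

    def score(name):
        if name not in scores:
            operator, inputs = gates[name]
            values = [score(event) for event in inputs]
            if operator == "And":
                # Node fails only if every input fails.
                scores[name] = prod(values)
            elif operator == "Or":
                # Node fails unless every input survives.
                scores[name] = 1.0 - prod(1.0 - v for v in values)
            else:
                raise ValueError(f"unknown operator {operator!r}")
        return scores[name]

    score(top)
    return scores


scores = propagate(GATES, LEAVES)
for name in GATES:
    print(f"{name} - {scores[name]:.4g}")
```

Under these assumptions the intermediate nodes come out at the listed values to within rounding. TOP comes out slightly lower than the listed 0.0647, a gap that would be explained if the listed figure was computed from the rounded SS_1 and SS_2 scores.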

The Canonical Form of a System Design

We can use our calculated minimal cut sets to describe what we call the Canonical Form of a system design. Such a representation yields every unique and irreducible combination of subsystem failures that results in system failure. The analyst can then model these subsystem failure combinations by constructing a sum-of-products structure function for any given left-to-right path in Figure 2. This is useful for modelling the system-level impact of redundancy and dependability at the sub-system level. In a future post we will get to the bottom of what redundancy actually means for a system-level design and how to model, with structure functions and other tools, the various ways of achieving redundancy. For now, we can be satisfied that we have reduced our system to its canonical form.
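
As a small illustration of the sum-of-products idea, the sketch below turns the cut sets from the Results section into a Boolean structure function: the system fails if, for at least one cut set (the sum), every member of that cut set has failed (the product). The function name system_fails is hypothetical, and this is only one possible encoding.

```python
# Cut sets copied from the solver output in the Results section.
CUT_SETS = [
    ["PSU"],
    ["SS_2C", "SS_1B"],
    ["SS_2C", "SS_1A"],
    ["SS_2D", "SS_1B"],
    ["SS_2D", "SS_1A"],
    ["SS_2A", "SS_2B", "SS_1B"],
    ["SS_2A", "SS_2B", "SS_1A"],
]


def system_fails(cut_sets, failed_components):
    """Sum-of-products structure function over the cut sets.

    Inner test (product): all members of one cut set have failed.
    Outer any() (sum): at least one cut set is fully failed.
    """
    failed = set(failed_components)
    return any(set(cut_set) <= failed for cut_set in cut_sets)


print(system_fails(CUT_SETS, {"PSU"}))
print(system_fails(CUT_SETS, {"SS_2A", "SS_1A"}))
print(system_fails(CUT_SETS, {"SS_2A", "SS_2B", "SS_1A"}))
```

Losing SS_2A alongside SS_1A does not bring the system down, but additionally losing SS_2B does, which matches the last cut set in the list.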

Fig.2 - System canonical form, derived from Embedded Software Development for Safety-Critical Systems, Chris Hobbs