An introduction to shared-clock schedulers: Why additional processors may not always improve reliability

Why additional processors may not always improve reliability

It is very important to appreciate that, without due care, increasing the number of processors in a network can have a detrimental impact on overall system reliability.

It is not difficult to see why this is the case. To keep the example simple, we will ignore the possibility of failures in the links between processors and the need for a more complex (software) operating environment. Instead, we will assume that a network has 100 microcontrollers and that each of these devices is 99.99% reliable. As a result, a multiprocessor application which relies on the correct, simultaneous operation of all 100 nodes will have an overall reliability of 99.99% × 99.99% × 99.99% … (100 terms). This is 0.9999^100, or approximately 99.0%. This is a substantial decrease in reliability: a 99.99% reliable device might be assumed to fail once in 10,000 years, while the corresponding 99.0% reliable system would then be expected to fail approximately once every 100 years. The effect compounds quickly: if each node were only 99% reliable, the same 100-node network would be approximately 37% reliable (0.99^100) and would be expected to fail roughly every 18 months.
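The arithmetic behind this kind of estimate can be checked with a short sketch. It assumes that nodes fail independently and that the system works only if every node works; the function names, and the conversion from an annual reliability figure to a rough failure interval, are illustrative assumptions rather than anything defined in the text.

```python
def series_reliability(node_reliability: float, node_count: int) -> float:
    """Reliability of a system that needs every one of its nodes to work.

    Assumes the nodes fail independently of one another.
    """
    return node_reliability ** node_count


def mean_years_between_failures(annual_reliability: float) -> float:
    """Rough failure interval in years.

    Assumption (illustrative only): (1 - reliability) is treated as the
    chance of a failure in any given year.
    """
    return 1.0 / (1.0 - annual_reliability)


if __name__ == "__main__":
    # Compare 100-node networks built from nodes of two different qualities.
    for node_reliability in (0.9999, 0.99):
        system = series_reliability(node_reliability, 100)
        interval = mean_years_between_failures(system)
        print(f"node {node_reliability:.4f} -> system {system:.4f}, "
              f"about {interval:.1f} years between failures")
```

Under these assumptions, 100 nodes at 99.99% each give a system reliability of about 0.990 (a failure roughly once every 100 years), while 100 nodes at 99% each give about 0.366 (a failure roughly every 1.6 years).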

An overall increase in system reliability will be obtained only where the increase in reliability resulting from the shared-clock design outweighs the reduction in reliability that arises from the increased system complexity. Unfortunately, making predictions about the costs and benefits (in reliability terms) of any complex design feature remains, in most non-trivial systems, something of a black art.

For example, consider the use of ‘redundant nodes’ as discussed earlier. Specifically, suppose we are developing an automotive cruise-control system (Figure 25.12).

[Figure 25.12]

The cruise-control application has clear safety implications: if the application suddenly fails and sets the car at full throttle, fatalities may result. As a result, we may wish to use two microcontroller-based nodes in order to provide a backup unit in the event that the first node fails (Figure 25.13).

This can be an effective design solution: for example, if we have a network with two essentially identical nodes and we are able to activate the second node when the first one fails, then it seems likely that this will improve the overall system reliability. This approach is used to good effect in many aircraft flight-control applications, where the ‘main’, ‘backup’ and ‘limp home’ controllers may be switched in, as required, by the pilot or co-pilot (e.g. Storey, 1996).

However, the mere presence of redundant networks does not in itself guarantee increased reliability. For example, in 1974, in a Turkish Airlines DC-10 aircraft, the cargo door opened at high altitude. This event caused the cargo hold to depressurize, which in turn caused the cabin floor to collapse. The aircraft contained two (redundant) control lines, in addition to the main control system, but all three lines were under the cabin floor. Control of the aircraft was therefore lost and it crashed outside Paris, killing 346 people (Bignell and Fortune, 1984, pp. 143–4; Leveson, 1995, pp. 50 and 434).

In addition, in many embedded applications, there is either no human operator in attendance or the time available to switch over to a backup node (or network) is too small to make human intervention possible. In these circumstances, if the component required to detect the failure of the main node and switch in the backup node is complicated (as often proves to be the case), then this ‘switch’ component may itself be the source of severe reliability problems (see Leveson, 1995).
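The influence of the ‘switch’ component can be illustrated with a deliberately simple model. The sketch below is not taken from the cited sources: it assumes independent failures, two identical nodes, and a switch that sits in the signal path, so that the system fails whenever the switch fails. All names are hypothetical.

```python
def standby_pair(node_reliability: float, switch_reliability: float) -> float:
    """Reliability of a main node plus a backup node behind a switch.

    Simplifying assumptions (illustrative only):
    - failures are independent;
    - the switch must work for the system to work at all;
    - the backup is needed only when the main node has failed.
    """
    # Probability that at least one of the two nodes can do the job.
    either_node = node_reliability + (1.0 - node_reliability) * node_reliability
    return switch_reliability * either_node


def break_even_switch_reliability(node_reliability: float) -> float:
    """Switch reliability at which the pair is exactly as good as one node.

    Derived by setting standby_pair(r, s) equal to r, which gives
    s = 1 / (2 - r).
    """
    return 1.0 / (2.0 - node_reliability)


if __name__ == "__main__":
    node = 0.99
    print(f"single node:          {node:.4f}")
    for switch in (0.999, 0.98):
        pair = standby_pair(node, switch)
        print(f"pair, switch {switch:.3f}:  {pair:.4f}")
    print(f"break-even switch:    {break_even_switch_reliability(node):.4f}")
```

In this model, a backup node improves on a single 99% reliable node only if the switch itself is more than about 99% reliable; a less reliable switch leaves the redundant design worse than the single node it was meant to protect.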

Note that these comments should not be taken to mean that multiprocessor designs are inappropriate for use in high-reliability applications. Multiple processors can be (and are) safely used in such circumstances. However, all multiprocessor developments must be approached with caution and must be subject to particularly rigorous design, review and testing.

Conclusions

In this chapter we have considered some of the advantages and disadvantages that can result from the use of multiple processors. We also introduced the shared-clock scheduler and sought to demonstrate that this operating environment may be used to create efficient time-triggered applications involving two or more microcontrollers.

We will provide detailed descriptions of a range of shared-clock schedulers in the chapters that follow.
