Failure is Not an Option

Design: What to do.  Where to do it.  When it must be done.


Methodology and goals of Design

Whenever one looks at a system, there are varying views of that system, each view showing what is of interest to that particular viewer.  So, for example, a house plan looks very different to a carpenter, an electrician, a plumber, and a heating and ventilation person.  And so it is when designing a computer system.  The network people see a whole bunch of black boxes connected by wires to switches, routers, firewalls, etc.  The computer people want to know about gigahertz, gigabytes of RAM, and terabytes of storage; the network adapter is on the motherboard, so it is not their concern.  The facilities people want to know about watts and BTUs/hr, floor loadings, and lighting.  The operators want to know how to fix it all when it breaks.  The bean counters want to know how much it's going to cost.  And your boss wants to know that 1) it's going to work and 2) it meets the requirements.  The software engineers want to know which software goes where and what it has to do.  You must satisfy all these competing (and legitimate) demands for understanding what it is you want to do.

Why bother with design?  Why don't we just get to work and build the damn thing?  There are several reasons:

I would like to suggest that the best way to start the design process is with a data flow diagram (DFD), as described by Tom DeMarco.


Goals of a design

  1. Cost
  2. Performance
  3. Reliability
  4. Security
  5. Ease of accommodating change (expansion and contraction; new software)

Cost


Performance


Reliability


Security



Strategy

Our ordinary experience with modern computers is that they are pretty reliable.  Think about it: a Pentium IV CPU with a memory cycle time of 50 ns is doing 20 million memory operations a second, and it is common (at least, in the Linux world) for machines to go for hundreds of days, roughly 10^14 memory operations, without a failure.  By any standard, that's pretty reliable.  Most of the time, when the computer does fail, it is a damned nuisance but no big deal (you do do backups, don't you?).  So what is the problem?
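A quick back-of-the-envelope calculation in Python shows where that figure comes from (the cycle time and the uptime are just the assumptions from the paragraph above):

    # Rough estimate of the number of memory operations between failures,
    # using the figures assumed in the paragraph above.
    cycle_time = 50e-9                        # 50 ns per memory operation (assumed)
    ops_per_second = 1 / cycle_time           # 20 million operations per second
    uptime_days = 100                         # "hundreds of days"; 100 is the low end
    uptime_seconds = uptime_days * 24 * 60 * 60

    total_ops = ops_per_second * uptime_seconds
    print(f"{total_ops:.1e} memory operations without a failure")
    # prints 1.7e+14, i.e. on the order of 10^14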


Reality rears its ugly head

The redundancy argument (see "Failure tolerant," below) makes the explicit assumption that redundant computers fail independently. In fact, the computers may fail dependently, and my experience is that dependent failures are far more common. Why do computers fail dependently? Because they have things in common that can fail:


Failure resistant

It is possible to build systems that are failure resistant.  In other words, they don't fail.  Pfail is small.  There are several strategies for achieving failure resistance:

The problem is that computers don't fail in the same way that dams and bridges do, so it is hard to imagine ways of applying what works so well for civil engineers.

Fail safe

All around the world, railroads meet roads at grade.  It's the cheapest way to get the tracks to the other side of the road.  At low traffic railroad crossings, a couple of wooden or metal signs are sufficient.  At high traffic crossings, there are signs, paint on the roadway, lights at eye level, lights on a tower cantilevered over the road, and barriers to prevent cars from crossing the tracks.  Some of these crossings are at very remote locations.  How do they work reliably?  There is a circuit in the track.  At the far end of the circuit is a power supply.  At the near end of the circuit is a voltage sensor.  If anything goes wrong with the circuit, such as a broken power supply, a broken wire, a broken rail, or a train, then the signals activate and stop traffic.  The system is fail safe.
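The same principle carries over directly to software: treat anything short of a positive, verified "all clear" as a failure, and make failure select the safe state.  A minimal sketch, with a hypothetical sensor function and an assumed voltage threshold:

    # Fail safe logic for the crossing: the signals stay active (safe)
    # unless we get a positive indication that the track circuit is healthy.
    def crossing_may_open(read_track_voltage, threshold=4.5):
        """read_track_voltage is a hypothetical sensor function; the
        threshold value is likewise assumed, purely for illustration."""
        try:
            voltage = read_track_voltage()
        except Exception:
            return False        # sensor fault: stop road traffic
        # A dead power supply, a broken wire, a broken rail, or a train
        # shorting the circuit all drop the voltage, and traffic stops.
        return voltage is not None and voltage >= threshold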

Fail safe systems are wonderful things, but you can't always implement them.  Aircraft "fly by wire" systems simply cannot be allowed to fail, because there is no safe state to fall back to: if the controls stop working, the airplane will crash within seconds.

Failure tolerant

The wave of the future seems to lie in inexpensive failure tolerant systems.  You know that your subsystems are going to fail, so you engineer your systems so that the failure of a subsystem will not cause a failure of the system.  If you have N redundant systems, each of which has a Pfail which is small, then Psystem_fail = Pfail^N, which is very small (if you need M of the N systems to run, then Psystem_fail = Pfail^(N/M)).  These technologies have names: RAID (Redundant Arrays of Inexpensive Disks), VIPs (Virtual IP addresses), and VLANs (Virtual Local Area Networks).
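As a concrete illustration (with made-up numbers), here is the arithmetic for the simple case where all N redundant systems must fail at once:

    # Probability that ALL N redundant systems are down at the same time,
    # assuming (as above) that the failures are independent.
    def p_system_fail(p_fail, n):
        return p_fail ** n

    # Hypothetical numbers: each box is unavailable 1% of the time.
    print(p_system_fail(0.01, 1))   # 0.01          one box
    print(p_system_fail(0.01, 2))   # 0.0001        two redundant boxes
    print(p_system_fail(0.01, 3))   # about 1e-06   three redundant boxes

Remember that this arithmetic depends entirely on the independence assumption; the dependent failures discussed above make the real numbers worse.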

Achieving failure tolerance through redundancy

When discussing computer systems, it helps to pick the appropriate level of abstraction for the question at hand.  This discussion of failure tolerance will proceed through decreasing levels of abstraction.

Figure 1 (Virtual IP addresses) shows the evolution of a failure tolerant system.  The first illustration shows how a failure tolerant system appears to the customers.  Notice that they don't see any of the complicated stuff: they perceive a highly reliable system.  This is as it should be.

The second illustration shows a logical view of a failure tolerant system that provides a single service.  This is the view that the programmers see.  Why do the programmers see the switches, which are normally transparent?  Because the switches implement Network Address Translation (NAT).  The switches also have Access Control Lists (ACLs), which function as firewalls and limit the kinds of traffic that can get to (and from) the internet.  Note the Linux Virtual Servers (LVSes).

The third illustration shows an implementation view of several services implemented in the same system.  It is possible to amortize the cost of the infrastructure (the switches, the LVSes) over several services.  Switches and LVSes are remarkably fast compared to web servers and application servers, and there's no reason not to use them for more than one service.

All of these diagrams are single tier (or two-tier, if you count the customer's system as a tier, which some writers do).  If you are implementing a multi-tier system, then you have to have redundancy not only for the front end but for the back end as well.  However, you can use the same LVS hardware to front not only the front end service but also the back end service, thereby presenting the appearance of a reliable back end to the front end.  Figure 2 shows a Data Flow Diagram of a multi-tier system.  Figure 3 shows how this might look (in case you ever wondered why modern PCs have two ethernet ports).

Figure 3: multi-tiered application VIPs

This seems rather daunting, does it not?  Well, there are some tricks you can use.  Assuming that your applications are lightweight enough, you can combine multiple applications onto a single computer.  You do this by getting clever with IP addresses, ports, operating systems, processes, images, and threads.  There are (at least) four ways to do it:

Single Operating System, Single IP address, multiple ports, multiple processes, single or multiple images, single threads.
In this approach, there are several UNIX processes, each with its own memory space.  Each image listens on its own port but shares an IP address with the other processes on the system.  If any single process fails, the other processes keep going.  Each process can get a full memory space, 4.3 GBytes on an 80386-class computer.  The problem with having a single image over multiple ports is allocating work to the different processes.  If different front ends connect to different back ends, that can solve the problem nicely.  Having the front end on different ports for different processes with the same image is problematical for the users.  But if each image, each application type, is on its own port, then this scheme can work quite nicely.  Incidentally, "single threads" means that each process has a single thread of execution.  But there can be a main application whose sole function is to listen for an inbound connection and then fork a child process when the connection arrives.  The main application loops and listens, while the child process does whatever needs to be done and then exits.  If the child process has an error, then that particular transaction dies but the main application process keeps going.
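A minimal sketch of that listen-and-fork pattern in Python (the port number and the one-line reply are arbitrary, chosen only for illustration):

    import os
    import signal
    import socket

    signal.signal(signal.SIGCHLD, signal.SIG_IGN)     # let the kernel reap children

    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("0.0.0.0", 8080))                  # port chosen arbitrarily
    listener.listen(16)

    while True:                                       # the main application loops and listens
        conn, peer = listener.accept()
        if os.fork() == 0:                            # child: handle one connection, then exit
            listener.close()
            try:
                conn.sendall(b"handled by pid %d\n" % os.getpid())
            finally:
                conn.close()
                os._exit(0)                           # an error here kills only this transaction
        conn.close()                                  # parent: close its copy and keep listening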
Single Operating System, Single IP address, multiple ports, multiple processes, single or multiple images, multiple threads
Modern UNIXes have the ability to spawn multiple threads of execution within a single process.  The threads share a single address space.  Keeping the memory separated between the threads is a challenging proposition.  Advocates of threads, as opposed to processes, argue that threads require less attention from the OS than processes, and they're correct.  But in this modern age of very, very fast computers, I don't find that a compelling argument.
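For comparison, a sketch of the same accept loop using threads instead of processes (again, the port and the trivial echo handler are just placeholders):

    import socket
    import threading

    def handle(conn):
        try:
            conn.sendall(conn.recv(1024))             # trivial echo, stands in for real work
        finally:
            conn.close()

    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("0.0.0.0", 8081))                  # port chosen arbitrarily
    listener.listen(16)

    while True:
        conn, peer = listener.accept()
        # Every thread shares the same address space, so the handler has to
        # be careful with any data structures it shares with other threads.
        threading.Thread(target=handle, args=(conn,), daemon=True).start()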
Single Operating System, Multiple IP addresses
Linux (among other operating systems) has the ability to bind several IP addresses to a single physical network connection through a mechanism called "IP aliases".  Each application listens on an IP address appropriate for the service it is providing.  So, for example, one IP address can serve as a front end, another as a back end, still another for monitoring, and yet another for logging.  Switches and hubs will work with this approach, since they work at the Ethernet (MAC) level.  The machines that talk with this machine do not realize that, while the IP addresses are different, the Ethernet address is the same.  The switches that connect them together see Ethernet (or IEEE 802.3, not much difference here) packets and happily send those packets where they're supposed to go.
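As a sketch, here is how several services on the same box might each listen on its own aliased address.  The 192.0.2.x addresses come from the documentation range and are purely illustrative; the aliases themselves would be added with the operating system's own tools.

    import socket

    def listen_on(address, port):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        s.bind((address, port))       # bind to one specific aliased address
        s.listen(16)
        return s

    # Hypothetical aliases, all on the same physical interface.  Port 80
    # needs root; the point is that every service can use the same port
    # because each one has its own IP address.
    front_end  = listen_on("192.0.2.10", 80)    # customer-facing service
    back_end   = listen_on("192.0.2.11", 80)    # back end service
    monitoring = listen_on("192.0.2.12", 80)    # monitoring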
Multiple Operating Systems.
You can virtualize the operating system (OS).  Each application, or group of applications, can run in its own OS, which need not be the same OS as the "real" OS.  It is possible to have multiple virtual operating systems on the same machine; VMware® is a classic example.  Virtual operating systems can communicate with one another using TCP/IP, so it is possible to have a virtual network inside the physical machine.
Using some combination of these tricks, one can build a highly reliable system using two switches, two LVSes, two servers, and two database servers (if appropriate).




