Resilient to errors

Systems should be resilient to errors. Errors should be handled gracefully and should not crash the entire system. A resilient architecture can be achieved by the following:

  • Replication, so that a replica is available if the main node goes down. This avoids single points of failure. Components or services should also delegate work among themselves so that each one handles a single responsibility.
  • Containment and isolation, so that a failure stays within the boundaries of the component where it occurred. This prevents cascading errors. The client of a component is not burdened with handling that component's failures.
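
The two points above can be illustrated with a minimal Python sketch. It is only one possible reading of the ideas, not a production pattern: all names here (`NodeDown`, `call_with_failover`, `contained`, `main_node`, `replica_node`) are hypothetical and chosen for illustration, and nodes are modelled as plain functions rather than real network services.

```python
from typing import Callable, Optional, Sequence


class NodeDown(Exception):
    """Raised by a (hypothetical) node that cannot serve a request."""


def call_with_failover(nodes: Sequence[Callable[[str], str]], request: str) -> str:
    """Replication: try the main node first, then each replica in order."""
    last_error: Optional[Exception] = None
    for node in nodes:
        try:
            return node(request)
        except NodeDown as err:
            last_error = err  # this node is down; move on to the next replica
    raise NodeDown("all replicas failed") from last_error


def contained(component: Callable[[str], str], fallback: str) -> Callable[[str], str]:
    """Containment: wrap a component so its failures stay inside its boundary."""
    def guarded(request: str) -> str:
        try:
            return component(request)
        except Exception:
            return fallback  # the client gets a degraded answer, not an exception
    return guarded


# Hypothetical nodes: the main node is down, the replica is healthy.
def main_node(request: str) -> str:
    raise NodeDown("main node is down")


def replica_node(request: str) -> str:
    return f"replica handled {request}"


def service(request: str) -> str:
    # A service backed by a main node and one replica.
    return call_with_failover([main_node, replica_node], request)


def broken_service(request: str) -> str:
    # A service with no replica: a single point of failure.
    return call_with_failover([main_node], request)
```

Under these assumptions, `service("order-1")` is answered by the replica even though the main node is down, while `contained(broken_service, "unavailable")("order-1")` returns the fallback value instead of propagating the failure to the caller. Real systems would typically add timeouts, health checks, and supervision on top of this.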