Checkpoints: Software Architecture Document

Overall, the system is soundly based architecturally, because:
 
  - The architecture appears to be stable.

    The need for stability is dictated by the nature of the Construction
    phase: in Construction the project typically expands, adding developers
    who will work in parallel, communicating loosely with other developers as
    they produce the product. The degree of independence and parallelism
    needed in Construction simply cannot be achieved if the architecture is
    not stable.

    The importance of a stable architecture cannot be overstated. Do not be
    deceived into thinking that 'pretty close is good enough': unstable is
    unstable, and it is better to get the architecture right and delay the
    onset of Construction than to proceed. The coordination problems involved
    in trying to repair the architecture while developers are trying to build
    upon its foundation will easily erase any apparent benefit of
    accelerating the schedule. Changes to the architecture during
    Construction have broad impact: they tend to be expensive, disruptive and
    demoralizing.

    The real difficulty of assessing architectural stability is that "you
    don't know what you don't know"; stability is measured relative to
    expected change. As a result, stability is essentially a subjective
    measure. We can, however, base this subjectivity on more than just
    conjecture. The architecture itself is developed by considering
    'architecturally significant' scenarios: subsets of use cases which
    represent the most technologically challenging behavior the system must
    support. Assessing the stability of the architecture involves ensuring
    that the architecture has broad coverage, so that there will be no
    'surprises' in the architecture going forward.

    Past experience with the architecture can also be a good indicator: if
    the rate of change in the architecture is low, and remains low as new
    scenarios are covered, there is good reason to believe that the
    architecture is stabilizing. Conversely, if each new scenario causes
    changes in the architecture, it is still evolving and baselining is not
    yet warranted.
  - The complexity of the system matches the functionality it provides.
  - The conceptual complexity is appropriate given the skill and experience
    of its:
      - users
      - operators
      - developers
  - The system has a single, consistent, coherent architecture.
  - The number and types of components are reasonable.
  - The system has a consistent system-wide security facility. All the
    security components work together to safeguard the system.
  - The system will meet its availability targets.
  - The architecture will permit the system to be recovered in the event of
    a failure within the required amount of time.
  - The products and techniques on which the system is based match its
    expected life.
      - An interim (tactical) system with a short life can safely be built
        using old technology because it will soon be discarded.
      - A system with a long life expectancy (most systems) should be built
        on up-to-date technology and methods so it can be maintained and
        expanded to support future requirements.
  - The architecture defines clear interfaces to enable partitioning for
    parallel team development.
  - The designer of a model element can understand enough from the
    architecture to successfully design and develop the model element.
  - The packaging approach reduces complexity and improves understanding.
  - Packages have been defined to be highly cohesive within the package,
    while the packages themselves are loosely coupled.
  - Similar solutions within the common application domain have been
    considered.
  - The proposed solution can be easily understood by someone generally
    knowledgeable in the problem domain.
  - All people on the team share the same view of the architecture as the
    one presented by the software architect.
  - The Software Architecture Document is current.
  - The Design Guidelines have been followed.
  - All technical risks have either been mitigated or been addressed in a
    contingency plan. Newly discovered risks have been documented and
    analyzed for their potential impact.
  - The key performance requirements (established budgets) have been
    satisfied.
  - Test cases, test harnesses, and test configurations have been
    identified.
  - The architecture does not appear to be "over-designed":
      - The mechanisms in place appear to be simple enough to use.
      - The number of mechanisms is modest and consistent with the scope of
        the system and the demands of the problem domain.
  - All use-case realizations defined for the current iteration can be
    executed by the architecture, as demonstrated by diagrams depicting:
      - interactions between objects,
      - interactions between tasks and processes,
      - interactions between physical nodes.

Overall

  - Subsystem and package partitioning and layering is logically consistent.
  - All analysis mechanisms have been identified and described.

Subsystems

  - The services (interfaces) of subsystems in upper-level layers have been
    defined.
  - The dependencies between subsystems and packages correspond to
    dependency relationships between the contained classes.
  - The classes in a subsystem support the services identified for the
    subsystem.

Classes
  
  - The key entity classes and their relationships have been identified.
  - Relationships between key entity classes have been defined.
  - The name and description of each class clearly reflect the role it
    plays.
  - The description of each class accurately captures the responsibilities
    of the class.
  - The entity classes have been mapped to analysis mechanisms where
    appropriate.
  - The role names of aggregations and associations accurately describe the
    relationship between the related classes.
  - The multiplicities of the relationships are correct.
  - The key entity classes and their relationships are consistent with the
    business model (if it exists), domain model (if it exists),
    requirements, and glossary entries.
  
  - The model is at an appropriate level of detail given the model
    objectives.
  - For the business model, requirements model or design model during the
    Elaboration phase, there is no over-emphasis on implementation issues.
  - For the design model in the Construction phase, there is a good balance
    of functionality across the model elements, using composition of
    relatively simple elements to build a more complex design.
  - The model demonstrates familiarity and competence with the full breadth
    of modeling concepts applicable to the problem domain; modeling
    techniques are used appropriately for the problem at hand.
  - Concepts are modeled in the simplest way possible.
  - The model is easily evolved; expected changes can be easily
    accommodated.
  - At the same time, the model has not been overly structured to handle
    unlikely change, at the expense of simplicity and comprehensibility.
  - The key assumptions behind the model are documented and visible to
    reviewers of the model. If the assumptions are applicable to a given
    iteration, then the model should be able to evolve within those
    assumptions, but not necessarily outside of them. Documenting
    assumptions protects designers from the expectation that they have
    considered "all" possible requirements: in an iterative process, it is
    impossible to analyze all possible requirements and to define a model
    that will handle every future requirement.
  
  - The purpose of the diagram is clearly stated and easily understood.
  - The graphical layout is clean and clearly conveys the intended
    information.
  - The diagram conveys just enough to accomplish its objective, but no
    more.
  - Encapsulation is effectively used to hide detail and improve clarity.
  - Abstraction is effectively used to hide detail and improve clarity.
  - Placement of model elements effectively conveys relationships; similar
    or closely coupled elements are grouped together.
  - Relationships among model elements are easy to understand.
  - Labeling of model elements contributes to understanding.
  
  - Each model element has a distinct purpose.
  - There are no superfluous model elements; each one plays an essential
    role in the system.
  
  - For each error or exception, a policy defines how the system is
    restored to a "normal" state.
  - For each possible type of input error from the user or wrong data from
    external systems, a policy defines how the system is restored to a
    "normal" state.
  - There is a consistently applied policy for handling exceptional
    situations.
  - There is a consistently applied policy for handling data corruption in
    the database.
  - There is a consistently applied policy for handling database
    unavailability, including whether data can still be entered into the
    system and stored later.
  - If data is exchanged between systems, there is a policy for how systems
    synchronize their views of the data.
  - If the system uses redundant processors or nodes to provide fault
    tolerance or high availability, there is a strategy for ensuring that no
    two processors or nodes can 'think' that they are primary, and that
    there is never a situation in which no processor or node is primary.
  - The failure modes for a distributed system have been identified and
    strategies defined for handling the failures.
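
The redundant-node checkpoint above is commonly addressed with a time-bounded lease plus a fencing token. The sketch below is a minimal, hypothetical illustration rather than a prescribed mechanism: `LeaseCoordinator` stands in for a single reliable arbiter (in practice usually a replicated consensus service), and `FencedStore` is an imagined shared resource that rejects writes carrying a stale token. Lease expiry lets a backup take over once the primary stops renewing (provided backups keep trying to acquire the lease), and the token keeps a stalled former primary from acting as primary after a successor exists.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Lease:
    holder: str
    token: int          # fencing token: strictly increases with every new holder
    expires_at: float


class LeaseCoordinator:
    """Hypothetical arbiter that grants the primary role as a time-bounded lease."""

    def __init__(self, lease_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.lease_seconds = lease_seconds
        self.clock = clock
        self.lease: Optional[Lease] = None
        self.next_token = 1

    def acquire(self, node: str) -> Optional[Lease]:
        """Grant or renew the lease for `node`; None if another node holds a valid lease."""
        now = self.clock()
        current = self.lease
        if current is not None and current.expires_at > now:
            if current.holder != node:
                return None                                # another node is primary: stay backup
            current.expires_at = now + self.lease_seconds  # renewal keeps the same token
            return current
        # No lease, or the previous holder stopped renewing: hand over with a new token.
        self.lease = Lease(node, self.next_token, now + self.lease_seconds)
        self.next_token += 1
        return self.lease


class FencedStore:
    """Hypothetical shared resource that refuses writes from a superseded primary."""

    def __init__(self) -> None:
        self.highest_token = 0
        self.data: Dict[str, object] = {}

    def write(self, token: int, key: str, value: object) -> bool:
        if token < self.highest_token:
            return False                                   # stale primary: fenced off
        self.highest_token = token
        self.data[key] = value
        return True


class FakeClock:
    """Manually advanced clock so the example is deterministic."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now
```

For example, with a 5-second lease, `node-a` acquires token 1; `node-b` is refused until `node-a` has failed to renew for 5 seconds, then receives token 2, and a late write from `node-a` carrying token 1 is rejected by the store.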
  
  - The process for upgrading an existing system without loss of data or
    operational capability is defined and has been tested.
  - The process for converting data used by previous releases is defined
    and has been tested.
  - The amount of time and resources required to upgrade or install the
    product is well understood and documented.
  - The functionality of the system can be activated one use case at a
    time.
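
The data-conversion checkpoint is easier to test when each release's conversion is an explicit, ordered step. A minimal sketch, assuming records are plain dictionaries carrying a `version` field; the two migrations and the field names (`email`, `name`, `full_name`) are hypothetical examples, not part of any real schema. Because every step is a pure function that returns a new record, the whole chain can be run against copies of production data as part of the tested upgrade process.

```python
from typing import Any, Callable, Dict

Record = Dict[str, Any]


def _v1_to_v2(record: Record) -> Record:
    # Hypothetical change in release 2: an optional email field was added.
    return {**record, "email": record.get("email", ""), "version": 2}


def _v2_to_v3(record: Record) -> Record:
    # Hypothetical change in release 3: "name" was renamed to "full_name".
    upgraded = {k: v for k, v in record.items() if k != "name"}
    upgraded["full_name"] = record["name"]
    upgraded["version"] = 3
    return upgraded


# Keyed by the version each step upgrades *from*.
MIGRATIONS: Dict[int, Callable[[Record], Record]] = {1: _v1_to_v2, 2: _v2_to_v3}
CURRENT_VERSION = 3


def migrate(record: Record) -> Record:
    """Apply every migration between the record's version and CURRENT_VERSION, in order."""
    version = record.get("version", 1)   # records written before versioning count as v1
    if version > CURRENT_VERSION:
        raise ValueError(f"record version {version} is newer than this release")
    while version < CURRENT_VERSION:
        step = MIGRATIONS.get(version)
        if step is None:
            raise LookupError(f"no migration defined from version {version}")
        record = step(record)
        version = record["version"]
    return record
```

A record already at the current version passes through unchanged, so re-running the conversion after a partial upgrade is harmless.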
  
  - Disk space can be reorganized or recovered while the system is
    running.
  - The responsibilities and procedures for system configuration have been
    identified and documented.
  - Access to the operating system or administration functions is
    restricted.
  - Licensing requirements are satisfied.
  - Diagnostic routines can be run while the system is running.
  - The system monitors its own operational performance (e.g. capacity
    threshold, critical performance threshold, resource exhaustion).
      - The actions taken when thresholds are reached are defined.
      - The alarm handling policy is defined.
      - The alarm handling mechanism is defined and has been prototyped and
        tested.
      - The alarm handling mechanism can be 'tuned' to prevent false or
        redundant alarms.
  - The policies and procedures for network (LAN, WAN) monitoring and
    administration are defined.
  - Faults on the network can be isolated.
  - There is an event tracing facility that can be enabled to aid in
    troubleshooting.
      - The overhead of the facility is understood.
      - The administration staff possesses the knowledge to use the
        facility effectively.
  - It is not possible for a malicious user to:
      - enter the system,
      - destroy critical data, or
      - consume all resources.
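
One common way to make an alarm mechanism 'tunable' against false and redundant alarms is to combine a debounce count with hysteresis (separate raise and clear thresholds). The sketch below assumes a single numeric metric sampled periodically; the class and parameter names are illustrative, and the actions taken on RAISE and CLEAR are left to the project's alarm handling policy.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ThresholdAlarm:
    """Threshold alarm with debounce and hysteresis (illustrative sketch).

    raise_at / clear_at: the alarm is raised at or above raise_at and is only
    cleared once the metric falls below clear_at, so a value hovering around
    one threshold cannot make the alarm flap on and off.
    min_samples: consecutive samples at or above raise_at required before
    raising, which filters out one-off spikes.
    """

    raise_at: float
    clear_at: float
    min_samples: int = 3
    active: bool = False
    events: List[str] = field(default_factory=list)
    _over: int = 0

    def __post_init__(self) -> None:
        if self.clear_at >= self.raise_at:
            raise ValueError("clear_at must be below raise_at")
        if self.min_samples < 1:
            raise ValueError("min_samples must be at least 1")

    def observe(self, value: float) -> None:
        if not self.active:
            # Count consecutive high samples; any normal sample resets the count.
            self._over = self._over + 1 if value >= self.raise_at else 0
            if self._over >= self.min_samples:
                self.active = True
                self.events.append(f"RAISE {value}")
        elif value < self.clear_at:
            self.active = False
            self._over = 0
            self.events.append(f"CLEAR {value}")
        # While active and not yet below clear_at, nothing is emitted: no repeat alarms.
```

With `raise_at=90`, `clear_at=80` and `min_samples=2`, the sample sequence 91, 70, 92, 93, 89, 91, 85, 79 produces exactly one RAISE (at 93) and one CLEAR (at 79); the isolated 91 and the dip to 89 or 85 generate nothing.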
  See Checkpoints: Workload Analysis Document 
  
  - Memory budgets for the application have been defined.
  - Actions have been taken to detect and prevent memory leaks.
  - There is a consistently applied policy defining how the virtual memory
    system is used, monitored and tuned.
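
As one concrete form of 'actions to detect memory leaks', a test can run an operation many times and check how much memory it leaves allocated. The sketch below uses Python's standard `tracemalloc` module; the warm-up and iteration counts, and the hypothetical `within_budget` helper comparing the result against a per-operation budget, are assumptions to be adapted to the project's own memory budgets. It only sees memory allocated through the Python allocator, so leaks in native code need platform tools instead.

```python
import gc
import tracemalloc
from typing import Callable


def retained_bytes(operation: Callable[[], object], warmup: int = 10, iterations: int = 1000) -> int:
    """Bytes still allocated after running `operation` `iterations` times.

    A leak-free operation should retain roughly nothing once warm-up has
    populated any caches; growth proportional to `iterations` suggests a leak.
    """
    if tracemalloc.is_tracing():
        raise RuntimeError("tracemalloc is already in use")
    for _ in range(warmup):          # let lazy initialization and caches settle
        operation()
    gc.collect()
    tracemalloc.start()
    try:
        baseline, _ = tracemalloc.get_traced_memory()
        for _ in range(iterations):
            operation()
        gc.collect()                 # collect garbage so only reachable objects are counted
        current, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return current - baseline


def within_budget(operation: Callable[[], object], budget_bytes: int) -> bool:
    """Hypothetical check: the operation must not retain more than `budget_bytes`."""
    return retained_bytes(operation) <= budget_bytes
```

An operation that appends 1 KiB to a module-level list on every call retains roughly 1 MB over 1000 iterations, while one that only builds a temporary buffer retains almost nothing.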
  
  - The actual number of lines of code developed thus far agrees with the
    estimated lines of code at the current milestone.
  - The estimation assumptions have been reviewed and remain valid.
  - Cost and schedule estimates have been recomputed using the most recent
    actual project experience and productivity performance.
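
The re-estimation checkpoint can be illustrated with simple arithmetic. The sketch below assumes, purely for illustration, that the productivity observed so far (lines of code per staff-month) will hold for the remaining work; this is a linear extrapolation that more detailed estimation models refine considerably, and the `Progress` fields are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Progress:
    estimated_loc_at_milestone: int   # LOC the plan expected by this milestone
    actual_loc: int                   # LOC actually developed so far
    estimated_total_loc: int          # current estimate of total size
    effort_spent: float               # staff-months spent so far


def loc_variance_pct(p: Progress) -> float:
    """Positive: more code than planned at this milestone; negative: less."""
    return 100.0 * (p.actual_loc - p.estimated_loc_at_milestone) / p.estimated_loc_at_milestone


def remaining_effort(p: Progress) -> float:
    """Staff-months still needed, assuming observed productivity stays constant."""
    if p.effort_spent <= 0 or p.actual_loc <= 0:
        raise ValueError("need actual effort and output to compute productivity")
    productivity = p.actual_loc / p.effort_spent          # LOC per staff-month so far
    remaining_loc = max(p.estimated_total_loc - p.actual_loc, 0)
    return remaining_loc / productivity
```

For a milestone planned at 40,000 LOC where 32,000 LOC have been written in 16 staff-months against an estimated total of 100,000 LOC, the variance is -20% and the extrapolated remaining effort is 34 staff-months.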
  
  - Portability requirements have been met.
  - Programming Guidelines provide specific guidance on creating portable
    code.
  - Design Guidelines provide specific guidance on designing portable
    applications.
  - A 'test port' has been done to verify portability claims.
  
  - Measures of quality (MTBF, number of outstanding defects, etc.) have
    been met.
  - The architecture provides for recovery in the event of disaster or
    system failure.
  
  - Security requirements have been met.
  
  - Are the teams well structured? Are responsibilities well partitioned
    between teams?
  - Are there political, organizational or administrative issues that
    restrict the effectiveness of the teams?
  - Are there personality conflicts?

The Logical View section of the Software Architecture Document:
 
  
  - accurately and completely presents an overview of the architecturally
    significant elements of the design;
  - presents the complete set of architectural mechanisms used in the
    design, along with the rationale used in their selection;
  - presents the layering of the design, along with the rationale used to
    partition the layers;
  - presents any frameworks or patterns used in the design, along with the
    rationale used to select the patterns or frameworks.

  - The number of architecturally significant model elements is
    proportionate to the size and scope of the system, and is of a size
    which still renders the major concepts at work in the system
    understandable.
  
  - Potential race conditions (process competition for critical resources)
    have been identified, and avoidance and resolution strategies have been
    defined.
  - There is a defined strategy for handling "I/O queue full" or "buffer
    full" conditions.
  - The system monitors itself (capacity threshold, critical performance
    threshold, resource exhaustion) and is capable of taking corrective
    action when a problem is detected.
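
A 'queue full' strategy is easiest to review when it is an explicit, named choice rather than an accident of the implementation. The sketch below wraps Python's thread-safe `queue.Queue` with three hypothetical overflow policies (reject, drop the oldest entry, or block with a timeout) and counts refused or evicted messages so the condition is observable. The lock serializes producers during the evict-and-retry sequence and protects the `dropped` counter; it is an illustration, not a complete concurrency design.

```python
import queue
import threading
from enum import Enum
from typing import Any, Optional


class OverflowPolicy(Enum):
    REJECT = "reject"            # refuse the new message; the caller must retry or report
    DROP_OLDEST = "drop_oldest"  # evict the oldest queued message to make room
    BLOCK = "block"              # wait up to `timeout` seconds for space, then refuse


class BoundedInbox:
    """Bounded message buffer with an explicit 'buffer full' policy (illustrative sketch)."""

    def __init__(self, capacity: int, policy: OverflowPolicy, timeout: float = 0.1):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")  # maxsize <= 0 would mean unbounded
        self._q: "queue.Queue[Any]" = queue.Queue(maxsize=capacity)
        self._lock = threading.Lock()
        self.policy = policy
        self.timeout = timeout
        self.dropped = 0                 # messages refused or evicted, for monitoring

    def offer(self, item: Any) -> bool:
        """Try to enqueue `item`; return False if it was refused."""
        if self.policy is OverflowPolicy.BLOCK:
            try:
                self._q.put(item, timeout=self.timeout)  # block outside the lock
                return True
            except queue.Full:
                with self._lock:
                    self.dropped += 1
                return False
        with self._lock:                 # serialize producers during evict-and-retry
            while True:
                try:
                    self._q.put_nowait(item)
                    return True
                except queue.Full:
                    if self.policy is OverflowPolicy.REJECT:
                        self.dropped += 1
                        return False
                    try:
                        self._q.get_nowait()   # DROP_OLDEST: evict, then retry the put
                        self.dropped += 1
                    except queue.Empty:
                        pass                   # a consumer freed space in the meantime

    def poll(self) -> Optional[Any]:
        """Return the next message, or None if the buffer is empty."""
        try:
            return self._q.get_nowait()
        except queue.Empty:
            return None
```

Which policy is right depends on the message: dropping the oldest suits periodic status updates, while rejecting (and reporting) suits commands that must not be silently lost.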
  
  - Response time requirements for each message have been identified.
  - There is a diagnostic mode for the system which allows message response
    times to be measured.
  - The nominal and maximal performance requirements for important
    operations have been specified.
  - There is a set of performance tests capable of measuring whether
    performance requirements have been met.
  - The performance tests cover the "extra-normal" behavior of the system
    (startup and shutdown, alternate and exceptional flows of events of the
    use cases, system failure modes).
  - Architectural weaknesses creating the potential for performance
    bottlenecks have been identified. Particular emphasis has been given
    to:
      - use of any finite shared resource such as (but not limited to)
        semaphores, file handles, locks, latches and shared memory;
      - inter-process communication: communication across process
        boundaries is always more expensive than in-process communication;
      - inter-processor communication: communication across processor
        boundaries is always more expensive than inter-process
        communication;
      - physical and virtual memory usage: the point at which the system
        runs out of physical memory and starts using virtual memory is a
        point at which performance usually drops precipitously.
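
A diagnostic mode for measuring message response times can be as simple as timing each handler and comparing the results with the nominal and maximal requirements. In the sketch below, 'nominal' is interpreted as a 95th-percentile target and 'maximal' as a hard ceiling; that interpretation, the nearest-rank percentile, and all class and message names are assumptions for illustration. Measurement can be switched off so the diagnostic overhead is only paid when needed.

```python
import math
import time
from collections import defaultdict
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List


@dataclass(frozen=True)
class ResponseBudget:
    nominal_p95_ms: float   # 95% of messages must complete within this time
    maximal_ms: float       # no message may take longer than this


def nearest_rank(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty sample list."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


class ResponseTimeMonitor:
    """Diagnostic-mode recorder of per-message-type response times (illustrative sketch)."""

    def __init__(self, budgets: Dict[str, ResponseBudget], enabled: bool = True,
                 clock: Callable[[], float] = time.perf_counter):
        self.budgets = budgets
        self.enabled = enabled
        self.clock = clock
        self.samples: Dict[str, List[float]] = defaultdict(list)

    @contextmanager
    def measure(self, message_type: str) -> Iterator[None]:
        if not self.enabled:             # diagnostic mode off: no timing overhead
            yield
            return
        start = self.clock()
        try:
            yield
        finally:
            self.samples[message_type].append((self.clock() - start) * 1000.0)

    def violations(self) -> Dict[str, str]:
        """Map each message type that misses its budget (or was never measured) to a reason."""
        report: Dict[str, str] = {}
        for message_type, budget in self.budgets.items():
            data = self.samples.get(message_type)
            if not data:
                report[message_type] = "no samples"
                continue
            p95, worst = nearest_rank(data, 95), max(data)
            problems = []
            if p95 > budget.nominal_p95_ms:
                problems.append(f"p95 {p95:.1f} ms > {budget.nominal_p95_ms} ms")
            if worst > budget.maximal_ms:
                problems.append(f"max {worst:.1f} ms > {budget.maximal_ms} ms")
            if problems:
                report[message_type] = "; ".join(problems)
        return report
```

Handlers are wrapped as `with monitor.measure("query"): handle(message)`; reporting unmeasured message types as violations keeps a requirement from silently going untested.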
  
  - Where there are primary and backup processes, the potential for more
    than one process believing that it is primary (or no process believing
    that it is primary) has been considered, and specific design actions
    have been taken to resolve the conflict.
  - There are external processes that will restore the system to a
    consistent state when an event such as a process failure leaves the
    system in an inconsistent state.
  - The system is tolerant of errors and exceptions, such that when an
    error or exception occurs, the system can revert to a consistent state.
  - Diagnostic tests can be executed while the system is running.
  - The system can be upgraded (hardware, software) while it is running,
    if required.
  - There is a consistent policy for handling alarms in the system, and the
    policy has been consistently applied. The alarm policy addresses:
      - the "sensitivity" of the alarm reporting mechanism;
      - the prevention of false or redundant alarms;
      - the training and user interface requirements of staff who will use
        the alarm reporting mechanism.
  - The performance impact (processor cycles, memory, etc.) of the alarm
    reporting mechanism has been assessed and falls within acceptable
    performance thresholds as established in the performance requirements.
  - The workload/performance requirements have been examined and have been
    satisfied. Where the performance requirements are unrealistic, they
    have been renegotiated.
  - Memory budgets, to the extent that they exist, have been identified,
    and the software has been verified to meet those requirements.
    Measures have been taken to detect and prevent memory leaks.
  - A policy exists for use of the virtual memory system, including how to
    monitor and tune its usage.
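
Reverting to a consistent state after an error is often delegated to transactions. The sketch below uses Python's standard `sqlite3` module, whose connection context manager commits on success and rolls back if the block raises; the `account` table and the `transfer` and `balances` functions are hypothetical. The credit is deliberately applied before the debit so that a failing debit (rejected by the `CHECK` constraint) demonstrates the rollback of work already done.

```python
import sqlite3
from typing import Dict


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("a", 100), ("b", 0)])
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> None:
    """Move `amount` from src to dst: either both updates persist or neither does."""
    with conn:  # commit if the block succeeds, roll back if it raises
        conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
        # Raises sqlite3.IntegrityError if the debit would violate the CHECK constraint.
        conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))


def balances(conn: sqlite3.Connection) -> Dict[str, int]:
    return dict(conn.execute("SELECT id, balance FROM account"))
```

Calling `transfer(conn, "a", "b", 500)` on the initial data raises `sqlite3.IntegrityError`, and `balances(conn)` afterwards still shows the pre-transfer values, because the credit to `b` was rolled back along with the failed debit.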
  
  - Processes are sufficiently independent of one another that they can be
    distributed across processors or nodes when required.
  - Processes that must remain co-located, because of performance and
    throughput requirements or because of the inter-process communication
    mechanism they use (e.g. semaphores or shared memory), have been
    identified, and the impact of not being able to distribute this
    workload has been taken into consideration.
  - Messages that can be made asynchronous, so that they can be processed
    when resources are more available, have been identified.
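
Identifying messages that can be made asynchronous pays off when the architecture routes them through a queue instead of the sender's critical path. In this minimal `asyncio` sketch, message kinds listed in the hypothetical `deferrable` set are queued and handled by a background worker only when the dispatcher yields, while all other kinds are handled immediately; the message kinds and handlers are placeholders.

```python
import asyncio
from typing import Iterable, List, Set, Tuple


async def handle_now(msg: str, log: List[str]) -> None:
    log.append(f"sync:{msg}")           # stands in for latency-critical processing


async def deferred_worker(q: "asyncio.Queue[str]", log: List[str]) -> None:
    while True:
        msg = await q.get()
        try:
            log.append(f"async:{msg}")  # stands in for work that can wait
        finally:
            q.task_done()


async def dispatch(messages: Iterable[Tuple[str, str]], deferrable: Set[str], log: List[str]) -> None:
    q: "asyncio.Queue[str]" = asyncio.Queue()
    worker = asyncio.create_task(deferred_worker(q, log))
    for kind, msg in messages:
        if kind in deferrable:
            q.put_nowait(msg)           # the sender does not wait for processing
        else:
            await handle_now(msg, log)
    await q.join()                      # drain deferred work before shutting down
    worker.cancel()
    try:
        await worker
    except asyncio.CancelledError:
        pass
```

Dispatching `order` and `audit` messages alternately with `deferrable={"audit"}` logs both `order` messages before either `audit` message, because the worker only runs once the dispatcher stops to wait.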
  
  - The throughput requirements have been satisfied by the distribution of
    processing across nodes, and potential performance bottlenecks have
    been addressed.
  - Where information is distributed and potentially replicated across
    several nodes, information integrity is ensured.
  - Requirements for reliable transport of messages, to the extent that
    they exist, have been satisfied.
  - Requirements for secure transport of messages, to the extent that they
    exist, have been satisfied.
  - Processing has been distributed across nodes in such a way that network
    traffic and response time have been minimized subject to consistency
    and resource constraints.
  - System availability requirements, to the extent that they exist, have
    been satisfied.
      - The maximum system down-time in the event of a server or network
        failure has been determined and is within acceptable limits as
        defined by the requirements.
      - Redundant and stand-by servers have been defined in such a way that
        it is not possible for more than one server to be designated as the
        "primary" server.
      - All potential failure modes have been documented.
  - Faults in the network can be isolated, diagnosed and resolved.
  - The amount of "headroom" in CPU utilization has been identified, and
    the method of measurement has been defined.
  - There is a stated policy for the actions to be taken when the maximum
    CPU utilization is exceeded.
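
The headroom checkpoint asks for both a number and a defined way of measuring it. One possible definition, assumed here for illustration, is the configured utilization ceiling minus the moving average of recent utilization samples; how samples are collected (OS counters, a monitoring agent) is outside the sketch, and the 70% ceiling, the window size and the `shed-deferrable-work` action are hypothetical placeholders for the project's stated policy.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class HeadroomReport:
    observed_pct: float    # moving-average CPU utilization used for the decision
    headroom_pct: float    # ceiling minus observed; negative means the ceiling is exceeded
    action: str


class CpuHeadroomMonitor:
    """Headroom = utilization ceiling - moving average of the last `window` samples."""

    def __init__(self, ceiling_pct: float = 70.0, window: int = 5):
        if not 0.0 < ceiling_pct <= 100.0 or window < 1:
            raise ValueError("invalid ceiling or window")
        self.ceiling_pct = ceiling_pct
        self.samples: "deque[float]" = deque(maxlen=window)

    def add_sample(self, utilization_pct: float) -> None:
        if not 0.0 <= utilization_pct <= 100.0:
            raise ValueError("utilization must be a percentage")
        self.samples.append(utilization_pct)

    def evaluate(self) -> HeadroomReport:
        if not self.samples:
            raise ValueError("no samples collected yet")
        observed = sum(self.samples) / len(self.samples)
        headroom = self.ceiling_pct - observed
        # Hypothetical policy: when the ceiling is exceeded, shed deferrable work.
        action = "none" if headroom >= 0 else "shed-deferrable-work"
        return HeadroomReport(observed, headroom, action)
```

Averaging over a window rather than reacting to single samples is a design choice: it trades a slower reaction for fewer policy actions triggered by momentary spikes.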
Copyright © 1987-2001 Rational Software Corporation