On Complexity
The Curse Of Complexity
Software complexity takes many different forms. A few (non-comprehensive) examples include:
Tight coupling between components
Subtle logic with difficult edge cases
Fragile ownership contracts
Anything which makes code or systems harder to understand, reason about, debug, modify, or test contributes to complexity. The least complex system possible is always preferable. When developing software, we work with a limited complexity budget for each system and component. As the total complexity across our software increases, we run into difficulties:
Bugs are more likely
New changes are harder to make
Onboarding developers takes longer
Operational issues are harder to resolve
The greater the complexity a system possesses, the fewer team members have the knowledge and ability to work with it. Eventually a system's complexity debt grows beyond the available complexity and cognitive budget, and bankruptcy is declared. Once complexity bankruptcy is declared, either:
The system is re-written from scratch or
Massive refactoring is undertaken
Unfortunately, the high complexity which drove the need for either of these approaches also conspires against each. On one hand, re-writing highly complex systems is fraught with danger: it’s easy to miss a nuance of the existing system in the re-write. On the other hand, refactoring a complex system is equally risky, since complex systems (especially those with global rather than local complexity) often lack the sort of integration testing and regression coverage needed to prevent breakages. Subtle interactions are easy to overlook when refactoring, and weak test coverage will miss regressions against subtle implicit contracts (see Hyrum’s Law).
Avoiding complexity and keeping it in check is a constant, ongoing consideration in the development, maintenance, and operation of software systems. Each change to a system at the architectural or component level should include complexity as a first-order consideration. Let’s look at two particular dimensions against which we can qualitatively evaluate the impact of changes:
Global vs. Local Complexity
One lens on complexity in software systems is global vs. local complexity. Some complexity is neatly abstracted inside a component boundary. Other forms of complexity increase the detail and nuance of interactions between components. Think quantum entanglement: changes in one component cause spooky action at a distance on another.
When changes to a component do not require corresponding changes to or knowledge of other components, that change is increasing (or decreasing) local complexity, rather than global complexity. Conversely, adding new APIs, changing interface parameters between components, or making assumptions/taking dependencies on internal implementation details of another component is a form of increased coupling, which creates global complexity in a system. Another example of global vs. local complexity is functional duplication. When functionality is duplicated across different components, changes to that common functionality must be replicated and kept in sync across multiple components. This represents global complexity since the developers must be aware of the existence of that duplication across multiple components.
Increasing local complexity is far preferable to increasing global complexity. The object oriented principle of encapsulation exemplifies this notion. Local complexity is narrower in scope, and is easier to handle and manage. When complexity is kept within a component, it’s easier to reason about because of the narrower domain the complexity lives in. We humans can only hold so much mental context at once. If complex logic is kept in a single unit which a developer can hold in their head at one time, they have far more success grokking it.
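As a minimal sketch of keeping complexity local (the RateLimiter class and its tryAcquire method below are hypothetical, not taken from any particular library), consider a token-bucket rate limiter: all of the refill bookkeeping stays inside the class, while callers only ever see a single yes/no call, so the fiddly parts never leak out into global complexity.

// Hypothetical example: the token-bucket details (refill rate, capacity,
// timestamp arithmetic) are all internal. Callers only see tryAcquire(),
// so the complexity stays local to this class.
public final class RateLimiter {
    private final double capacity;       // max tokens the bucket can hold
    private final double refillPerNano;  // tokens added per nanosecond
    private double tokens;               // current token count
    private long lastRefillNanos;        // last time the bucket was refilled

    public RateLimiter(double permitsPerSecond, double burstCapacity) {
        this.capacity = burstCapacity;
        this.refillPerNano = permitsPerSecond / 1_000_000_000.0;
        this.tokens = burstCapacity;
        this.lastRefillNanos = System.nanoTime();
    }

    // The entire public surface: may I proceed right now?
    public synchronized boolean tryAcquire() {
        refill();
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }

    // All of the fiddly arithmetic lives behind the interface.
    private void refill() {
        long now = System.nanoTime();
        tokens = Math.min(capacity, tokens + (now - lastRefillNanos) * refillPerNano);
        lastRefillNanos = now;
    }
}

A caller simply writes if (limiter.tryAcquire()) { ... }; the internal token arithmetic can be rewritten at will without any caller needing to know.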
Testing and verifying more complex logic contained within a local component is far easier because of clear boundaries, interfaces, and encapsulation of the complexity within a specific module. Unit tests typically suffice to test locally complex components, while integration tests are required for global interactions. Given the order of magnitude difference in implementation effort and execution time for integration vs. unit tests, the impact of keeping complexity local is significant.
Marginal Complexity Growth
Another aspect to consider is that the rate of complexity growth is super-linear in most cases. Consider the example of adding a new boolean parameter to an API. The increase in parameter count is linear, but the number of unique parameter combinations is 2x the previous set of unique combinations. Those new parameter combinations must be properly tested; unfortunately, they often are not. I’ve personally caused outages in the past from changes which triggered bugs with specific boolean parameters enabled/disabled. The more “modes of operation” a system has, the larger the state/configuration space such a piece of software can traverse. Exhaustively testing the entire configuration/parameter space is often infeasible due to the vast time required to enumerate all combinations.
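As a rough sketch of this combinatorial growth (the fetch API and its flags below are made up for illustration), four boolean parameters already imply 2^4 = 16 distinct configurations a test suite would need to exercise, and each additional flag doubles that count.

// Hypothetical sketch: an API that has accumulated four boolean flags.
// Each flag doubles the number of distinct configurations to cover.
public class FlagCombinations {
    static void fetch(boolean useCache, boolean retryOnError,
                      boolean compress, boolean legacyMode) {
        // ... behavior subtly differs across combinations ...
    }

    public static void main(String[] args) {
        int flags = 4;
        int combinations = 1 << flags;  // 2^4 = 16
        System.out.println(combinations + " combinations to test");
        // Enumerating every combination is already tedious at 4 flags,
        // and quickly becomes infeasible as more flags accumulate.
        for (int mask = 0; mask < combinations; mask++) {
            fetch((mask & 1) != 0, (mask & 2) != 0,
                  (mask & 4) != 0, (mask & 8) != 0);
        }
    }
}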
For another example, consider branching. Each branch in a codebase creates ~2x the number of distinct execution paths through a section of code. While certain paths may not be taken initially, changes in other (potentially unrelated) areas of code may cause new paths to be hit, or cause existing paths to be executed with a different state which does trigger bugs. Code coverage metrics are deceptive here: merely because a path was traversed doesn’t mean all aspects of the code were tested. Both control flow and data coverage are needed. Consider if a branch calls another function with a couple of different parameters:
if (mySpecialFeature) {
funcThatWillCrashWithZeroInput(variableThatMightBeZero);
}
The branch may have code coverage in test runs, and the called function may have its own test coverage in different tests. But the specific combination of this branch being executed + specific parameters being passed to the other function can trigger failures neither test would catch despite “100% coverage”.
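To make that gap concrete, here is a minimal sketch (a hypothetical doWork wrapper around the snippet above, written as plain methods rather than a real test framework): both test methods execute every line, yet the crashing combination, the feature enabled with a zero input, is never exercised.

// Sketch of how each piece can be "covered" in isolation while the
// failing combination is never exercised. Names follow the snippet above.
public class CoverageGapExample {
    static int funcThatWillCrashWithZeroInput(int x) {
        return 100 / x;  // throws ArithmeticException when x == 0
    }

    static void doWork(boolean mySpecialFeature, int variableThatMightBeZero) {
        if (mySpecialFeature) {
            funcThatWillCrashWithZeroInput(variableThatMightBeZero);
        }
    }

    // Test 1: covers the branch, but only with a non-zero value.
    static void testFeatureEnabled() {
        doWork(true, 5);
    }

    // Test 2: covers the called function directly, also with non-zero input.
    static void testCrashFunction() {
        funcThatWillCrashWithZeroInput(7);
    }

    public static void main(String[] args) {
        testFeatureEnabled();
        testCrashFunction();
        // Every line above is now "covered", yet doWork(true, 0), a
        // combination production can easily hit, still crashes.
    }
}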
Since complexity growth is super-linear, the marginal rate of complexity growth (similar to Marginal Utility) is lower for simple components than for complex components. I.e., for each unit of new functionality/constraints added, adding it to an already complex component will trigger a far larger absolute growth in complexity than adding it to a simple component. More concretely, from the examples above, 2x the complexity of a simple component is far preferable to 2x the complexity of a complex component. Every time.
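As a back-of-the-envelope illustration (the state counts below are purely illustrative, using boolean modes as the unit of complexity), the same 2x lands very differently depending on where it is spent:

// Rough arithmetic sketch: doubling the state space of a simple component
// adds far fewer absolute states than doubling a complex one.
public class MarginalComplexity {
    public static void main(String[] args) {
        int simpleStates = 1 << 2;   // component with 2 boolean modes: 4 states
        int complexStates = 1 << 6;  // component with 6 boolean modes: 64 states

        // Adding one more boolean mode doubles each state space.
        System.out.println("Simple:  " + simpleStates + " -> " + (simpleStates * 2)
                + " states (+" + simpleStates + " new states to reason about and test)");
        System.out.println("Complex: " + complexStates + " -> " + (complexStates * 2)
                + " states (+" + complexStates + " new states to reason about and test)");
    }
}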
Implications
Two major corollaries follow:
A complex module implementation with simple API semantics/interfaces (internal vs. external complexity) is preferable to a simple module implementation with complex semantics/APIs
Increasing the complexity of a simple system/component is preferable to increasing the complexity of a complex system/component or updating multiple components/APIs
These two corollaries form a useful rubric for decision making when weighing design tradeoffs around managing complexity in a system and allocating the available complexity budget for a system or design. The reality is that complexity will creep into software and other systems over time, but managing complexity and spending the budget wisely keeps systems in a manageable state for longer.