Understanding Non-Functional Requirements in System Design

When designing complex systems, developers often focus primarily on functional requirements—what the system does. However, non-functional requirements are equally crucial in creating robust, efficient, and reliable software architectures. Let's dive deep into the world of non-functional requirements and why they matter.

What Are Non-Functional Requirements?

Unlike functional requirements that describe a system's inputs and outputs, non-functional requirements encompass the broader characteristics that determine a system's overall quality and performance. These requirements include:

  • Scalability: How easily the system can adjust to increased load

  • Availability: The percentage of time the system remains operational

  • Performance: Response times and system throughput

  • Fault-tolerance: The ability to continue functioning during component failures

  • Security: Protection against unauthorized access

  • Privacy: Controlling access to personally identifiable information

  • Consistency: Ensuring data matches across all system nodes

  • Cost-effectiveness: Balancing system capabilities with resource constraints

The Importance of Clarification

One critical insight is that customers rarely explicitly outline all non-functional requirements. They often assume these aspects will be naturally incorporated into the system design. This assumption can lead to:

  • Incomplete requirement specifications

  • Misunderstandings about system capabilities

  • Potential over-engineering or under-engineering of solutions

Key Non-Functional Requirement Considerations

Scalability: Handling Growth Strategically

Scalability is about efficiently expanding your system's capacity. There are two primary approaches:

  1. Vertical Scaling: Upgrading a single host's hardware

    • Pros: Conceptually simple

    • Cons:

      • Expensive at higher levels

      • Technological limitations

      • Potential downtime during upgrades

  2. Horizontal Scaling: Distributing load across multiple hosts

    • Preferred method in modern system design

    • Allows for more flexible and cost-effective growth

    • Essential for handling increased user loads and data volumes

Availability: Measuring System Uptime

Availability is typically measured in "nines":

  • 99.9% (three nines): ~8.77 hours of downtime per year

  • 99.99% (four nines): ~52.6 minutes of downtime per year

  • 99.999% (five nines): ~5.26 minutes of downtime per year

Fault Tolerance: Ensuring Resilience

Key strategies include:

  • Replication: Maintaining multiple redundant instances

  • Self-healing mechanisms

  • Graceful degradation when components fail

Fault Tolerance: Advanced Strategies

Beyond basic redundancy, system designers have several sophisticated techniques to enhance fault tolerance:

Circuit Breaker Pattern

A circuit breaker prevents cascading failures by temporarily stopping requests to a service that appears to be failing. When a service experiences repeated errors, the circuit breaker:

  • Stops making requests to the problematic service

  • Periodically tests if the service has recovered

  • Prevents overwhelming already stressed systems

Exponential Backoff and Retry

This strategy involves:

  • Incrementally increasing wait times between retry attempts

  • Adding "jitter" (small random variations) to prevent synchronized retry attempts

  • Gradually backing off to give systems time to recover

Caching and Graceful Degradation

When external services fail, intelligent systems can:

  • Cache previous successful responses

  • Provide default or placeholder data

  • Maintain partial functionality instead of complete failure

Checkpointing

Critical for long-running data processing jobs, checkpointing:

  • Saves progress at regular intervals

  • Allows systems to resume from the last successful state after a failure

  • Particularly useful in streaming and ETL (Extract, Transform, Load) pipelines

Dead Letter Queues

For handling failed write requests, dead letter queues:

  • Store requests that couldn't be processed

  • Allow for later retry or manual intervention

  • Prevent permanent data loss

Bulkhead Pattern

Inspired by maritime ship design, the bulkhead pattern:

  • Isolates system components

  • Prevents failures in one area from cascading

  • Allows partial system functionality during localized issues

Fallback Pattern: Resilience Through Alternative Strategies

The fallback pattern is a sophisticated approach to maintaining system reliability when primary services fail. Key characteristics include:

  • Detecting Service Problems: Identifying when a primary service is unavailable

  • Alternative Execution Paths: Providing backup methods to retrieve or process information

  • Flexible Response Handling: Offering cached or alternative data instead of complete failure

Practical Examples:

  • Caching previous responses for use during outages

  • Redirecting to alternative third-party services

  • Providing historical or placeholder data

Important Consideration: Always evaluate the reliability of your fallback mechanism, as the fallback itself could potentially fail.

Performance and Latency: Optimizing System Responsiveness

Performance is more than just speed—it's about how quickly and efficiently a system responds to user requests. Key insights include:

Understanding Latency

  • Definition: Total time from request initiation to response completion

  • Includes network travel time, processing time, and response transmission

  • Typical consumer app latency: tens of milliseconds to several seconds

  • Specialized systems (like high-frequency trading) may require millisecond-level responses

Strategies for Reducing Latency

Geographical Optimization

  • Deploy services close to user clusters

  • Use multiple data centers to minimize network distance

Technical Approaches

  • Implement Content Delivery Networks (CDNs)

  • Utilize caching mechanisms

  • Use efficient communication protocols (RPC over REST)

  • Employ batch and streaming techniques

Latency vs. Throughput

  • Latency: Time to complete a single request

  • Throughput: Number of requests processed in a given time

  • Lower latency often correlates with higher throughput

Design Considerations

  • Evaluate data characteristics and processing requirements

  • Consider asynchronous processing for non-critical tasks

  • Aggregate data during write operations to optimize read performance

Consistency: Balancing Data Integrity and Performance

Consistency in distributed systems involves complex trade-offs:

Types of Consistency

  1. ACID Consistency

    • Focuses on data relationship integrity

    • Ensures referential and uniqueness constraints

  2. CAP Theorem Consistency (Linearizability)

    • All nodes contain identical data at a given moment

    • Synchronous data changes across system

Consistency Strategies

  • Linearizability: Strict, real-time data synchronization

  • Eventual Consistency: Temporary data divergence with eventual alignment

  • Trade-offs: Balancing consistency with availability, scalability, and latency

Database Approaches

Some databases prioritize:

  • Linearizability: MongoDB, HBase, Redis

  • Availability: Cassandra, CouchDB, Dynamo

Tradeoffs and Considerations

Non-functional requirements are rarely achieved in isolation. They often involve strategic tradeoffs:

  • Improving availability might impact consistency

  • Enhancing security could increase latency

  • Increasing fault-tolerance might raise implementation complexity

Complexity and Maintainability

Minimizing complexity is crucial in system design. The first step is to clarify both functional and non-functional requirements to avoid designing for unnecessary features.

Common Generalizable Services

To reduce complexity and improve maintainability, consider implementing:

  • Load balancer service

  • Rate limiting

  • Authentication and authorization

  • Logging, monitoring, and alerting

  • TLS termination

  • Caching

  • DevOps and CI/CD (if applicable)

Managing System Complexity

  • Separate components into independent systems where possible

  • Consider trade-offs between complexity and system requirements

  • Use techniques like ETL pipelines to delay non-real-time data processing

  • Minimize message sizes using RPC serialization frameworks (Avro, Thrift, protobuf)

Continuous Deployment (CD)

Continuous deployment offers significant advantages:

  • Easy deployments and rollbacks

  • Fast feedback cycles

  • Improved system maintainability

  • Quick recovery from buggy builds

Deployment Techniques

  • Blue/green deployments (zero downtime)

  • Using static code analysis tools like SonarQube

  • Incremental feature rollouts

  • Rapid software development lifecycle

Cost Considerations in System Design

In system design, cost is a critical factor that often requires strategic trade-offs between different non-functional requirements:

Strategic Cost Trade-offs

  1. Scaling Strategies

    • Higher cost for lower complexity through vertical scaling

    • Lower costs by choosing horizontal scaling with more efficient resource utilization

  2. Availability vs. Cost

    • Reduce system redundancy to decrease expenses

    • Adjust replication factors in databases

    • Balance between system reliability and budget constraints

  3. Latency Trade-offs

    • Choose data center locations with lower operational costs

    • Accept slightly increased latency for significant cost savings

Monitoring and Maintenance Costs

  • Implement targeted monitoring aligned with system criticality

  • Avoid over-alerting to reduce unnecessary operational overhead

  • Create alert strategies that match the urgency of potential issues

Dependency Management

  • Anticipate future update requirements

  • Identify potential risks with third-party dependencies

  • Design systems with modularity to facilitate easier component replacement

System Decommissioning Considerations

  • Plan for potential system retirement

  • Develop data extraction strategies

  • Consider user data migration and preservation

Security in System Design

Key Security Considerations

  1. Encryption Strategies

    • TLS termination vs. encryption in transit

    • Determine appropriate encryption for data at rest

    • Implement robust access control mechanisms

  2. Authentication Protocols

    • Understand OAuth 2.0 and OpenID Connect

    • Implement rate limiting to prevent DDoS attacks

Privacy and Data Protection

Handling Personally Identifiable Information (PII)

  • Comply with regulations like GDPR and CCPA

  • Implement strong access control mechanisms

  • Use hashing algorithms to protect individual privacy

  • Develop comprehensive data retention and deletion policies

Internal vs. External Service Security

  • Adopt a default security-first approach

  • Implement role-based access controls

  • Log access to sensitive databases

  • Limit data exposure across services

Cloud Native Approaches

Cloud native technologies offer comprehensive solutions for addressing non-functional requirements:

  • Key Characteristics

    • Scalable applications in dynamic environments

    • Leveraging containers and microservices

    • Implementing immutable infrastructure

    • Utilizing declarative APIs

  • Benefits

    • Increased system resilience

    • Improved manageability

    • Enhanced observability

    • Enables frequent, predictable changes

Practical Advice for System Designers

  1. Always discuss non-functional requirements explicitly

  2. Understand the customer's true needs beyond surface-level requirements

  3. Consider multiple design approaches

  4. Be prepared to make informed compromises

  5. Prioritize requirements based on system context

Last updated