Building Resilient .NET Applications using Polly

In distributed systems, outages and transient errors are inevitable. Ensuring that your application stays responsive when a dependent service goes down is critical. This article explores service resilience using Polly, a .NET library that helps handle faults gracefully. It covers basic resilience strategies and explains how to keep your service running when a dependency is unavailable.

What Is Service Resilience?

Service resilience is the ability of an application to continue operating despite failures such as network issues, temporary service unavailability, or unexpected exceptions. A resilient service degrades gracefully rather than crashing outright, ensuring users receive the best possible experience even during failures.

Key aspects of resilience include:

  • Retrying failed operations: automatically attempt an operation again when a transient error occurs.
  • Breaking the circuit: stop a system from continuously attempting operations that are likely to fail.
  • Falling back: provide an alternative response or functionality when a dependent service is unavailable.

Introducing Polly: The .NET Resilience Library

Polly is an open-source .NET library that simplifies implementing resilience strategies. It lets you define policies that handle transient faults, combine those policies into policy wraps, and integrate them into applications via dependency injection.
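For example, a policy can be attached to an HttpClient at registration time through the Microsoft.Extensions.Http.Polly package. Below is a minimal sketch, assuming an ASP.NET Core IServiceCollection named services; the client name "payments" is purely illustrative:

// Requires the Microsoft.Extensions.Http.Polly NuGet package.
// HandleTransientHttpError covers HttpRequestException, 5xx, and 408 responses.
services.AddHttpClient("payments")
    .AddPolicyHandler(HttpPolicyExtensions
        .HandleTransientHttpError()
        .WaitAndRetryAsync(3, attempt => TimeSpan.FromSeconds(Math.Pow(2, attempt))));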

Polly provides several resilience strategies:

  • Retry: automatically reattempts operations when failures occur.
  • Circuit Breaker: temporarily stops attempts once failures exceed a threshold.
  • Fallback: provides a default value or action when all retries fail.
  • Timeout: cancels operations that take too long.

These strategies can be combined into a single, robust resilience pipeline, as demonstrated in the example use case later in this article.

Key Polly Strategies for Service Resilience

Retry Policy

The retry policy is useful when failures are transient. Polly can automatically re-execute failed operations after a configurable delay. The example below retries up to three times with exponential backoff (2, 4, then 8 seconds):

// Handle network failures and non-success status codes,
// retrying with an exponentially increasing delay between attempts.
var retryPolicy = Policy
    .Handle<HttpRequestException>()
    .OrResult<HttpResponseMessage>(r => !r.IsSuccessStatusCode)
    .WaitAndRetryAsync(3, retryAttempt => TimeSpan.FromSeconds(Math.Pow(2, retryAttempt)),
        onRetry: (outcome, timespan, retryCount, context) =>
        {
            Console.WriteLine($"Retry {retryCount}: waiting {timespan} before next attempt.");
        });
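Executing a call through the policy is then a matter of wrapping the underlying delegate; the httpClient instance and URL below are placeholders:

var response = await retryPolicy.ExecuteAsync(
    () => httpClient.GetAsync("https://payments.example.com/api/charge"));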

Circuit Breaker Policy

A circuit breaker prevents an application from continuously retrying an operation that is likely to fail, protecting it from cascading failures. Example:

// Open the circuit after three consecutive handled failures and stay open
// for 30 seconds before letting a trial request through (the half-open state).
var circuitBreakerPolicy = Policy
    .Handle<HttpRequestException>()
    .OrResult<HttpResponseMessage>(r => !r.IsSuccessStatusCode)
    .CircuitBreakerAsync(
        handledEventsAllowedBeforeBreaking: 3,
        durationOfBreak: TimeSpan.FromSeconds(30),
        onBreak: (outcome, breakDelay) =>
        {
            Console.WriteLine("Circuit breaker opened.");
        },
        onReset: () =>
        {
            Console.WriteLine("Circuit breaker reset.");
        });
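While the circuit is open, further calls fail fast with a BrokenCircuitException (from the Polly.CircuitBreaker namespace) until the break duration elapses and a trial call is allowed. A minimal usage sketch, again with a placeholder httpClient:

try
{
    var response = await circuitBreakerPolicy.ExecuteAsync(
        () => httpClient.GetAsync("https://payments.example.com/api/charge"));
}
catch (BrokenCircuitException)
{
    // The circuit is open, so Polly did not attempt the call.
    Console.WriteLine("Circuit is open; skipping the call.");
}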

Fallback Strategy: Keeping Your Service Running

When a dependent service is down, a fallback policy provides a default or cached response instead of propagating an error. Example:

// Return a friendly placeholder response instead of surfacing the failure.
// HttpStatusCode lives in System.Net; StringContent in System.Net.Http.
var fallbackPolicy = Policy<HttpResponseMessage>
    .Handle<HttpRequestException>()
    .OrResult(r => !r.IsSuccessStatusCode)
    .FallbackAsync(
         fallbackAction: (outcome, context, cancellationToken) => Task.FromResult(
             new HttpResponseMessage(HttpStatusCode.OK)
             {
                 Content = new StringContent("Service temporarily unavailable. Please try again later.")
             }),
         onFallbackAsync: (outcome, context) =>
         {
             Console.WriteLine("Fallback executed: dependent service is down.");
             return Task.CompletedTask;
         });

Timeout Policy

A timeout policy ensures that long-running requests do not block system resources indefinitely. Example:

// Cancel any single attempt that runs longer than 10 seconds.
var timeoutPolicy = Policy.TimeoutAsync<HttpResponseMessage>(TimeSpan.FromSeconds(10));
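Polly's timeout defaults to the optimistic strategy, which relies on cooperative cancellation: the policy supplies a CancellationToken that must be passed through to the underlying call. A minimal sketch with a placeholder httpClient:

var response = await timeoutPolicy.ExecuteAsync(
    ct => httpClient.GetAsync("https://payments.example.com/api/charge", ct),
    CancellationToken.None);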

Implementing Basic Service Resilience with Polly

Example Use Case: Online Payment Processing System

Imagine an e-commerce platform, ShopEase, which processes customer payments through an external payment gateway. To ensure a seamless shopping experience, ShopEase implements the following resilience strategies, combined into a single pipeline in the sketch after this list:

  • Retry Policy: If the payment gateway experiences transient network issues, ShopEase retries the request automatically before failing.
  • Circuit Breaker: If the payment gateway goes down for an extended period, the circuit breaker prevents continuous failed attempts.
  • Fallback Policy: If the gateway is unavailable, ShopEase lets customers save their cart and notifies them when payment processing is available again.
  • Timeout Policy: If the payment gateway takes too long to respond, ShopEase cancels the request and notifies the customer.
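Wiring these together with Polly's PolicyWrap might look like the sketch below, which reuses the policies defined in the earlier sections. Order matters: policies are listed outermost first, so the fallback is the last line of defense and the timeout bounds each individual attempt. The httpClient and request content are placeholders:

var paymentPipeline = Policy.WrapAsync(
    fallbackPolicy,        // outermost: serve a friendly response if all else fails
    retryPolicy,           // retry transient failures with backoff
    circuitBreakerPolicy,  // stop hammering a gateway that keeps failing
    timeoutPolicy);        // innermost: cancel attempts that exceed 10 seconds

var response = await paymentPipeline.ExecuteAsync(
    () => httpClient.PostAsync("https://payments.example.com/api/charge", content));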

By integrating these resilience patterns, ShopEase ensures a robust payment processing system that enhances customer trust and maintains operational efficiency, even when external services face issues.

Conclusion

Building resilient services means designing systems that remain robust under pressure. Polly enables implementing retries, circuit breakers, timeouts, and fallback strategies to keep services running even when dependencies fail. This improves the user experience and enhances overall application reliability.

I advocate for 12-Factor Apps (https://12factor.net/), and while resilience is not directly part of the 12-Factor methodology, many of its principles support resilience indirectly. Truly resilient applications combine strategies such as Polly for .NET, Kubernetes auto-recovery, and chaos engineering. Together with 12-Factor principles and auto-scaling, these methods help services remain resilient and performant.

By applying these techniques, developers can create resilient architectures that gracefully handle failure scenarios while maintaining consistent functionality for users. Implement Polly and supporting resilience strategies to ensure applications stay operational despite unexpected failures.
