Health Checks and Resilience (Polly)
Mind Map Summary
- Health Checks
  - What: A dedicated endpoint in your API (e.g., /health) that reports the application’s status.
  - Purpose: To allow external monitoring systems (like load balancers or container orchestrators) to automatically determine if the application instance is healthy and able to handle requests.
  - Health Statuses:
    - Healthy: The application and its critical dependencies are working.
    - Degraded: A non-critical dependency is failing, but the app can still function.
    - Unhealthy: A critical dependency is failing. The app should be taken out of service or restarted.
  - Setup: Configured in Program.cs using builder.Services.AddHealthChecks().
- Resilience (Polly)
  - What: A .NET library for handling transient (temporary) failures and making applications more robust.
  - Core Idea: Wrap your operations (like API calls) in policies that define how to handle specific exceptions or HTTP responses.
  - Common Resilience Policies:
    - Retry: Automatically retries a failed operation. Essential for handling temporary network glitches.
    - Circuit Breaker: After a certain number of failures, the “circuit opens,” and further calls fail immediately for a set period. This prevents an application from hammering a struggling downstream service.
    - Timeout: Enforces a time limit on an operation.
    - Fallback: Provides a default value or action if an operation fails.
  - Integration: Polly integrates seamlessly with HttpClientFactory to apply policies to outgoing HTTP requests.
Core Concepts
1. Health Checks
In modern distributed systems, automated monitoring is essential. A health check endpoint provides a simple, standardized way for other systems to ask, “Are you okay?”
- Liveness Probes: A simple check to see if the application process is running and responsive. If this fails, the orchestrator (like Kubernetes) might restart the container.
- Readiness Probes: A more thorough check that includes verifying dependencies (databases, downstream APIs, etc.). If this fails, the orchestrator will not restart the instance, but it will stop sending it new traffic until it becomes ready again.
ASP.NET Core provides a rich framework for this. You can add checks for databases, message queues, or any custom dependency, and it will aggregate the results into a single health report.
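One way to expose the two probe types as separate endpoints is to tag checks and filter them per endpoint. The following is a minimal sketch, assuming an ASP.NET Core minimal-hosting project; the paths /health/live and /health/ready, the "ready" tag, and the "downstream-api" check are illustrative names chosen here, not part of the original example.

```csharp
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Diagnostics.HealthChecks;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Diagnostics.HealthChecks;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddHealthChecks()
    // Hypothetical dependency check, tagged so that only the readiness probe runs it.
    // A real check would actually contact the dependency instead of returning Healthy.
    .AddCheck(
        "downstream-api",
        () => HealthCheckResult.Healthy("Dependency reachable."),
        tags: new[] { "ready" });

var app = builder.Build();

// Liveness: the predicate excludes every check, so a 200 response
// only proves that the process is up and answering HTTP requests.
app.MapHealthChecks("/health/live", new HealthCheckOptions
{
    Predicate = _ => false
});

// Readiness: run only the checks tagged "ready".
app.MapHealthChecks("/health/ready", new HealthCheckOptions
{
    Predicate = check => check.Tags.Contains("ready")
});

app.Run();
```

An orchestrator would then point its liveness probe at /health/live and its readiness probe at /health/ready, so a failing dependency removes the instance from rotation without triggering a restart.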
2. Resilience and Polly
Distributed systems will inevitably experience transient failures. Networks are unreliable, services can be temporarily overloaded, and databases can have momentary hiccups. A resilient application is one that can anticipate and gracefully handle these failures.
Polly is the de-facto library for resilience in .NET. It allows you to declaratively express resilience strategies.
- Retry: The simplest and most common policy. If an HTTP call fails with a 503 Service Unavailable, you can configure Polly to wait a second and try again, perhaps up to three times. This handles most transient faults.
- Circuit Breaker: This is a critical pattern for preventing cascading failures. If a downstream service is failing, repeatedly retrying will only make the problem worse. The Circuit Breaker acts as a proxy. After a configured number of failures, it “opens” the circuit and immediately fails any new requests without even trying to call the downstream service. After a configured delay, it enters a “Half-Open” state, allowing a single test request through. If it succeeds, the circuit closes and normal operation resumes. If it fails, the circuit opens again for another break period.
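The circuit breaker described above can be combined with a retry policy on the same client. This is a sketch only, assuming Polly v7 as pulled in by the Microsoft.Extensions.Http.Polly package; the thresholds (5 failures, 30 seconds) and the client name and base address are illustrative values, not recommendations.

```csharp
using System;
using Microsoft.AspNetCore.Builder;
using Microsoft.Extensions.DependencyInjection;
using Polly;
using Polly.Extensions.Http;

var builder = WebApplication.CreateBuilder(args);

// Retry up to 3 times with exponential backoff (2s, 4s, 8s).
var retryPolicy = HttpPolicyExtensions
    .HandleTransientHttpError()
    .WaitAndRetryAsync(3, attempt => TimeSpan.FromSeconds(Math.Pow(2, attempt)));

// Open the circuit after 5 consecutive handled failures and keep it open for 30 seconds.
// While it is open, calls fail immediately with a BrokenCircuitException.
var circuitBreakerPolicy = HttpPolicyExtensions
    .HandleTransientHttpError()
    .CircuitBreakerAsync(
        handledEventsAllowedBeforeBreaking: 5,
        durationOfBreak: TimeSpan.FromSeconds(30));

builder.Services.AddHttpClient("MyApiClient", client =>
    {
        client.BaseAddress = new Uri("https://api.example.com/");
    })
    .AddPolicyHandler(retryPolicy)           // added first, so it is the outer policy
    .AddPolicyHandler(circuitBreakerPolicy); // inner policy, sees every individual attempt

var app = builder.Build();
app.Run();
```

Because the retry policy is the outer one, every retry attempt passes through the breaker and counts toward its failure threshold. Once the circuit is open, the resulting BrokenCircuitException is not one of the faults the retry policy handles, so the call fails fast instead of waiting through further backoff delays.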
Practice Exercise
Configure health checks for your API. Add a check for a dependency, like a database connection. Then, use Polly to wrap an HttpClient call in your service layer with a Retry policy to handle transient network failures.
Answer
First, you need to add the required NuGet packages:
dotnet add package Microsoft.Extensions.Http.Polly
dotnet add package AspNetCore.HealthChecks.SqlServer
(or the appropriate package for your database)
Code Example
1. Program.cs - Configure Health Checks and Polly
using System;
using System.Net.Http;
using Microsoft.AspNetCore.Builder;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Diagnostics.HealthChecks;
using Microsoft.Extensions.Hosting;
using Polly;
using Polly.Extensions.Http;

var builder = WebApplication.CreateBuilder(args);
var services = builder.Services;
var configuration = builder.Configuration;

// --- Polly Retry Policy Setup ---
var retryPolicy = HttpPolicyExtensions
    .HandleTransientHttpError() // Handles 5xx status codes, 408, and HttpRequestException
    .OrResult(msg => msg.StatusCode == System.Net.HttpStatusCode.NotFound) // Also retry on 404
    .WaitAndRetryAsync(3, retryAttempt => TimeSpan.FromSeconds(Math.Pow(2, retryAttempt))); // Exponential backoff: 2s, 4s, 8s

// --- HttpClientFactory with Polly ---
services.AddHttpClient("MyApiClient", client =>
    {
        client.BaseAddress = new Uri("https://api.example.com/");
    })
    .AddPolicyHandler(retryPolicy);

// Register the service that consumes the named client so it can be injected
services.AddScoped<MyService>();

// --- Health Check Setup ---
// Fail fast at startup if the connection string is missing
var connectionString = configuration.GetConnectionString("DefaultConnection")
    ?? throw new InvalidOperationException("Connection string 'DefaultConnection' is not configured.");

services.AddHealthChecks()
    // Add a health check for a SQL Server database
    .AddSqlServer(
        connectionString: connectionString,
        healthQuery: "SELECT 1;",
        name: "sql-database-check",
        failureStatus: HealthStatus.Unhealthy);

services.AddControllers();

var app = builder.Build();

// Map the health check endpoint
app.MapHealthChecks("/health");
app.MapControllers();

app.Run();
2. A Service That Uses the Resilient HttpClient
// MyService.cs
public class MyService
{
    private readonly IHttpClientFactory _httpClientFactory;

    public MyService(IHttpClientFactory httpClientFactory)
    {
        _httpClientFactory = httpClientFactory;
    }

    public async Task<string> GetDataFromApi()
    {
        var client = _httpClientFactory.CreateClient("MyApiClient");
        // This call is now wrapped in the Polly retry policy
        var response = await client.GetStringAsync("data");
        return response;
    }
}
Explanation
- Polly Policy: We define a retryPolicy using HttpPolicyExtensions.HandleTransientHttpError(), which is a pre-configured helper that handles typical transient HTTP errors (5xx responses, 408 Request Timeout, and HttpRequestException). We also add a condition to retry on 404 Not Found; a 404 is usually a permanent error, so include this condition only when the downstream API is known to return it transiently. The policy is configured to retry 3 times with an exponential backoff delay (wait 2s, then 4s, then 8s), which is a best practice to avoid overwhelming a struggling service.
- HttpClientFactory Integration: We register a named HttpClient called "MyApiClient". Crucially, we use .AddPolicyHandler(retryPolicy) to attach our Polly policy to every request made by this client. This is the seamless integration point between HttpClientFactory and Polly.
- Health Check Configuration:
  - We call services.AddHealthChecks() to initialize the system.
  - We then add a specific check for our SQL Server database using AddSqlServer. We provide the connection string and a simple query (SELECT 1) to execute. If the query succeeds, the dependency is considered Healthy. If it fails, it will be reported as Unhealthy.
- Mapping the Endpoint: app.MapHealthChecks("/health") exposes the health check endpoint. When you navigate to /health, the framework will execute all registered checks and return a consolidated status report (e.g., a 200 OK with the text “Healthy” or a 503 Service Unavailable with the text “Unhealthy”).
- Service Consumption: In MyService, we simply request an HttpClient from the IHttpClientFactory using the name "MyApiClient". The factory provides us with a pre-configured client that automatically has the retry policy applied. Our service code remains clean and unaware of the resilience logic, which has been neatly handled in the startup configuration.
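The walkthrough above relies on a packaged SQL Server check that reports Healthy or Unhealthy. For a non-critical dependency, a custom check can report Degraded instead. The sketch below is one possible shape, assuming a hypothetical downstream endpoint at https://api.example.com/ping; the class name DownstreamApiHealthCheck and the check name "downstream-api" are invented here for illustration.

```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical custom check for a non-critical downstream API.
public class DownstreamApiHealthCheck : IHealthCheck
{
    private readonly IHttpClientFactory _httpClientFactory;

    public DownstreamApiHealthCheck(IHttpClientFactory httpClientFactory)
    {
        _httpClientFactory = httpClientFactory;
    }

    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken cancellationToken = default)
    {
        try
        {
            // Use the default (unnamed) client so the probe is not slowed down
            // by the retry policy attached to "MyApiClient".
            var client = _httpClientFactory.CreateClient();
            using var response = await client.GetAsync(
                "https://api.example.com/ping", cancellationToken);

            return response.IsSuccessStatusCode
                ? HealthCheckResult.Healthy("Downstream API responded.")
                : HealthCheckResult.Degraded(
                    $"Downstream API returned {(int)response.StatusCode}.");
        }
        catch (Exception ex)
        {
            // The dependency is non-critical, so report Degraded rather than Unhealthy.
            return HealthCheckResult.Degraded("Downstream API is unreachable.", ex);
        }
    }
}

// Registration in Program.cs, chained onto the existing AddHealthChecks() call:
//   services.AddHealthChecks().AddCheck<DownstreamApiHealthCheck>("downstream-api");
```

With the default status code mapping, a Degraded report still returns 200 OK from /health, so the instance stays in service while the response text signals that something needs attention.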