Role-based access control

CritterWatch ships with an auth-agnostic, ClaimsPrincipal-based RBAC layer. It governs who can view what and — more importantly — who can take state-changing actions (DLQ replay, projection rebuild, tenant ops, chaos toggles, listener pause / restart, etc.) across every monitored service.

Off-mode is the default. Single-tenant deployments and existing hosts that haven't configured RBAC keep working exactly as they did before: every caller (including anonymous callers in dev) can do everything the license permits. Enforced mode is a single DI call away.

How CritterWatch decides

CritterWatch never owns identity. The host's authentication layer (OIDC, reverse-proxy header trust, JWT bearer, Windows auth, whatever) builds a ClaimsPrincipal; CritterWatch reads it. A small interface decides what that principal is allowed to do:

csharp

public interface ICritterWatchAuthorizer
{
    Task<bool> IsAllowedAsync(
        ClaimsPrincipal principal,
        string capability,
        string? resource = null,
        CancellationToken ct = default);
}

capability — one of the Capabilities string constants. Identifies the thing the caller is trying to do.
resource — optional scope: the target service name, projection shard, tenant id, alert stream id, etc. Lets you write rules like "this on-call rotation can clear alerts on TripService but not RepairShop."

You implement this interface against whatever rule store fits your environment — LDAP groups, OIDC role claims, a static config file, a database, a custom rule engine. CritterWatch never assumes a schema.

Turning enforcement on

Register your authorizer alongside the rest of the CritterWatch services. The DI extension replaces the off-mode DefaultAllowAuthorizer:

csharp

builder.Services.AddCritterWatchAuthorization<MyAuthorizer>();

That single line turns on enforcement at every gate CritterWatch ships today: HTTP endpoints, MCP action tools, and every other path annotated with [RequiresPermission]. No further wiring is needed; the off-mode shim is removed and your authorizer drives every decision.

For incremental rollout (e.g. you want to ship enforced mode but double-check denies against logs first), wrap your authorizer to log-and-allow during a soak window before flipping to log-and-deny.
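The soak-window wrapper can be sketched as a decorator around your real authorizer. This is a minimal sketch under stated assumptions: SoakModeAuthorizer, its enforce flag, and the Action<string> log sink are hypothetical names, not CritterWatch types, and the interface is redeclared only so the snippet compiles on its own. In a real host you would reference the CritterWatch interface and probably log through your own logging abstraction.

```csharp
using System;
using System.Security.Claims;
using System.Threading;
using System.Threading.Tasks;

// Stand-in copy of the interface shown earlier, so the sketch compiles alone.
public interface ICritterWatchAuthorizer
{
    Task<bool> IsAllowedAsync(
        ClaimsPrincipal principal,
        string capability,
        string? resource = null,
        CancellationToken ct = default);
}

// Hypothetical decorator: logs every deny from the inner authorizer and,
// while enforce is false, lets the call through anyway (log-and-allow).
public sealed class SoakModeAuthorizer : ICritterWatchAuthorizer
{
    private readonly ICritterWatchAuthorizer _inner;
    private readonly Action<string> _log;
    private readonly bool _enforce;

    public SoakModeAuthorizer(ICritterWatchAuthorizer inner, Action<string> log, bool enforce)
    {
        _inner = inner;
        _log = log;
        _enforce = enforce;
    }

    public async Task<bool> IsAllowedAsync(
        ClaimsPrincipal principal,
        string capability,
        string? resource = null,
        CancellationToken ct = default)
    {
        if (await _inner.IsAllowedAsync(principal, capability, resource, ct))
            return true;

        var who = principal.Identity?.Name ?? "(unnamed)";
        var target = resource ?? "(none)";
        _log($"RBAC deny: user={who} capability={capability} resource={target} enforced={_enforce}");

        // Soak window: report the deny but allow. After the flip: deny.
        return !_enforce;
    }
}
```

Because AddCritterWatchAuthorization<TAuthorizer>() takes a type, the wrapper's constructor dependencies would have to be resolvable from the container; how you wire that up is host-specific. Flipping enforce from false to true is the switch from log-and-allow to log-and-deny.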

Off-mode is preserved when no custom authorizer is registered. AddCritterWatchServices(...) auto-wires AddCritterWatchAuthorization() with TryAdd so the BFF's static codegen always resolves an authorizer; the resulting DefaultAllowAuthorizer returns true for every decision, which is the off-mode behaviour. Your explicit AddCritterWatchAuthorization<TAuthorizer>() replaces it.

Fail-closed on missing principal

If enforced mode is on and a request arrives without an authenticated principal (principal?.Identity?.IsAuthenticated != true), CritterWatch rejects the request before calling your authorizer. That short-circuit exists so a misconfigured authentication scheme can't accidentally let unauthenticated callers through against a permissive authorizer.

Off-mode (no custom authorizer registered) still allows anonymous principals, which is the intended behaviour for single-tenant dev hosts.
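The fail-closed condition depends on how .NET computes IsAuthenticated: a ClaimsIdentity only counts as authenticated when it was created with a non-empty authentication type. The sketch below restates the predicate quoted above in a small helper so you can reason about it in tests. PrincipalGuard and IsMissing are hypothetical names for illustration, not CritterWatch APIs.

```csharp
using System.Security.Claims;

// Hypothetical helper that mirrors the condition quoted in the text.
public static class PrincipalGuard
{
    // True when there is no principal, no identity, or an identity that was
    // never marked as authenticated (no authentication type set).
    public static bool IsMissing(ClaimsPrincipal? principal) =>
        principal?.Identity?.IsAuthenticated != true;
}
```

Under this predicate, new ClaimsPrincipal(new ClaimsIdentity()) counts as missing, while new ClaimsPrincipal(new ClaimsIdentity("Test")) does not. That is why test principals need an authentication type to get past the short-circuit in enforced mode.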

What's gated, and how

HTTP endpoints

State-changing operator HTTP endpoints are annotated with the [RequiresPermission] attribute. Wolverine's HTTP codegen weaves the RBAC check into the chain ahead of the endpoint body:

csharp

public static class AlertEndpoint
{
    [RequiresPermission(Capabilities.AlertClear)]
    [WolverinePost("/api/critterwatch/alerts/{alertStreamId}/clear")]
    public static async Task<AlertClearedMessage?> ClearAlert(...)
    {
        // body runs only if the authorizer said yes
    }
}

On deny the request short-circuits with a 403 Forbidden ProblemDetails response. The denied capability is exposed in the capability extension field so clients can map the response back to a specific missing grant:

json

{
  "status": 403,
  "title": "Forbidden",
  "detail": "Caller is not authorized for capability \"alert.clear\".",
  "capability": "alert.clear"
}

No UseExceptionHandler ceremony is required on the host — the frame writes the response directly.
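On the client side, mapping a deny back to the missing grant can be as small as reading the capability field from the response body. This is a minimal sketch using System.Text.Json; ForbiddenResponse and TryReadDeniedCapability are hypothetical names, and the code assumes only the status and capability fields shown in the sample above.

```csharp
using System.Text.Json;

// Hypothetical client-side helper, not part of CritterWatch.
public static class ForbiddenResponse
{
    // Returns the denied capability from a 403 ProblemDetails body,
    // or null when the body does not have the shape shown above.
    public static string? TryReadDeniedCapability(string json)
    {
        try
        {
            using var doc = JsonDocument.Parse(json);
            var root = doc.RootElement;
            if (root.ValueKind != JsonValueKind.Object)
                return null;

            // Only treat the body as a deny when status is the number 403.
            if (!root.TryGetProperty("status", out var status) ||
                status.ValueKind != JsonValueKind.Number ||
                !status.TryGetInt32(out var code) ||
                code != 403)
                return null;

            return root.TryGetProperty("capability", out var cap) &&
                   cap.ValueKind == JsonValueKind.String
                ? cap.GetString()
                : null;
        }
        catch (JsonException)
        {
            // Not JSON at all: nothing to map.
            return null;
        }
    }
}
```

A UI can use the returned string to tell the operator which grant to request instead of showing a generic "Forbidden" message.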

SignalR-routed commands

Operator commands invoked through the SignalR hub (the SPA's normal path for buttons like Rebuild projection, Clear alert, Add tenant) go through the same [RequiresPermission] enforcement as the matching HTTP endpoint. The handler reads the operator's ClaimsPrincipal off the SignalR connection — Wolverine 6.2+ surfaces it as SignalREnvelope.Principal (a ClaimsPrincipal? captured from HubCallerContext.User), and the RBAC middleware on the inbound relay path consumes it directly:

The same authorizer fires. ICritterWatchAuthorizer.IsAllowedAsync(...) receives the SignalR caller's principal, the same capability string that the matching HTTP endpoint guards, and the same resource scope (service name, plus {serviceName}:{tenantId} for per-tenant projection commands — see Multi-tenancy → Per-tenant scoping).
On deny the SignalR message is dropped with an envelope mirroring the HTTP 403 ProblemDetails shape — the SPA's notification surface renders it as a toast, and the audit log records the denial.
No connection-bridge plumbing is needed. The captured principal rides the envelope, so the off-mode DefaultAllowAuthorizer path and the production-mode ICritterWatchAuthorizer path use the same call shape; integrators don't need their own claim-forwarding shim.
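Inside an authorizer, the two resource shapes named above (a bare service name, or {serviceName}:{tenantId}) can be split with a small helper. This is a sketch under assumptions: ResourceScope is a hypothetical type, and it assumes service names never contain a colon. Other resource kinds, such as alert stream ids, may need their own handling.

```csharp
// Hypothetical value type for the two resource shapes described above.
public readonly record struct ResourceScope(string ServiceName, string? TenantId)
{
    // "TripService"           -> ServiceName = "TripService", TenantId = null
    // "TripService:tenant-42" -> ServiceName = "TripService", TenantId = "tenant-42"
    public static ResourceScope Parse(string resource)
    {
        // Split on the first colon only, so the tenant id is kept whole.
        var parts = resource.Split(':', 2);
        return new ResourceScope(parts[0], parts.Length == 2 ? parts[1] : null);
    }
}
```

Parsing once at the top of IsAllowedAsync keeps service-level and tenant-level rules from each re-implementing the string handling.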

If you're wiring a custom SignalR client (a third-party admin tool, an audit bot), pass the bearer token / cookie on the SignalR connection the same way you would for the HTTP API. The hub's authentication scheme governs principal acquisition; everything downstream is identical.

MCP action tools

The cross-application MCP server (CritterWatch.Mcp) exposes 21 RBAC-gated state-changing tools across six families:

Family	Tools	Capabilities
DLQ	Replay / Discard	`dlq.replay`, `dlq.discard`
Projection	Pause / Restart / Rebuild	`projection.pause`, `projection.restart`, `projection.rebuild`
Tenant	Add / Enable / Disable / Remove / HardDelete	`tenant.add`, `tenant.enable`, `tenant.disable`, `tenant.remove`, `tenant.hard-delete`
Alert	Acknowledge / Snooze / Clear	`alert.acknowledge`, `alert.snooze`, `alert.clear`
ChaosMonkey	Enable / Disable / SetFailureRate / SetSlowHandler / SetProjectionFailureRate	`chaos-monkey.toggle` (enable/disable), `chaos-monkey.configure` (rate / delay knobs)
Listener	Pause / Restart / Drain	`listener.pause`, `listener.restart`, `listener.drain`

Every tool calls the same enforcement helper before publishing the underlying command — the deny envelope is stable JSON the MCP client returns verbatim:

json

{
  "error": "Forbidden",
  "message": "Caller is not authorized for capability 'dlq.replay' on resource 'TripService'.",
  "capability": "dlq.replay",
  "resource": "TripService"
}

The MCP transport is configured stateless so IHttpContextAccessor reflects the current tool invocation's principal — see the MCP integration page for details.

The capability catalog

Capabilities are strings, not enums, which keeps the wire format stable across audit-log replay. Add new capabilities at the bottom of Capabilities; never rename or repurpose existing ones.

Read surfaces (17)

dashboard.view, services.view, projections.view, event-store-explorer.view, event-modeling.view, projection-stepper.view, alerts.view, metrics.view, health.view, audit-log.view, scheduled-messages.view, dead-letters.view, tenants.view, listeners.view, timeline.view, topology.view, durability.view.

MCP tool family gates (4)

mcp.alerts.read, mcp.health.read, mcp.performance.read, mcp.traces.read.

Operator actions (30)

Group	Capabilities
DLQ	`dlq.replay`, `dlq.discard`, `dlq.edit`
Listeners	`listener.pause`, `listener.restart`, `listener.drain`, `endpoint.update`
Projections	`projection.pause`, `projection.restart`, `projection.rebuild`, `subscription.rewind`
Scheduled messages	`scheduled-message.cancel`, `scheduled-message.reschedule`, `scheduled-message.edit`
Tenants	`tenant.add`, `tenant.enable`, `tenant.disable`, `tenant.remove`, `tenant.hard-delete`
Cluster / agents	`node.eject`, `agent.pin`, `agent.unpin`, `election.trigger`
Alerts	`alert.acknowledge`, `alert.snooze`, `alert.clear`, `alert.config.edit`
ChaosMonkey	`chaos-monkey.toggle`, `chaos-monkey.configure`
Configuration	`config.edit`

Design notes

Toggle vs configure split. ChaosMonkey on/off is a separate gate from the rate / delay knobs so an operator trusted to stop chaos isn't automatically trusted to crank the dial higher first. The same split applies across the HTTP and MCP surfaces.
HardDelete vs Remove. tenant.hard-delete is split from tenant.remove because hard-delete issues PostgreSQL DROP DATABASE … WITH (FORCE) on the tenant's database, while soft remove leaves the database intact for backup / forensic recovery.

Custom authorizer skeleton

A starting point that grants capabilities based on OIDC role claims:

csharp

using System.Security.Claims;

public sealed class RoleClaimAuthorizer : ICritterWatchAuthorizer
{
    private readonly IReadOnlyDictionary<string, IReadOnlySet<string>> _capabilityRoles = new Dictionary<string, IReadOnlySet<string>>
    {
        [Capabilities.DlqReplay] = new HashSet<string> { "sre", "platform" },
        [Capabilities.DlqDiscard] = new HashSet<string> { "platform" },
        [Capabilities.ProjectionRebuild] = new HashSet<string> { "platform" },
        [Capabilities.TenantHardDelete] = new HashSet<string> { "platform-admin" },
        // … one row per capability you want to gate
    };

    public Task<bool> IsAllowedAsync(
        ClaimsPrincipal principal,
        string capability,
        string? resource = null,
        CancellationToken ct = default)
    {
        if (!_capabilityRoles.TryGetValue(capability, out var requiredRoles))
        {
            // Anything not explicitly listed is denied — fail-closed default.
            return Task.FromResult(false);
        }

        var has = principal.Claims
            .Where(c => c.Type == ClaimTypes.Role || c.Type == "role" || c.Type == "roles")
            .Any(c => requiredRoles.Contains(c.Value));

        return Task.FromResult(has);
    }
}

Hook it up:

csharp

builder.Services.AddCritterWatchAuthorization<RoleClaimAuthorizer>();

For resource-scoped rules (e.g. only allow alert.clear on services matching a per-rotation prefix), read the resource parameter:

csharp

public Task<bool> IsAllowedAsync(
    ClaimsPrincipal principal,
    string capability,
    string? resource,
    CancellationToken ct)
{
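    // The resource is "{serviceName}" or "{serviceName}:{tenantId}",
    // so the service name is always the first segment.
    if (capability == Capabilities.AlertClear && resource is not null)
    {
        var serviceName = resource.Split(':', 2)[0];
        // OnCallRotation stands in for your own rotation lookup.
        if (serviceName.Length > 0 && OnCallRotation.OwnsService(principal, serviceName))
            return Task.FromResult(true);
    }

    // Everything else falls through to the wrapped baseline authorizer.
    return _baseline.IsAllowedAsync(principal, capability, resource, ct);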
}

Testing your authorizer

RbacFoundationTests and RbacEnforcementHttpTests in the CritterWatch repo are the reference patterns. For your own host:

csharp

[Fact]
public async Task sre_role_can_replay_dlq()
{
    var authorizer = new RoleClaimAuthorizer(/* … */);
    var principal = new ClaimsPrincipal(
        new ClaimsIdentity(
            [new Claim(ClaimTypes.Role, "sre")],
            authenticationType: "Test"));

    var allowed = await authorizer.IsAllowedAsync(
        principal, Capabilities.DlqReplay, resource: "TripService");

    allowed.ShouldBeTrue();
}

For HTTP-level end-to-end coverage, Alba scenarios against a real Wolverine.Http pipeline exercise the full codegen path — see src/Tests/Services/RbacEnforcementHttpTests.cs for the shape.

Related

Cross-application MCP server: RBAC enforcement on the MCP tool surface
Clustering: RBAC works the same in single-node and clustered deployments
Licensing: license gating is a separate, complementary layer (RBAC governs which authenticated principals can do something; license gating governs whether the operation is enabled at all)
