What is the Incident Agent?

Hours of digging done in minutes. TierZero Incident Agent joins incidents as your right-hand: gathering context, surfacing what’s relevant, and helping you figure out how to stop the bleeding and why it happened.

How It Works

1. Incident Raised

When an incident is raised, TierZero joins and starts gathering context across logs, metrics, traces, code, deploys, and past incidents. Tag @TierZero to delegate new investigation theories or ask for updates.
2. Root Cause Analysis

TierZero synthesizes signals across your stack, including code changes, logs, traces, metrics, deploys, past incidents, and runbooks, then surfaces high-signal clues to the channel.
3. Close the Loop

TierZero auto-generates the post-mortem, action items, and Jira tickets, reducing the painful “recovery to resolution” cycle from days to hours.

Key Capabilities

Real-Time Catch-Up

Keep stakeholders in the loop. When your CTO, customer success, or another engineer joins an incident channel mid-flight, they don’t need to ask “what’s going on?” and no one has to stop debugging to explain.
  • Live dashboard: Full context, timeline, investigation findings, and charts from your observability tools.
  • Ask TierZero directly: Tag it anytime for the latest status or to ask specific questions.
  • Ephemeral Slack message: Private summary sent the moment someone joins the incident.

Post-Mortem Generation

Post-mortems drafted before the retro starts. TierZero generates a first draft from the signals it collected during the incident.
  • True incident timeline: Grounded in telemetry data collected during the incident.
  • Customer and service impact assessment: Scope and severity documented automatically.
  • Templated report: Drafted from your template or the standard five-whys format.
  • Action items with suggested ownership: Clear next steps assigned to the right people.
  • SLO impact assessment: Which SLOs were breached, error budget consumed.

Autonomous Debugging

TierZero goes beyond finding the root cause to generate fix PRs. It correlates errors with specific code changes, identifies the offending commit, and opens a pull request with the fix.
  • Code-level root cause attribution: Pinpoints the exact commit and code path responsible for the failure.
  • Automated fix PR generation: Opens pull requests with proposed fixes, ready for human review.
  • CI/CD failure diagnosis: Intelligent log parsing to identify build and deployment failures.

Automated Remediation

TierZero executes remediation with one-click approval. Every action is logged with a full audit trail.
  • Rollback to last healthy deploy: One-click rollback with automatic health validation after deployment.
  • Service restart with health validation: Restart degraded services and verify recovery before marking resolved.
  • Feature flag toggle: Disable problematic features instantly to stop the bleeding.
  • Approval workflows: Human-in-the-loop approval for destructive actions with full audit trail.
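
The approval-gated flow described above can be sketched as follows. This is a minimal illustration, not TierZero's implementation: `AuditLog`, `run_remediation`, the outcome strings, and the simulated `service` state are all hypothetical names introduced for this example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Dict, List


@dataclass
class AuditLog:
    """In-memory stand-in for an audit trail (hypothetical)."""
    entries: List[Dict[str, str]] = field(default_factory=list)

    def record(self, action: str, outcome: str) -> None:
        self.entries.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "action": action,
            "outcome": outcome,
        })


def run_remediation(
    action: str,
    execute: Callable[[], None],
    is_healthy: Callable[[], bool],
    approved: bool,
    audit: AuditLog,
) -> str:
    """Run an action only if a human approved it, then validate health."""
    if not approved:
        audit.record(action, "rejected")
        return "rejected"
    execute()
    # Health is checked after the action, before reporting recovery.
    outcome = "recovered" if is_healthy() else "still_degraded"
    audit.record(action, outcome)
    return outcome


# Simulated service state; a real check would query monitoring.
service = {"version": "v42", "healthy": False}


def rollback() -> None:
    service.update(version="v41", healthy=True)


audit = AuditLog()
denied = run_remediation("rollback", rollback, lambda: service["healthy"],
                         approved=False, audit=audit)
version_after_denial = service["version"]
applied = run_remediation("rollback", rollback, lambda: service["healthy"],
                          approved=True, audit=audit)
```

In this sketch both the rejected and the approved attempts leave an audit entry, and the rollback only runs once approval is given.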

Supported Incident Sources

TierZero integrates with your incident management platform (FireHydrant, Rootly, incident.io, or PagerDuty) to sync context.

Use Cases

Accelerate Incident Resolution

  • Challenge: Engineers spend hours gathering context from multiple tools during incidents.
  • How it works: TierZero auto-joins the incident channel, pulls in telemetry from all connected integrations, and surfaces root cause candidates with evidence.
  • Outcome: 2-minute time to first clue. 40%+ MTTR reduction.

Consistent Post-Mortems

  • Challenge: Post-mortems get deprioritized, delayed, and sometimes never finished after an incident.
  • How it works: TierZero generates a first draft from the signals it collected (timeline, impact, RCA, five-whys, and action items) so the retro starts with substance, not a blank page.
  • Outcome: Every incident gets a post-mortem. Teams focus on discussing fixes, not reconstructing events.

Track Incident Metrics

  • Challenge: MTTA and MTTR are hard to measure consistently without manual data entry.
  • How it works: TierZero tracks configurable milestones (time to acknowledge, time to mitigate, time to resolve) automatically based on incident channel activity and external system events.
  • Outcome: Accurate incident metrics without manual effort.
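
To make the milestone arithmetic concrete, here is a minimal sketch of how MTTA and MTTR can be derived from per-incident timestamps. The field names (`raised`, `acknowledged`, `resolved`) and the sample data are assumptions for illustration. How TierZero stores milestones is not described here.

```python
from datetime import datetime
from statistics import mean

# Hypothetical milestone timestamps for two incidents.
incidents = [
    {
        "raised": datetime(2024, 1, 10, 9, 0),
        "acknowledged": datetime(2024, 1, 10, 9, 4),
        "resolved": datetime(2024, 1, 10, 10, 0),
    },
    {
        "raised": datetime(2024, 1, 12, 14, 0),
        "acknowledged": datetime(2024, 1, 12, 14, 2),
        "resolved": datetime(2024, 1, 12, 14, 30),
    },
]


def mean_minutes(records, start: str, end: str) -> float:
    """Average elapsed minutes between two milestones across incidents."""
    return mean((r[end] - r[start]).total_seconds() / 60 for r in records)


mtta = mean_minutes(incidents, "raised", "acknowledged")  # (4 + 2) / 2 = 3.0
mttr = mean_minutes(incidents, "raised", "resolved")      # (60 + 30) / 2 = 45.0
```

A time-to-mitigate metric would follow the same pattern with an additional milestone timestamp per incident.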

How to Set Up

  1. Navigate to Settings → Incidents
  2. Configure incident channel name patterns (e.g., inc-*, incident-*)
  3. Select your external incident management system (FireHydrant, Rootly, incident.io, or PagerDuty)
  4. Configure metric definitions and milestones
  5. Enable the Incident Agent
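
As an illustration of step 2, the sketch below checks channel names against patterns like `inc-*`. It assumes the patterns behave like shell-style globs, which this page does not confirm. `CHANNEL_PATTERNS` and `is_incident_channel` are hypothetical names.

```python
from fnmatch import fnmatchcase

# Example patterns taken from the setup steps; glob semantics are assumed.
CHANNEL_PATTERNS = ["inc-*", "incident-*"]


def is_incident_channel(name: str, patterns=CHANNEL_PATTERNS) -> bool:
    """Return True if the channel name matches any configured pattern."""
    return any(fnmatchcase(name, pattern) for pattern in patterns)


channels = [
    "inc-1234-checkout-errors",
    "incident-db-failover",
    "general",
    "incoming-leads",  # starts with "inc" but lacks the "inc-" prefix
]
matched = [c for c in channels if is_incident_channel(c)]
```

Testing patterns against existing channel names this way can help catch patterns that are too broad or too narrow before enabling the agent.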

Viewing Incidents

The Incidents page provides:
  • Overview dashboard: MTTA/MTTR charts and trends over time
  • Incident list: All tracked incidents with status, severity, and key timestamps
  • Incident detail view with tabs:
    • Executive Summary: High-level overview of the incident
    • Timeline: Chronological sequence of events
    • Root Cause Analysis: Findings with confidence levels
    • Five-Whys: Structured root cause analysis
    • Follow-Ups: Action items and their status
    • Communications: Auto-generated stakeholder updates
    • Custom Post-Mortem: Report based on your template

Best Practices

1. Use Consistent Incident Channel Naming
  • Configure name patterns that match your team’s convention (e.g., inc-*, sev1-*)
  • Consistent naming ensures TierZero automatically joins every incident
2. Connect an External Incident Management System
  • Richer context from FireHydrant, Rootly, or incident.io improves TierZero’s investigation quality
  • External system data is synced in real time during incidents
3. Configure Relevant Milestones
  • Define milestones that match your MTTR goals
  • Track what matters to your team’s incident response process
4. Review and Refine Auto-Generated Post-Mortems
  • Use the generated post-mortem as a starting point, not the final word
  • Add team-specific context and discuss action items in your retro