During an incident, "the AI says so" is not good enough.
If a system suggests a rollback, highlights a deployment, or points at a risky change, engineers need to know why. Not later. Not in a separate dashboard nobody has time to open. In the moment, while the incident is still moving.
A recommendation without reasoning creates a second incident response problem: now the team has to decide whether to trust the tool.
Black-box recommendations create risk
When a tool says "this deployment is likely related" or "rollback may be appropriate," the natural question is: based on what?
Was it timing? Changed files? Error patterns in the logs? A match against a previous incident? Each of those would justify a different level of confidence. Each would suggest a different next step. Without knowing which signals drove the recommendation, the team cannot evaluate it, and cannot push back if something feels wrong.
This plays out in recognisable ways:
- teams follow a plausible-but-wrong suggestion because they had no way to inspect it
- teams dismiss a useful recommendation because they could not see what produced it
- confidence gets mistaken for certainty, and the investigation narrows too early
- responders spend time validating the tool instead of investigating the incident
The problem is not just that a recommendation might be wrong. It is that without explanation, no one can tell.
Explainability means showing the evidence
Explainability in incident response is not an academic concern. It is about helping teams answer the questions they are already asking when an incident is active.
When a tool surfaces a candidate cause, engineers should be able to see:
- what signals were considered
- what changed recently, and why it is being highlighted
- what evidence supports or weakens the hypothesis
- how confident the system is, and why
- whether a similar pattern appeared in a previous incident
A tool that surfaces this reasoning can be argued with. Teams may disagree with the suggested cause and still benefit from the evidence that drove it. That is often how incident response actually works: not a clean path from recommendation to action, but a process of building hypotheses, testing them against available signals, and discarding what does not hold.
In incident response, the reasoning is frequently more useful than the conclusion.
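One way to picture the list above is as a record that keeps the evidence attached to the recommendation instead of reducing it to a verdict. The sketch below is illustrative only: the `Evidence` and `Hypothesis` types, their field names, the 0.0 to 1.0 confidence scale, and the `checkout-service` example are all assumptions made for this post, not the schema of any particular tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Evidence:
    signal: str     # which signal this came from, e.g. "deploy timing"
    detail: str     # what was actually observed
    supports: bool  # True if it strengthens the hypothesis, False if it weakens it


@dataclass
class Hypothesis:
    candidate_cause: str
    signals_considered: List[str]
    evidence: List[Evidence] = field(default_factory=list)
    confidence: float = 0.0  # hypothetical 0.0 to 1.0 scale
    confidence_reason: str = ""
    similar_incident: Optional[str] = None  # None when no earlier match is known

    def supporting(self) -> List[Evidence]:
        return [e for e in self.evidence if e.supports]

    def weakening(self) -> List[Evidence]:
        return [e for e in self.evidence if not e.supports]

    def unexamined_signals(self) -> List[str]:
        """Signals that were considered but produced no evidence either way."""
        seen = {e.signal for e in self.evidence}
        return [s for s in self.signals_considered if s not in seen]


# Hypothetical example data, invented for illustration.
hypothesis = Hypothesis(
    candidate_cause="deployment of checkout-service",
    signals_considered=["deploy timing", "changed files", "log errors", "past incidents"],
    evidence=[
        Evidence("deploy timing", "errors began shortly after the deploy", True),
        Evidence("changed files", "diff only touched documentation", False),
    ],
    confidence=0.4,
    confidence_reason="timing matches, but the change itself looks low-risk",
)
```

Keeping weakening evidence and unexamined signals in the same record is the point of the design: a responder can disagree with `candidate_cause` and still use everything else in it.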
Confidence is not certainty
Many incident AI systems express confidence in some form. That confidence needs to be visible and honest.
Low confidence should not be hidden. A system that only surfaces high-confidence recommendations creates a false impression that everything shown has been vetted. When a lower-confidence signal is still relevant, teams should see it.
Moderate confidence is still useful. It can direct attention without closing the investigation.
High confidence still needs evidence. Even the strongest signal deserves inspection when the decision involves a rollback or an escalation.
Confidence is not a substitute for judgement. It is a way to make uncertainty visible.
A system that presents every recommendation with the same certainty is not more helpful. It is harder to use, because the team has no way to calibrate how much weight to give any of it.
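As a rough sketch of what visible confidence could look like, the snippet below labels every recommendation with a band and ranks them without dropping the low-confidence ones. The function names, the thresholds (0.75 and 0.4), and the sample recommendations are assumptions chosen for illustration; real thresholds would need calibrating against how a system's scores actually behave.

```python
from typing import List, Tuple


def confidence_band(score: float) -> str:
    """Map a 0.0-1.0 score to a label. The thresholds are illustrative assumptions."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"confidence must be between 0 and 1, got {score}")
    if score >= 0.75:
        return "high"
    if score >= 0.4:
        return "moderate"
    return "low"


def present(recommendations: List[Tuple[str, float]]) -> List[str]:
    """Order recommendations by confidence, labelling each one and hiding none."""
    ranked = sorted(recommendations, key=lambda r: r[1], reverse=True)
    return [f"[{confidence_band(score)} {score:.2f}] {text}" for text, score in ranked]


# Hypothetical recommendations, invented for illustration.
lines = present([
    ("cache eviction change may be related", 0.2),
    ("deployment overlaps with the error spike", 0.8),
    ("config flag flipped before the alert", 0.5),
])
for line in lines:
    print(line)
```

The low-confidence item still appears, at the bottom and clearly labelled, so the team can weigh it instead of never learning it existed.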
Where Ember fits
Ember is being built around a simple principle: incident reasoning should be inspectable.
Not: "The AI says so."
But: "These are the signals. This is how they relate. This is the confidence level. This is what would make the hypothesis stronger or weaker."
When production behaves differently, Ember should help teams see:
- which recent changes are plausibly related, and why
- which signals support the hypothesis and which weaken it
- whether a similar pattern has appeared in previous incidents
- what confidence applies, and what evidence it rests on
- what a reasonable next action might be, and why it is suggested
Ember does not automatically identify root cause. It does not replace engineering judgement. Its aim is to reduce the work of rebuilding context under pressure and to give responders a starting point they can inspect and challenge.
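To make the "signals, relation, confidence, what would change it" framing concrete, here is a minimal sketch of how such a summary might be laid out as plain text for a responder. This is not Ember's output format or API: `render_hypothesis`, its parameters, and the example values are hypothetical and exist only to illustrate the shape of an inspectable recommendation.

```python
from typing import List


def render_hypothesis(
    cause: str,
    signals: List[str],
    relation: str,
    confidence: str,
    stronger_if: List[str],
    weaker_if: List[str],
) -> str:
    """Lay out a hypothesis as signals, relation, confidence, and what would change it."""
    lines = [f"Candidate cause: {cause}", "Signals:"]
    lines += [f"  - {s}" for s in signals]
    lines.append(f"How they relate: {relation}")
    lines.append(f"Confidence: {confidence}")
    lines.append("Stronger if:")
    lines += [f"  - {s}" for s in stronger_if]
    lines.append("Weaker if:")
    lines += [f"  - {s}" for s in weaker_if]
    return "\n".join(lines)


# Hypothetical incident details, invented for illustration.
summary = render_hypothesis(
    cause="recent deployment of the payments API",
    signals=["deploy finished shortly before the alert", "new timeout errors in logs"],
    relation="the errors come from a code path the deploy changed",
    confidence="moderate",
    stronger_if=["errors stop on hosts still running the old version"],
    weaker_if=["the same errors predate the deploy"],
)
print(summary)
```

The "Weaker if" section matters as much as the rest: it tells responders what to check before acting on the suggestion.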
Trust comes from showing the work
The most useful incident AI will not be the one that sounds most certain.
It will be the one engineers can challenge and still use under pressure.
Tools that explain their reasoning become part of how teams think through incidents. They shift the question from "do I trust this tool?" to "does this evidence hold?"
Tools that do not explain themselves become something to validate on top of the incident itself.
Ember should not ask engineers to trust a black box when systems are failing. It should show its work.