AI Signals & Reality Checks: Decision Provenance Turns Into Default UX

[Image: minimal abstract motif of a single signal line splitting into provenance checkpoints]
AI Signals & Reality Checks — Feb 19, 2026

Signal

“Why did it do that?” is becoming a first-class product surface—not a support ticket.

As AI systems graduate from “answering” to acting (booking, routing, editing, approving, sending), teams are quietly standardizing a new UI primitive:

Decision provenance: a compact, structured trace of what the system used and what it decided.

Not a long explanation. Not a motivational story. A readable receipt.

In practice, the best products are converging on a small set of provenance fields that users actually care about:

  • Inputs: what data sources were consulted (email thread, calendar event, CRM record, doc section).
  • Constraints: which rules/policies were applied ("do not email external domains", "expense limit $X").
  • Tool calls: what actions were attempted (drafted email, created event, opened ticket) and with what parameters.
  • Uncertainty: where the system was unsure (missing attendee, ambiguous account, conflicting dates).
  • Diffs: what changed (before/after) when it edited something.

This is the UI equivalent of moving from “trust me” to “here’s the trace.”
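As a sketch, the field list above can be captured as a small structured schema. The class and field names below are illustrative assumptions, not any emerging standard:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SourceRef:
    """A grounded pointer to one input the system consulted."""
    kind: str           # e.g. "email_thread", "crm_record"
    record_id: str      # deep-linkable identifier
    snapshot_hash: str  # hash of the content as seen at decision time

@dataclass
class ToolCall:
    name: str           # e.g. "draft_email"
    params: dict        # parameters the action was attempted with
    status: str         # "ok", "failed", "retried"

@dataclass
class Receipt:
    """One compact, structured trace of a single decision."""
    inputs: list[SourceRef]
    constraints: list[str]      # policies applied, e.g. "no external domains"
    tool_calls: list[ToolCall]
    uncertainty: list[str]      # e.g. "ambiguous account: Acme Inc vs Acme LLC"
    diff: Optional[str] = None  # before/after text when something was edited
```

Keeping the receipt this small is the point: five fields a user can scan, each one expandable into the underlying artifact.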

Reality check

Provenance only builds trust if it’s auditable—otherwise it becomes a decorative “because AI said so” layer.

Most teams underestimate how easy it is to ship provenance that looks good but fails in real operations:

  1. The trace must survive retries and partial failures. If an agent runs the same task twice (network glitch, tool timeout, user changes one parameter), the “receipt” can’t quietly mutate without a record. Users notice when the UI says it used Source A yesterday and Source B today for “the same thing.”
  2. The trace must be tied to real artifacts. If the provenance claims “consulted CRM” but can’t deep-link to the exact record/version, it turns into theater. A provenance line is only useful when it’s grounded: record id, timestamp, query, or snapshot hash.
  3. The trace must include policy decisions, not just model decisions. In agent systems, many failures are not “the model hallucinated”—they’re “the policy layer allowed it” or “the routing layer picked the wrong tool.” If provenance hides those layers, you’ll still be debugging in the dark.
  4. Over-explaining is a trust killer. Users don’t want essays. They want:
  • the one reason it chose this action,
  • the one risk it detected,
  • and the one place they can correct it.

If provenance turns into verbose rationalization, it trains users to stop reading—right before the moment they need it.
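Points 1–3 above can be enforced with one structural decision: make the trace append-only, ground each entry in a content hash, and have the policy and routing layers emit alongside the model. A minimal sketch, with all names hypothetical:

```python
import hashlib
import time

class TraceLog:
    """Append-only decision trace: retries append new entries; nothing mutates."""

    def __init__(self):
        self.entries = []

    def emit(self, run_id, attempt, layer, event, **detail):
        # Each retry carries a new attempt number, so "used Source A yesterday,
        # Source B today" shows up as two entries instead of a silent rewrite.
        self.entries.append({
            "run_id": run_id,
            "attempt": attempt,
            "layer": layer,   # "model", "policy", or "router" -- not just the model
            "event": event,
            "ts": time.time(),
            **detail,
        })

def snapshot_hash(content: str) -> str:
    """Ground a provenance line in the exact content that was consulted."""
    return hashlib.sha256(content.encode()).hexdigest()[:12]

log = TraceLog()
crm = '{"account": "Acme Inc", "owner": "dana"}'   # hypothetical CRM snapshot
log.emit("run-42", attempt=1, layer="model", event="consulted_crm",
         record_id="crm/8731", content_hash=snapshot_hash(crm))
log.emit("run-42", attempt=1, layer="policy", event="allowed_send",
         rule="internal-domains-only")
# A tool timeout forces a retry: attempt 2 appends; attempt 1 stays on record.
log.emit("run-42", attempt=2, layer="router", event="selected_tool",
         tool="send_email")
```

The hash is what separates a grounded receipt from theater: it lets the UI prove "consulted CRM" refers to a specific record version, not a vague gesture at a data source.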

Second-order effect

“Receipts” will become a competitive moat—and a compliance requirement.

Once provenance exists, it spreads:

  • Support teams want it for faster incident resolution.
  • Security teams want it as an audit trail.
  • Product teams want it to A/B test autonomy thresholds.
  • Users want it to decide when to grant broader permissions.

Two practical shifts follow:

  • Provenance-first design: you architect the agent so every meaningful decision emits a structured event.
    • If it didn’t emit, it didn’t happen.
    • If it happened, it’s linkable.
  • UX for correction, not explanation: the trace becomes an interactive object.
    • “Use a different thread.”
    • “Exclude this contact.”
    • “Set a stricter policy next time.”

This is where provenance stops being transparency theater and becomes control.
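The "if it didn't emit, it didn't happen" rule, and the trace-as-correction-surface idea, can be sketched with a small wrapper around tool calls. The decorator, event store, and tool function here are illustrative assumptions, not any particular framework's API:

```python
import functools

EVENTS = []  # in a real agent this would be the provenance pipeline, not a global

def emits(event_name):
    """Wrap a tool call so it records a structured event on every invocation.
    If a code path skips this wrapper, it leaves no trace -- by design, a
    provenance-first agent should have no such paths."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(**params):
            result = fn(**params)
            EVENTS.append({"event": event_name, "params": params, "result": result})
            return result
        return inner
    return wrap

@emits("create_event")
def create_calendar_event(title, attendees):
    # Hypothetical tool call; returns the artifact it created.
    return {"id": "evt-1", "title": title, "attendees": attendees}

# The trace doubles as the correction surface: "Exclude this contact"
# becomes "replay the logged call with edited parameters."
create_calendar_event(title="Sync", attendees=["a@co.com", "b@ext.com"])
fixed = dict(EVENTS[-1]["params"])
fixed["attendees"] = [a for a in fixed["attendees"] if a != "b@ext.com"]
corrected = create_calendar_event(**fixed)
```

Because the correction replays through the same wrapper, it emits its own event: the fix is itself on the record, which is exactly what audit and support teams need.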

What to watch (next 24–72h)

  • Do agent products standardize a “receipt schema” (inputs, tools, policies, diffs) that is portable across vendors?
  • Are teams instrumenting provenance in the same pipeline as reliability telemetry (so you can correlate failures with specific tool-call patterns)?
  • Do we see provenance compressed into a single line that users can scan—and expand only when needed?
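The last question above amounts to a compression function over the receipt: one scannable line, with full detail behind an expand action. A sketch assuming a plain-dict receipt with hypothetical keys:

```python
def one_line(receipt: dict) -> str:
    """Compress a receipt into a single scannable line; full detail stays
    available behind an expand action. All keys here are hypothetical."""
    action = receipt.get("action", "no action")
    n_sources = len(receipt.get("inputs", []))
    risk = (receipt.get("uncertainty") or ["none"])[0]
    return f"{action} · {n_sources} sources · risk: {risk}"
```

For example, `one_line({"action": "Drafted reply", "inputs": ["thread", "crm"], "uncertainty": ["ambiguous account"]})` yields `Drafted reply · 2 sources · risk: ambiguous account`: the one reason, the one risk, the one place to look closer.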
