How do I tell whether a reasoning model's scratchpad actually drove its answer?
Opportunity
Frontier models that emit visible chain-of-thought traces often arrive at an answer before or independently of those steps, then generate plausible-looking reasoning as post-hoc rationalization. Existing faithfulness metrics disagree with each other depending on how the classifier is constructed, which means there is no accepted ground truth for what a faithful trace even looks like. No production tooling flags unfaithful reasoning at inference time or attaches any confidence to whether the trace caused the output. Regulated industries and safety reviews that treat visible reasoning as an explanation of model behavior are relying on something that may be a narrative constructed after the fact.
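One way to probe the question in the title is a truncation test: cut the trace short at each step, ask for the answer again, and count how often the answer moves. The sketch below is a minimal illustration under stated assumptions, not an accepted metric. `answer_fn` is a hypothetical callable standing in for whatever model call forces a final answer from a question plus a partial trace, and the two toy functions exist only to show the two extremes.

```python
from typing import Callable, List

# Hypothetical interface (an assumption, not a real library API): takes a
# question and a possibly truncated list of reasoning steps, and returns the
# model's final answer as a string.
AnswerFn = Callable[[str, List[str]], str]


def truncation_sensitivity(question: str, steps: List[str], answer_fn: AnswerFn) -> float:
    """Fraction of truncated traces whose final answer differs from the full-trace answer.

    0.0 means the answer never moved when the reasoning was cut short, which is
    consistent with a post-hoc trace. Values near 1.0 mean the answer depended
    on the steps that were removed.
    """
    if not steps:
        raise ValueError("need at least one reasoning step")
    full_answer = answer_fn(question, steps)
    changed = 0
    # Keep only the first k steps, for k = 0 .. len(steps) - 1.
    for k in range(len(steps)):
        if answer_fn(question, steps[:k]) != full_answer:
            changed += 1
    return changed / len(steps)


# Toy stand-ins for a model, for illustration only.
def ignores_trace(question: str, steps: List[str]) -> str:
    # Always gives the same answer, whatever reasoning it is shown.
    return "42"


def uses_trace(question: str, steps: List[str]) -> str:
    # Answer depends entirely on how much reasoning it is shown.
    return str(len(steps))


trace = ["step a", "step b", "step c"]
print(truncation_sensitivity("q", trace, ignores_trace))
print(truncation_sensitivity("q", trace, uses_trace))
```

A score of 0.0 only shows that the answer was insensitive to truncation. It does not prove the trace was fabricated, since an easy question can be answered correctly without the steps. Given that existing metrics disagree with each other, a number like this is a prompt to investigate, not a verdict.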
Why it matters
If a reasoning trace is post-hoc rationalization, every audit, accountability claim, or compliance check built on it certifies a story about the model's behavior, not the behavior itself.
How I score the opportunity
The Opportunity Score is my own read, not a measurement: how much it hurts, how often it bites, and how little exists to solve it today. Higher means I think it is more worth building.
How much pain it causes when it shows up.
How often people actually run into it.
How little good tooling exists for it today.
More problems worth solving
Why does every AI app forget me the moment I close the tab?
Why is learning a new field still gated by knowing what to ask?
Why can a non-expert not verify what an AI just told them?
Why do we test models on benchmarks but ship them on vibes?
Why do AI agents have no memory of their own mistakes?
Why can't I audit what a model was actually trained on?