Why can text generated by an open-source model not be reliably traced back to it?

Opportunity

Closed-model providers can embed statistical watermarks in generated text at inference time, allowing content to be attributed to a specific model after the fact. Open-source models give users full access to the decoding procedure, so any generation-time watermark can be removed by modifying a few lines of sampling code. Post-hoc watermarking of already-generated text breaks under paraphrase attacks. Embedding markers in model weights survives some attacks but not fine-tuning, which anyone running local weights can apply in an afternoon. As of late 2025, no scheme provides practical, removal-resistant provenance marking for output from open-weights models, and the research community acknowledges the problem remains open.

Why it matters

Without watermarking for open models, AI-generated text provenance is only traceable when the generator chooses to cooperate.

How I score the opportunity

The Opportunity Score is my own read, not a measurement: how much it hurts, how often it bites, and how little exists to solve it today. Higher means I think it is more worth building.

Severity8/10

How much pain it causes when it shows up.

Frequency8/10

How often people actually run into it.

Whitespace9/10

How little good tooling exists for it today.

Why can text generated by an open-source model not be reliably traced back to it?

How I score the opportunity

More problems worth solving