Why Document Provenance Matters More Than Model Confidence
A confident AI answer is weak evidence when the source document is stale, cropped, altered, or unrelated.
Why it matters
Model confidence can be useful for OCR, classification, and extraction, but it does not answer the most important verification question: where did this evidence come from? A model may read a document correctly while the document itself is old, cropped, copied, or irrelevant to the transaction.
Evidence to collect
Track the source channel, upload time, original filename, document type, visible issuer, holder name, date, and whether the file was supplied by the seller or collected independently. Store the original file beside the model output.
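The fields above can be sketched as a small record stored beside the model output. This is a minimal illustration, not a prescribed schema: the class name `ProvenanceRecord`, its field names, and the `fingerprint` helper are assumptions made for the example. The SHA-256 hash is one possible way to tie the record to the original file bytes.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical provenance record kept next to the model output."""
    source_channel: str            # e.g. "email", "portal upload"
    uploaded_at: datetime          # when the file was received
    original_filename: str
    document_type: str
    visible_issuer: Optional[str]  # None if no issuer is visible
    holder_name: Optional[str]
    document_date: Optional[str]   # date as printed on the document
    supplied_by_seller: bool       # False means collected independently
    sha256: str                    # fingerprint of the original bytes


def fingerprint(original_bytes: bytes) -> str:
    """Return a SHA-256 hex digest of the original, unmodified file."""
    return hashlib.sha256(original_bytes).hexdigest()


# Example usage with made-up values.
raw = b"example original file bytes"
record = ProvenanceRecord(
    source_channel="email",
    uploaded_at=datetime(2024, 1, 15, tzinfo=timezone.utc),
    original_filename="certificate.pdf",
    document_type="certificate",
    visible_issuer=None,
    holder_name="Example Holder",
    document_date=None,
    supplied_by_seller=True,
    sha256=fingerprint(raw),
)
```

Keeping the hash in the record lets a reviewer confirm later that the stored original is the same file the model read.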
How to review it
Review confidence only after provenance is clear. A high-confidence extraction from a low-quality source should not clear a case. A lower-confidence extraction from a reliable original document may be safer if the field can be manually confirmed.
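One way to express this ordering is a gate that checks provenance before it looks at the score. The sketch below is illustrative only: the `Decision` values, the parameter names, and the `0.9` threshold are assumptions, and a real workflow would set its own.

```python
from enum import Enum


class Decision(Enum):
    CLEAR = "clear"
    MANUAL_REVIEW = "manual_review"
    HOLD = "hold"


def review(
    provenance_clear: bool,
    source_reliable: bool,
    confidence: float,
    field_confirmed: bool,
    threshold: float = 0.9,  # hypothetical cut-off, not a recommended value
) -> Decision:
    """Check provenance first; confidence only matters afterwards."""
    if not provenance_clear:
        # Unknown origin: the score is not considered at all.
        return Decision.HOLD
    if not source_reliable:
        # High confidence on a low-quality source never clears a case.
        return Decision.MANUAL_REVIEW
    if confidence >= threshold or field_confirmed:
        # Reliable original: either a strong read or a manual confirmation.
        return Decision.CLEAR
    return Decision.MANUAL_REVIEW
```

Under these assumptions, `review(True, False, 0.99, False)` goes to manual review, while `review(True, True, 0.6, True)` clears because the field was confirmed against a reliable original.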
Where buyers get misled
Teams get misled when confidence scores look scientific. The score may reflect how easily the model read the text, not whether the evidence proves the supplier claim.
Practical next step
Add provenance fields to every AI case file. The analyst should be able to see source, capture date, document relationship, and unresolved provenance questions before accepting a model summary.
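As a rough sketch of that check, a case file could be screened for missing provenance fields before a model summary is accepted. The dictionary keys and the function names `provenance_gaps` and `can_accept_summary` are hypothetical and would need to match whatever the case system actually stores.

```python
from __future__ import annotations

# Hypothetical keys for the fields an analyst should see.
REQUIRED_FIELDS = ("source", "capture_date", "document_relationship")


def provenance_gaps(case_file: dict) -> list[str]:
    """List missing provenance fields plus any open provenance questions."""
    gaps = [f"missing: {name}" for name in REQUIRED_FIELDS if not case_file.get(name)]
    gaps.extend(case_file.get("unresolved_questions", []))
    return gaps


def can_accept_summary(case_file: dict) -> bool:
    """A model summary is acceptable only when no gaps remain."""
    return not provenance_gaps(case_file)
```

Returning the list of gaps, rather than a bare yes or no, gives the analyst something concrete to resolve.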
Working checklist
- Store original files.
- Record source channel.
- Separate extraction confidence from evidence strength.
- Flag cropped documents.
- Require provenance review for high-risk decisions.