Building a Supplier Name Knowledge Base Without Polluting It
Name memory helps AI verification only when the team records confirmed aliases and rejects unsupported guesses.
A supplier name knowledge base can help analysts connect Chinese names, English trade names, websites, and invoice issuers. It can also become dangerous when every model guess enters the database as truth.
Separate confirmed aliases from possible matches. A confirmed alias should have evidence behind it: a matching registration code, an official document, a supplier authorization, or an analyst-reviewed relationship. A fuzzy match should remain a lead until someone reviews it.
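One way to keep the two groups apart is to give every alias an explicit status and allow promotion only through a function that demands evidence and a reviewer. The following is a minimal sketch, not a prescribed schema: the class names, the `confirm` function, and the evidence identifiers are assumptions chosen to mirror the evidence types listed above.

```python
from dataclasses import dataclass, field
from enum import Enum


class AliasStatus(Enum):
    CANDIDATE = "candidate"   # fuzzy match or model guess, not yet reviewed
    CONFIRMED = "confirmed"   # backed by at least one accepted evidence type


# Evidence types from the paragraph above; the identifiers are illustrative.
ACCEPTED_EVIDENCE = {
    "registration_code",
    "official_document",
    "supplier_authorization",
    "analyst_review",
}


@dataclass
class Alias:
    supplier_id: str
    name: str
    status: AliasStatus = AliasStatus.CANDIDATE   # every new alias starts as a lead
    evidence: list[str] = field(default_factory=list)


def confirm(alias: Alias, evidence_type: str, reviewer: str) -> Alias:
    """Promote a candidate only when a named reviewer supplies accepted evidence."""
    if evidence_type not in ACCEPTED_EVIDENCE:
        raise ValueError(f"unsupported evidence type: {evidence_type!r}")
    if not reviewer:
        raise ValueError("a reviewer is required to confirm an alias")
    alias.evidence.append(f"{evidence_type} (reviewed by {reviewer})")
    alias.status = AliasStatus.CONFIRMED
    return alias
```

Because `CANDIDATE` is the default and `confirm` is the only path to `CONFIRMED`, a matcher that writes new aliases cannot promote its own guesses; a similarity score is simply not an accepted evidence type.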
Store the source and capture date for every name. A website footer from last year, a current invoice, and a business license carry different weight. The knowledge base should show where each name came from and when it was seen.
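A provenance record along these lines could carry that information. This is a sketch under assumptions: the field names and the `SOURCE_RANK` ordering are hypothetical, and the relative weight of each source type is a policy decision for the team, not something fixed here.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class NameObservation:
    name: str
    source_type: str     # e.g. "business_license", "invoice", "website_footer"
    source_ref: str      # where an analyst can find the original
    captured_on: date    # when the name was seen, not when the row was written


# Hypothetical ranking: lower number means stronger source. Adjust to team policy.
SOURCE_RANK = {"business_license": 0, "invoice": 1, "website_footer": 2}


def strongest_first(observations: list[NameObservation]) -> list[NameObservation]:
    """Sort by source rank, then by most recent capture date."""
    return sorted(
        observations,
        key=lambda o: (
            SOURCE_RANK.get(o.source_type, len(SOURCE_RANK)),  # unknown types sort last
            -o.captured_on.toordinal(),                        # newer first within a type
        ),
    )
```

Keeping `captured_on` separate from any database write timestamp lets a reviewer see that a name came from an old page even if it was imported recently.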
Avoid over-normalizing Chinese company names. Removing too much punctuation, location detail, or legal suffix can collapse distinct entities into one record. The system should preserve original text alongside normalized text.
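The sketch below contrasts a conservative normalizer with an aggressive one. The company names are made up for illustration, and both function names are assumptions. The conservative version only folds width variants (via Unicode NFKC) and collapses whitespace; the aggressive version is included purely as a counterexample.

```python
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class NameRecord:
    original: str     # exactly as captured, never overwritten
    normalized: str   # used only for lookup and candidate generation


def normalize_conservative(name: str) -> str:
    """Fold width variants and whitespace; keep location and legal suffix."""
    folded = unicodedata.normalize("NFKC", name)   # full-width forms become ASCII forms
    return " ".join(folded.split())                # collapse runs of whitespace


def normalize_aggressive(name: str) -> str:
    """Counterexample only: strips location and legal suffix."""
    text = normalize_conservative(name)
    for token in ("上海", "北京", "有限公司"):
        text = text.replace(token, "")
    return text


def make_record(name: str) -> NameRecord:
    """Store the original text alongside its conservative normalization."""
    return NameRecord(original=name, normalized=normalize_conservative(name))


a = "上海华新贸易有限公司"   # made-up Shanghai entity
b = "北京华新贸易有限公司"   # made-up Beijing entity

print(normalize_aggressive(a) == normalize_aggressive(b))      # True: two entities collapse
print(normalize_conservative(a) == normalize_conservative(b))  # False: they stay distinct
```

With both fields on the record, a normalization rule that later turns out to be too aggressive can be rerun against `original` without losing what was actually captured.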
Review the knowledge base after disputes and corrections. If analysts keep splitting records that the system merged, the matching rules need adjustment.
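A simple way to spot rules that need adjustment is to measure, per matching rule, how many automatic merges analysts later split. The sketch below assumes each merge and each split is logged as a dictionary with a `rule` key; that log shape and the rule names in the usage example are hypothetical.

```python
from collections import Counter


def split_rate_by_rule(merges: list[dict], splits: list[dict]) -> dict[str, float]:
    """Share of each rule's automatic merges that analysts later split."""
    merged = Counter(m["rule"] for m in merges)
    undone = Counter(s["rule"] for s in splits)
    # Counter returns 0 for rules that were never split.
    return {rule: undone[rule] / count for rule, count in merged.items()}


# Illustrative log entries, not real data.
merges = [{"rule": "fuzzy_name"}] * 4 + [{"rule": "registration_code"}] * 2
splits = [{"rule": "fuzzy_name"}] * 2

rates = split_rate_by_rule(merges, splits)
print(rates)  # {'fuzzy_name': 0.5, 'registration_code': 0.0}
```

A rule with a persistently high split rate is a candidate for tightening or for demotion to "lead only"; what counts as high is a threshold the team would need to set.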
Working checklist
- Separate confirmed aliases from guesses.
- Store source and capture date.
- Preserve original names.
- Review analyst corrections.
- Block automatic promotion of fuzzy matches.